Encryption Model
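KnowledgeFlowDB implements property-level encryption using AES-256-GCM, so tenant property values stored in ScyllaDB and ClickHouse are cryptographically protected at rest. Even with direct database access, an attacker sees only ciphertext for encrypted values; property names, labels, IDs, and graph structure remain in plaintext (see What Is and Is Not Encrypted).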
Overview
- Algorithm: AES-256-GCM (authenticated encryption with associated data)
- Key Derivation: HKDF-SHA256 with domain-separated info strings
- Granularity: Each property value is encrypted independently
- Non-deterministic: Random 12-byte nonce per encryption operation (no ciphertext correlation)
- Type preservation: Original value types are restored on decryption via type tags
Wire Format
Every encrypted property value is stored as a string with the following structure:
__enc_{type}:{base64(nonce || ciphertext || tag)}
| Component | Size | Description |
|---|---|---|
| `__enc_` | 6 bytes | Fixed prefix identifying encrypted values |
| `{type}` | variable | Type tag for restoring the original value type |
| `:` | 1 byte | Separator |
| `nonce` | 12 bytes | Randomly generated per encryption (AES-GCM IV) |
| `ciphertext` | variable | Encrypted payload |
| `tag` | 16 bytes | AES-GCM authentication tag (integrity proof) |
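The layout above can be unpacked mechanically. The following is a minimal sketch using only the Rust standard library. `Envelope`, `parse_envelope`, and `b64_decode` are hypothetical names chosen for illustration, not the functions in `encryption.rs`, and the sketch assumes the standard Base64 alphabet with `=` padding. It only splits the stored string into its fields; decryption and tag verification are not shown.

```rust
/// Fields of one stored value: `__enc_{type}:{base64(nonce || ciphertext || tag)}`.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Envelope {
    pub type_tag: String,
    pub nonce: [u8; 12],
    pub ciphertext: Vec<u8>,
    pub tag: [u8; 16],
}

/// Minimal Base64 decoder (standard alphabet is an assumption).
/// Returns None on any character outside the alphabet; `=` padding is skipped.
pub fn b64_decode(input: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut buf: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits in `buf`
    for c in input.bytes().filter(|&c| c != b'=') {
        let v = ALPHABET.iter().position(|&a| a == c)? as u32;
        buf = (buf << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1u32 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Splits a stored string into type tag, nonce, ciphertext, and tag.
/// Returns None for plaintext values or malformed envelopes.
pub fn parse_envelope(stored: &str) -> Option<Envelope> {
    let rest = stored.strip_prefix("__enc_")?; // no prefix: not encrypted
    let (type_tag, payload) = rest.split_once(':')?;
    let raw = b64_decode(payload)?;
    if raw.len() < 12 + 16 {
        return None; // too short to hold a nonce and a tag
    }
    let (nonce_bytes, body) = raw.split_at(12);
    let (ciphertext, tag_bytes) = body.split_at(body.len() - 16);
    let mut nonce = [0u8; 12];
    nonce.copy_from_slice(nonce_bytes);
    let mut tag = [0u8; 16];
    tag.copy_from_slice(tag_bytes);
    Some(Envelope {
        type_tag: type_tag.to_string(),
        nonce,
        ciphertext: ciphertext.to_vec(),
        tag,
    })
}
```

A value without the `__enc_` prefix yields `None`, which a caller could treat as "already plaintext".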
Type Tags
| Tag | Original Type | Serialization |
|---|---|---|
| `str` | String | UTF-8 bytes |
| `int` | Integer (i64) | Little-endian 8 bytes |
| `float` | Float (f64) | Little-endian 8 bytes |
| `bool` | Boolean | Single byte (0 or 1) |
| `vec` | Vector (f32[]) | Concatenated little-endian 4-byte floats |
| `arr` | Array | JSON serialization |
| `obj` | Object | JSON serialization |
| `null` | Null | Empty (0 bytes) |
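As an illustration of the serialization rules in the table, here is a hedged sketch of the plaintext encoding that would be handed to the cipher, and its inverse. The `Value` enum and both function names are assumptions made for this example, not the types used in the codebase. The `arr` and `obj` tags are omitted because they need a JSON library.

```rust
/// Hypothetical value type covering the non-JSON tags from the table.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Vector(Vec<f32>),
    Null,
}

/// Returns the type tag and the plaintext bytes to encrypt.
pub fn to_tagged_bytes(value: &Value) -> (&'static str, Vec<u8>) {
    match value {
        Value::Str(s) => ("str", s.as_bytes().to_vec()),
        Value::Int(i) => ("int", i.to_le_bytes().to_vec()),
        Value::Float(f) => ("float", f.to_le_bytes().to_vec()),
        Value::Bool(b) => ("bool", vec![*b as u8]),
        Value::Vector(xs) => {
            let mut out = Vec::with_capacity(xs.len() * 4);
            for x in xs {
                out.extend_from_slice(&x.to_le_bytes());
            }
            ("vec", out)
        }
        Value::Null => ("null", Vec::new()),
    }
}

/// Restores a typed value from its tag and decrypted bytes.
/// Returns None if the bytes do not match the layout for the tag.
pub fn from_tagged_bytes(tag: &str, bytes: &[u8]) -> Option<Value> {
    match tag {
        "str" => String::from_utf8(bytes.to_vec()).ok().map(Value::Str),
        "int" => {
            if bytes.len() != 8 {
                return None;
            }
            let mut b = [0u8; 8];
            b.copy_from_slice(bytes);
            Some(Value::Int(i64::from_le_bytes(b)))
        }
        "float" => {
            if bytes.len() != 8 {
                return None;
            }
            let mut b = [0u8; 8];
            b.copy_from_slice(bytes);
            Some(Value::Float(f64::from_le_bytes(b)))
        }
        "bool" => match bytes {
            [0] => Some(Value::Bool(false)),
            [1] => Some(Value::Bool(true)),
            _ => None,
        },
        "vec" => {
            if bytes.len() % 4 != 0 {
                return None;
            }
            let floats = bytes
                .chunks_exact(4)
                .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect();
            Some(Value::Vector(floats))
        }
        "null" => if bytes.is_empty() { Some(Value::Null) } else { None },
        _ => None, // unknown tag, or arr/obj, which this sketch does not cover
    }
}
```

Rejecting byte strings of the wrong length, rather than padding or truncating them, keeps a corrupted or mis-tagged value from being silently restored as a different number.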
Example
A string property "main.rs" encrypted with a tenant key might be stored as:
__enc_str:DMfK8x2Qa1bN+7hGcmVzdCBvZiB0aGUgZW5jcnlwdGVkIGRhdGE=
Encrypting the same value again produces a different stored string, because each encryption uses a fresh random nonce. Identical plaintexts therefore cannot be matched by comparing ciphertexts, which defeats frequency analysis.
Key Hierarchy
KnowledgeFlowDB uses a hierarchical key derivation scheme based on HKDF-SHA256 to ensure cryptographic domain separation between subsystems.
Root Key (32 bytes, from K8s Secret or wallet signature)
|
+-- HKDF(root, "kfdb-graph-key-v1") --> Graph Key
| (node/edge properties)
|
+-- HKDF(root, "kfdb-vector-key-v1") --> Vector Key
| (embedding vectors)
|
+-- HKDF(root, "kfdb-fts-key-v1") --> FTS Key
| (full-text search tokens)
|
+-- HKDF(root, "kfdb-property-key-v1:{name}") --> Per-Property Key
(selective disclosure)
Domain Separation
Each derived key is cryptographically independent. Compromising one subsystem key does not leak information about others:
- Graph Key encrypts node and edge properties stored in ScyllaDB
- Vector Key encrypts embedding vectors used for semantic search
- FTS Key encrypts full-text search index tokens
- Per-Property Keys enable selective disclosure (e.g., reveal `name` without revealing `salary`)
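The domain separation comes entirely from the HKDF `info` string, so the labels are worth keeping in one place. The sketch below only builds the labels shown in the key hierarchy; `KeyDomain` and `info` are hypothetical names, and the HKDF-SHA256 call itself is left to a vetted library and is not shown.

```rust
/// Hypothetical enum naming each key domain from the hierarchy.
pub enum KeyDomain<'a> {
    Graph,
    Vector,
    Fts,
    /// Per-property key; the property name becomes part of the label.
    Property(&'a str),
}

impl KeyDomain<'_> {
    /// Returns the HKDF `info` string for this domain.
    pub fn info(&self) -> String {
        match self {
            KeyDomain::Graph => "kfdb-graph-key-v1".to_string(),
            KeyDomain::Vector => "kfdb-vector-key-v1".to_string(),
            KeyDomain::Fts => "kfdb-fts-key-v1".to_string(),
            KeyDomain::Property(name) => format!("kfdb-property-key-v1:{}", name),
        }
    }
}
```

Because `KeyDomain::Property("name")` and `KeyDomain::Property("salary")` produce different labels, HKDF yields unrelated keys for the two properties from the same root key.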
Key Zeroization
All key material is zeroized on drop. When a KeyHierarchy struct goes out of scope, root, graph, vector, and FTS keys are overwritten with zeros as a defense-in-depth measure.
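Zeroize-on-drop can be sketched with the standard library alone. The field names and the `wipe` helper below are assumptions for illustration; the actual `KeyHierarchy` may be laid out differently or rely on a dedicated zeroization crate. Volatile writes are used so that the compiler does not remove the overwrite as a dead store.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical layout: one 32-byte key per domain.
pub struct KeyHierarchy {
    root: [u8; 32],
    graph: [u8; 32],
    vector: [u8; 32],
    fts: [u8; 32],
}

/// Overwrites a key with zeros using volatile writes.
pub fn wipe(key: &mut [u8; 32]) {
    for byte in key.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Prevent the compiler from reordering later code before the writes.
    compiler_fence(Ordering::SeqCst);
}

impl Drop for KeyHierarchy {
    fn drop(&mut self) {
        wipe(&mut self.root);
        wipe(&mut self.graph);
        wipe(&mut self.vector);
        wipe(&mut self.fts);
    }
}
```

This clears only the struct's own storage. Copies made elsewhere (moves onto other stack frames, swapped pages) are outside its reach, which is why the text describes it as defense in depth.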
Root Key Sources
1. Kubernetes Secret (Server-Managed)
The master encryption key is stored as a Kubernetes secret and injected via the ENCRYPTION_MASTER_KEY environment variable. Per-tenant keys are then derived using HKDF:
HKDF-SHA256(
ikm: ENCRYPTION_MASTER_KEY,
salt: ENCRYPTION_SALT + wallet_address,
info: "kfdb-tenant-encryption-v1"
) --> 32-byte per-wallet key
2. Sign-to-Derive (Wallet-Based)
For wallet-authenticated tenants, the root key can be derived from an ECDSA signature without storing any secrets server-side:
SHA-256(r[32] || s[32] || v[1]) --> Root Key
The user signs a deterministic challenge message with their wallet. The resulting signature components (r, s, recovery id) are hashed to produce the root key. This means:
- No key storage required on the server
- The user's wallet is the key material
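- Same wallet + same challenge = same root key, provided the wallet signs deterministically (e.g., RFC 6979 nonces); a wallet that uses randomized ECDSA nonces would produce a different signature, and therefore a different key, each time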
- Different wallets = different root keys (cryptographic isolation)
Data Flow
Write Request (plaintext properties)
|
v
+-----------------------+
| Tenant Middleware |
| 1. Authenticate |
| 2. Derive wallet key |
| (HKDF from master) |
+-----------------------+
|
v
+-----------------------+
| Encrypt Properties |
| For each property: |
| AES-256-GCM encrypt |
| Random 12-byte nonce|
| Store as __enc_* |
+-----------------------+
|
+-------> ScyllaDB (stores __enc_* ciphertext)
|
+-------> ClickHouse CDC (stores __enc_* ciphertext)
Read Request
|
v
+-----------------------+
| Tenant Middleware |
| 1. Authenticate |
| 2. Derive wallet key |
+-----------------------+
|
v
+-----------------------+
| Decrypt Properties |
| For each property: |
| Detect __enc_ prefix|
| Extract type tag |
| AES-256-GCM decrypt |
| Restore typed value |
+-----------------------+
|
v
API Response (plaintext properties)
What Is and Is Not Encrypted
| Data | Encrypted | Reason |
|---|---|---|
| Property values | Yes | Contains sensitive tenant data |
| Property keys (names) | No | Required for query filtering and schema |
| Node/Edge labels | No | Required for graph traversal and indexing |
| Node/Edge IDs | No | Required for referential integrity |
| Graph structure (edges) | No | Required for traversal operations |
| Embedding vectors | Yes (with Vector Key) | Contains semantic information about content |
Query Implications
When encryption is active for a tenant:
- ClickHouse pushdown is disabled -- queries cannot filter on encrypted property values in ClickHouse since it only sees ciphertext
- Graph traversal works normally -- labels, IDs, and edge structure are plaintext
- Property filtering happens after decryption in the API layer
- Full-text search uses the dedicated FTS key for encrypted index tokens
Encryption is only active when KFDB_TEE_MODE=encrypted is set. Standard pods do not encrypt data, preventing accidental encryption of public datasets.
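The two rules above, the mode gate and filtering after decryption, can be sketched as follows. Both function names and the `Properties` alias are hypothetical, and rows are simplified to string-valued maps that have already been decrypted.

```rust
use std::collections::HashMap;

/// Decrypted properties of one node or edge (simplified to strings).
pub type Properties = HashMap<String, String>;

/// True only when the mode is exactly "encrypted", as the text requires.
/// Takes the value as a parameter so the rule is testable without
/// touching the process environment.
pub fn encryption_active(tee_mode: Option<&str>) -> bool {
    tee_mode == Some("encrypted")
}

/// Filters rows in the API layer, after decryption. The storage layer
/// cannot do this because it only sees `__enc_*` ciphertext.
pub fn filter_decrypted<'a>(
    rows: &'a [Properties],
    key: &str,
    expected: &str,
) -> Vec<&'a Properties> {
    rows.iter()
        .filter(|row| row.get(key).map(String::as_str) == Some(expected))
        .collect()
}
```

At startup the gate could be fed from the environment with `encryption_active(std::env::var("KFDB_TEE_MODE").ok().as_deref())`. Filtering after decryption means every candidate row is fetched and decrypted first, so selective predicates on labels or IDs, which remain plaintext, are the way to keep the candidate set small.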
Security Properties
| Property | Guarantee |
|---|---|
| Confidentiality | AES-256-GCM with random nonce per value |
| Integrity | 16-byte GCM authentication tag per value |
| Key isolation | HKDF domain separation per subsystem |
| Tenant isolation | Per-wallet derived keys (different wallets = different keys) |
| In-memory key hygiene | Key material zeroized on struct drop |
| Non-correlation | Random nonce prevents ciphertext comparison |
Source Code
- Encryption: crates/kfdb-graph/src/encryption.rs
- Key Hierarchy: crates/kfdb-graph/src/key_hierarchy.rs
- Tenant Middleware: crates/kfdb-api/src/tenant/middleware.rs