Encryption Model

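KnowledgeFlowDB implements property-level encryption using AES-256-GCM, so tenant property values stored in ScyllaDB and ClickHouse are cryptographically protected at rest. Even with direct database access, an attacker sees only ciphertext for those values; property names, labels, IDs, and graph structure remain in plaintext (see What Is and Is Not Encrypted).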

Overview

Algorithm: AES-256-GCM (authenticated encryption with associated data)
Key Derivation: HKDF-SHA256 with domain-separated info strings
Granularity: Each property value is encrypted independently
Non-deterministic: Random 12-byte nonce per encryption operation (no ciphertext correlation)
Type preservation: Original value types are restored on decryption via type tags

Wire Format

Every encrypted property value is stored as a string with the following structure:

__enc_{type}:{base64(nonce || ciphertext || tag)}

Component	Size	Description
`__enc_`	6 bytes	Fixed prefix identifying encrypted values
`{type}`	variable	Type tag for restoring the original value type
`:`	1 byte	Separator
`nonce`	12 bytes	Randomly generated per encryption (AES-GCM IV)
`ciphertext`	variable	Encrypted payload
`tag`	16 bytes	AES-GCM authentication tag (integrity proof)
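
The layout in the table can be illustrated with a small parser. This is a minimal sketch, not the code in encryption.rs: the names parse_encrypted, format_encrypted, and EncryptedValue are hypothetical, and it assumes the standard Base64 alphabet with padding.

```python
import base64
from typing import NamedTuple

PREFIX = "__enc_"
NONCE_LEN = 12  # AES-GCM nonce size, per the table above
TAG_LEN = 16    # AES-GCM authentication tag size, per the table above


class EncryptedValue(NamedTuple):
    type_tag: str
    nonce: bytes
    ciphertext: bytes
    tag: bytes


def parse_encrypted(stored: str) -> EncryptedValue:
    """Split a stored `__enc_{type}:{base64}` string into its components."""
    if not stored.startswith(PREFIX):
        raise ValueError("not an encrypted value")
    type_tag, sep, payload = stored[len(PREFIX):].partition(":")
    if not sep or not type_tag:
        raise ValueError("missing type tag or separator")
    # Assumption: standard Base64 alphabet with '=' padding.
    blob = base64.b64decode(payload, validate=True)
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("payload too short for nonce and tag")
    return EncryptedValue(
        type_tag=type_tag,
        nonce=blob[:NONCE_LEN],
        ciphertext=blob[NONCE_LEN:len(blob) - TAG_LEN],
        tag=blob[len(blob) - TAG_LEN:],
    )


def format_encrypted(type_tag: str, nonce: bytes, ciphertext: bytes, tag: bytes) -> str:
    """Inverse of parse_encrypted: assemble the stored string."""
    blob = nonce + ciphertext + tag
    return f"{PREFIX}{type_tag}:{base64.b64encode(blob).decode('ascii')}"
```

Because the nonce and tag have fixed sizes, the ciphertext length is whatever remains between them, which is why the format needs no explicit length field.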

Type Tags

Tag	Original Type	Serialization
`str`	String	UTF-8 bytes
`int`	Integer (i64)	Little-endian 8 bytes
`float`	Float (f64)	Little-endian 8 bytes
`bool`	Boolean	Single byte (0 or 1)
`vec`	Vector (f32[])	Concatenated little-endian 4-byte floats
`arr`	Array	JSON serialization
`obj`	Object	JSON serialization
`null`	Null	Empty (0 bytes)
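
The serialization rules in the table can be sketched as a pair of functions that convert between typed values and the plaintext bytes fed to AES-GCM. The function names encode and decode are hypothetical. The sketch assumes a signed 64-bit integer for `int` and UTF-8 encoded JSON for `arr` and `obj`, which the table does not state explicitly.

```python
import json
import struct


def encode(tag: str, value) -> bytes:
    """Serialize a value to plaintext bytes according to its type tag."""
    if tag == "str":
        return value.encode("utf-8")
    if tag == "int":
        return struct.pack("<q", value)                 # i64, little-endian
    if tag == "float":
        return struct.pack("<d", value)                 # f64, little-endian
    if tag == "bool":
        return b"\x01" if value else b"\x00"
    if tag == "vec":
        return struct.pack(f"<{len(value)}f", *value)   # f32 components
    if tag in ("arr", "obj"):
        return json.dumps(value).encode("utf-8")        # assumption: UTF-8 JSON
    if tag == "null":
        return b""
    raise ValueError(f"unknown type tag: {tag}")


def decode(tag: str, data: bytes):
    """Restore the typed value from decrypted bytes and the type tag."""
    if tag == "str":
        return data.decode("utf-8")
    if tag == "int":
        return struct.unpack("<q", data)[0]
    if tag == "float":
        return struct.unpack("<d", data)[0]
    if tag == "bool":
        return data == b"\x01"
    if tag == "vec":
        return list(struct.unpack(f"<{len(data) // 4}f", data))
    if tag in ("arr", "obj"):
        return json.loads(data.decode("utf-8"))
    if tag == "null":
        return None
    raise ValueError(f"unknown type tag: {tag}")
```

Note that `vec` components are 32-bit floats, so values that are not exactly representable in f32 come back rounded.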

Example

A string property "main.rs" encrypted with a tenant key might be stored as follows (the Base64 payload shown is illustrative, not real ciphertext):

__enc_str:DMfK8x2Qa1bN+7hGcmVzdCBvZiB0aGUgZW5jcnlwdGVkIGRhdGE=

Encrypting the same value again produces a different ciphertext because each encryption uses a fresh random nonce. An observer therefore cannot tell which stored values are equal, which defeats frequency analysis.

Key Hierarchy

KnowledgeFlowDB uses a hierarchical key derivation scheme based on HKDF-SHA256 to ensure cryptographic domain separation between subsystems.

Root Key (32 bytes, from K8s Secret or wallet signature)
  |
  +-- HKDF(root, "kfdb-graph-key-v1")        --> Graph Key
  |                                                (node/edge properties)
  |
  +-- HKDF(root, "kfdb-vector-key-v1")        --> Vector Key
  |                                                (embedding vectors)
  |
  +-- HKDF(root, "kfdb-fts-key-v1")           --> FTS Key
  |                                                (full-text search tokens)
  |
  +-- HKDF(root, "kfdb-property-key-v1:{name}") --> Per-Property Key
                                                    (selective disclosure)

Domain Separation

Each derived key is cryptographically independent. Compromising one subsystem key reveals nothing about the other subsystem keys or about the root key; only a compromise of the root key exposes the whole hierarchy:

Graph Key encrypts node and edge properties stored in ScyllaDB
Vector Key encrypts embedding vectors used for semantic search
FTS Key encrypts full-text search index tokens
Per-Property Keys enable selective disclosure (e.g., reveal name without revealing salary)
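
The derivation tree can be sketched with HKDF-SHA256 built from the Python standard library. The info strings are taken from the diagram; everything else is an assumption for illustration, in particular the empty salt, the 32-byte output length, and the helper names hkdf_sha256, derive_hierarchy, and derive_property_key.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(ikm: bytes, info: bytes, salt: bytes = b"", length: int = 32) -> bytes:
    """HKDF with SHA-256 as specified in RFC 5869 (extract, then expand)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF-SHA256")
    # Extract: an absent salt is defined as HASH_LEN zero bytes.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_hierarchy(root: bytes) -> dict:
    """Derive the three subsystem keys from a 32-byte root key."""
    if len(root) != 32:
        raise ValueError("root key must be 32 bytes")
    return {
        "graph": hkdf_sha256(root, b"kfdb-graph-key-v1"),
        "vector": hkdf_sha256(root, b"kfdb-vector-key-v1"),
        "fts": hkdf_sha256(root, b"kfdb-fts-key-v1"),
    }


def derive_property_key(root: bytes, name: str) -> bytes:
    """Derive the per-property key used for selective disclosure."""
    return hkdf_sha256(root, f"kfdb-property-key-v1:{name}".encode("utf-8"))
```

Domain separation comes entirely from the info string: the same root key with a different label yields an unrelated key, so handing out the per-property key for name gives no help in recovering the key for salary.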

Key Zeroization

All key material is zeroized on drop. When a KeyHierarchy struct goes out of scope, root, graph, vector, and FTS keys are overwritten with zeros as a defense-in-depth measure.
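
The idea can be illustrated outside Rust as well. The following is a best-effort sketch in Python with a hypothetical KeyMaterial class. Unlike a Rust Drop implementation, Python cannot guarantee that no other copies of the key exist (the immutable bytes object passed to the constructor is not wiped, for example), so treat it as a picture of the pattern rather than an equivalent guarantee.

```python
class KeyMaterial:
    """Holds a key in a mutable buffer so it can be overwritten in place."""

    def __init__(self, key: bytes) -> None:
        self._buf = bytearray(key)  # mutable copy that can be wiped later

    def view(self) -> memoryview:
        """Return a view of the key without copying it."""
        return memoryview(self._buf)

    def zeroize(self) -> None:
        """Overwrite every byte of the buffer with zero."""
        for i in range(len(self._buf)):
            self._buf[i] = 0

    def __enter__(self) -> "KeyMaterial":
        return self

    def __exit__(self, *exc) -> None:
        # Wipe the key when the scope ends, even if an exception was raised.
        self.zeroize()
```

Used as `with KeyMaterial(key) as km: ...`, the buffer is zeroed when the block exits, which mirrors the scope-based behavior described above.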

Root Key Sources

1. Kubernetes Secret (Server-Managed)

The master encryption key is stored as a Kubernetes secret and injected via the ENCRYPTION_MASTER_KEY environment variable. Per-tenant keys are then derived using HKDF:

HKDF-SHA256(
  ikm:  ENCRYPTION_MASTER_KEY,
  salt: ENCRYPTION_SALT + wallet_address,
  info: "kfdb-tenant-encryption-v1"
) --> 32-byte per-wallet key
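
A minimal sketch of this derivation follows. It assumes that `+` in the salt means byte concatenation and that the wallet address is used as a UTF-8 string exactly as supplied; neither detail is confirmed above, and the function name derive_wallet_key is hypothetical. Because the output is 32 bytes, which is one SHA-256 block, the HKDF expand step reduces to a single HMAC call.

```python
import hashlib
import hmac

INFO = b"kfdb-tenant-encryption-v1"


def derive_wallet_key(master_key: bytes, encryption_salt: bytes, wallet_address: str) -> bytes:
    """Derive a 32-byte per-wallet key with HKDF-SHA256 (RFC 5869)."""
    # Assumption: salt is ENCRYPTION_SALT followed by the address bytes.
    salt = encryption_salt + wallet_address.encode("utf-8")
    # HKDF-Extract: PRK = HMAC(salt, IKM)
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # HKDF-Expand for a 32-byte output needs only the first block:
    # T(1) = HMAC(PRK, info || 0x01)
    return hmac.new(prk, INFO + b"\x01", hashlib.sha256).digest()
```

If addresses can arrive in different letter cases, they would need to be normalized before derivation, since any byte difference in the salt produces a different key.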

2. Sign-to-Derive (Wallet-Based)

For wallet-authenticated tenants, the root key can be derived from an ECDSA signature without storing any secrets server-side:

SHA-256(r[32] || s[32] || v[1]) --> Root Key

The user signs a deterministic challenge message with their wallet. The resulting signature components (r, s, recovery id) are hashed to produce the root key. This means:

No key storage required on the server
The user's wallet is the key material
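Same wallet + same challenge = same root key, provided the wallet produces deterministic ECDSA signatures (RFC 6979 nonces); a wallet that signs with random nonces would yield a different key each time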
Different wallets = different root keys (cryptographic isolation)
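
The hashing step can be sketched as follows. The function name root_key_from_signature is hypothetical, and obtaining the signature from the wallet is out of scope here; the sketch only shows how the 65 signature bytes map to a root key.

```python
import hashlib


def root_key_from_signature(r: bytes, s: bytes, v: int) -> bytes:
    """Compute SHA-256(r || s || v) over a 65-byte ECDSA signature."""
    if len(r) != 32 or len(s) != 32:
        raise ValueError("r and s must each be 32 bytes")
    if not 0 <= v <= 255:
        raise ValueError("v must fit in one byte")
    return hashlib.sha256(r + s + bytes([v])).digest()
```

The result is a 32-byte value, matching the root key size shown in the key hierarchy.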

Data Flow

  Write Request (plaintext properties)
       |
       v
  +-----------------------+
  | Tenant Middleware     |
  | 1. Authenticate       |
  | 2. Derive wallet key  |
  |    (HKDF from master) |
  +-----------------------+
       |
       v
  +-----------------------+
  | Encrypt Properties    |
  | For each property:    |
  |   AES-256-GCM encrypt |
  |   Random 12-byte nonce|
  |   Store as __enc_*    |
  +-----------------------+
       |
       +-------> ScyllaDB (stores __enc_* ciphertext)
       |
       +-------> ClickHouse CDC (stores __enc_* ciphertext)

  Read Request
       |
       v
  +-----------------------+
  | Tenant Middleware     |
  | 1. Authenticate       |
  | 2. Derive wallet key  |
  +-----------------------+
       |
       v
  +-----------------------+
  | Decrypt Properties    |
  | For each property:    |
  |   Detect __enc_ prefix|
  |   Extract type tag    |
  |   AES-256-GCM decrypt |
  |   Restore typed value |
  +-----------------------+
       |
       v
  API Response (plaintext properties)

What Is and Is Not Encrypted

Data	Encrypted	Reason
Property values	Yes	Contains sensitive tenant data
Property keys (names)	No	Required for query filtering and schema
Node/Edge labels	No	Required for graph traversal and indexing
Node/Edge IDs	No	Required for referential integrity
Graph structure (edges)	No	Required for traversal operations
Embedding vectors	Yes (with Vector Key)	Contains semantic information about content

Query Implications

When encryption is active for a tenant:

ClickHouse pushdown is disabled -- queries cannot filter on encrypted property values in ClickHouse since it only sees ciphertext
Graph traversal works normally -- labels, IDs, and edge structure are plaintext
Property filtering happens after decryption in the API layer
Full-text search uses the dedicated FTS key for encrypted index tokens
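
Filtering after decryption can be sketched as below. The names decrypt_properties and filter_after_decrypt are hypothetical, and decrypt_value stands in for the real AES-256-GCM decryption plus type restoration, which is not reproduced here.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

PREFIX = "__enc_"


def decrypt_properties(
    props: Dict[str, Any], decrypt_value: Callable[[str], Any]
) -> Dict[str, Any]:
    """Decrypt values carrying the __enc_ prefix; pass other values through."""
    return {
        key: decrypt_value(value)
        if isinstance(value, str) and value.startswith(PREFIX)
        else value
        for key, value in props.items()
    }


def filter_after_decrypt(
    rows: Iterable[Dict[str, Any]],
    decrypt_value: Callable[[str], Any],
    predicate: Callable[[Dict[str, Any]], bool],
) -> Iterator[Dict[str, Any]]:
    """Apply a property filter in the API layer, after decryption."""
    for row in rows:
        plain = decrypt_properties(row, decrypt_value)
        if predicate(plain):
            yield plain
```

The cost of this design is that every candidate row must be fetched and decrypted before the predicate runs, so narrowing the candidate set by label or graph traversal first keeps the decrypted set small.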

Activation

Encryption is active only when KFDB_TEE_MODE=encrypted is set. Standard pods do not encrypt data, which prevents accidental encryption of public datasets.
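
A gate of this kind might look like the following sketch. The function name encryption_enabled is hypothetical, and the exact, case-sensitive comparison is an assumption; only the variable name and the value encrypted come from the text above.

```python
import os
from typing import Mapping


def encryption_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when KFDB_TEE_MODE is set to exactly 'encrypted'."""
    return env.get("KFDB_TEE_MODE") == "encrypted"
```

Treating an unset or unrecognized value as disabled matches the stated default that standard pods do not encrypt.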

Security Properties

Property	Guarantee
Confidentiality	AES-256-GCM with random nonce per value
Integrity	16-byte GCM authentication tag per value
Key isolation	HKDF domain separation per subsystem
Tenant isolation	Per-wallet derived keys (different wallets = different keys)
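In-memory key hygiene	Key zeroization on struct drop (defense in depth; this is not forward secrecy, because derived keys are deterministic functions of the root key)
Non-correlation	Random nonce per value, so equal plaintexts produce different ciphertexts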

Source Code

Encryption: crates/kfdb-graph/src/encryption.rs
Key Hierarchy: crates/kfdb-graph/src/key_hierarchy.rs
Tenant Middleware: crates/kfdb-api/src/tenant/middleware.rs
