Encryption Model

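KnowledgeFlowDB implements property-level encryption using AES-256-GCM, so tenant property values stored in ScyllaDB and ClickHouse are cryptographically protected at rest. Even with direct database access, an attacker sees only ciphertext for those values; property names, labels, IDs, and graph structure remain in plaintext (see What Is and Is Not Encrypted).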

Overview

  • Algorithm: AES-256-GCM (authenticated encryption with associated data)
  • Key Derivation: HKDF-SHA256 with domain-separated info strings
  • Granularity: Each property value is encrypted independently
  • Non-deterministic: Random 12-byte nonce per encryption operation (no ciphertext correlation)
  • Type preservation: Original value types are restored on decryption via type tags

Wire Format

Every encrypted property value is stored as a string with the following structure:

__enc_{type}:{base64(nonce || ciphertext || tag)}

| Component | Size | Description |
| --- | --- | --- |
| __enc_ | 6 bytes | Fixed prefix identifying encrypted values |
| {type} | variable | Type tag for restoring the original value type |
| : | 1 byte | Separator |
| nonce | 12 bytes | Randomly generated per encryption (AES-GCM IV) |
| ciphertext | variable | Encrypted payload |
| tag | 16 bytes | AES-GCM authentication tag (integrity proof) |
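
To make the layout concrete, here is a minimal Python sketch that splits a stored value into its components and reassembles it. The function names are hypothetical, and the standard base64 alphabet with padding is an assumption inferred from the example value in this document, not a confirmed detail of KnowledgeFlowDB.

```python
import base64

PREFIX = "__enc_"
NONCE_LEN = 12  # AES-GCM IV size from the wire format
TAG_LEN = 16    # AES-GCM authentication tag size from the wire format


def parse_encrypted_value(stored: str):
    """Split a stored string into (type_tag, nonce, ciphertext, tag).

    Returns None when the value does not carry the __enc_ prefix.
    """
    if not stored.startswith(PREFIX):
        return None
    type_tag, sep, payload = stored[len(PREFIX):].partition(":")
    if not sep or not type_tag:
        raise ValueError("missing type tag or ':' separator")
    # Assumption: standard base64 alphabet with '=' padding.
    raw = base64.b64decode(payload, validate=True)
    if len(raw) < NONCE_LEN + TAG_LEN:
        raise ValueError("payload too short to hold nonce and tag")
    nonce = raw[:NONCE_LEN]
    ciphertext = raw[NONCE_LEN:len(raw) - TAG_LEN]
    tag = raw[len(raw) - TAG_LEN:]
    return type_tag, nonce, ciphertext, tag


def format_encrypted_value(type_tag: str, nonce: bytes,
                           ciphertext: bytes, tag: bytes) -> str:
    """Assemble the stored string from already-encrypted components."""
    payload = base64.b64encode(nonce + ciphertext + tag).decode("ascii")
    return f"{PREFIX}{type_tag}:{payload}"
```

The parser only separates the fields. It does not verify the authentication tag, which requires the AES-GCM key.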

Type Tags

| Tag | Original Type | Serialization |
| --- | --- | --- |
| str | String | UTF-8 bytes |
| int | Integer (i64) | Little-endian 8 bytes |
| float | Float (f64) | Little-endian 8 bytes |
| bool | Boolean | Single byte (0 or 1) |
| vec | Vector (f32[]) | Concatenated little-endian 4-byte floats |
| arr | Array | JSON serialization |
| obj | Object | JSON serialization |
| null | Null | Empty (0 bytes) |
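
The serialization rules in the table can be sketched in Python with the standard struct and json modules. This is an illustration of the byte layouts only: the function names are hypothetical, and because Python has no distinct vector type, the vec tag gets its own helper here.

```python
import json
import struct


def serialize(value):
    """Return (type_tag, plaintext_bytes) for a value, per the table above."""
    if value is None:
        return "null", b""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool", bytes([1 if value else 0])
    if isinstance(value, int):
        return "int", struct.pack("<q", value)    # little-endian i64
    if isinstance(value, float):
        return "float", struct.pack("<d", value)  # little-endian f64
    if isinstance(value, str):
        return "str", value.encode("utf-8")
    if isinstance(value, list):
        return "arr", json.dumps(value).encode("utf-8")
    if isinstance(value, dict):
        return "obj", json.dumps(value).encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def serialize_vector(values):
    """Serialize an embedding as concatenated little-endian f32 values."""
    return "vec", struct.pack(f"<{len(values)}f", *values)


def deserialize(type_tag: str, data: bytes):
    """Restore the typed value from a type tag and decrypted bytes."""
    if type_tag == "null":
        return None
    if type_tag == "bool":
        return data == b"\x01"
    if type_tag == "int":
        return struct.unpack("<q", data)[0]
    if type_tag == "float":
        return struct.unpack("<d", data)[0]
    if type_tag == "str":
        return data.decode("utf-8")
    if type_tag == "vec":
        return list(struct.unpack(f"<{len(data) // 4}f", data))
    if type_tag in ("arr", "obj"):
        return json.loads(data.decode("utf-8"))
    raise ValueError(f"unknown type tag: {type_tag}")
```

These bytes are the plaintext handed to AES-256-GCM; the type tag itself travels unencrypted in the `__enc_{type}:` prefix.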

Example

A string property "main.rs" encrypted with a tenant key might be stored as:

__enc_str:DMfK8x2Qa1bN+7hGcmVzdCBvZiB0aGUgZW5jcnlwdGVkIGRhdGE=

The same value encrypted again produces a different ciphertext because each encryption uses a fresh random nonce. This prevents frequency analysis attacks.

Key Hierarchy

KnowledgeFlowDB uses a hierarchical key derivation scheme based on HKDF-SHA256 to ensure cryptographic domain separation between subsystems.

Root Key (32 bytes, from K8s Secret or wallet signature)
|
+-- HKDF(root, "kfdb-graph-key-v1") --> Graph Key
|       (node/edge properties)
|
+-- HKDF(root, "kfdb-vector-key-v1") --> Vector Key
|       (embedding vectors)
|
+-- HKDF(root, "kfdb-fts-key-v1") --> FTS Key
|       (full-text search tokens)
|
+-- HKDF(root, "kfdb-property-key-v1:{name}") --> Per-Property Key
        (selective disclosure)
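
The derivation tree can be sketched with a small HKDF-SHA256 (RFC 5869) helper built on Python's standard hmac and hashlib modules. The info strings come from the diagram; the empty salt, the 32-byte output length, and the function names are assumptions made for this sketch, since the salt used at this level is not stated here.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, salt: bytes = b"", length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a PRK, then expand it."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size  # RFC 5869 default salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_hierarchy(root_key: bytes) -> dict:
    """Derive the three subsystem keys from a 32-byte root key."""
    # Assumption: no salt is used at this level; only the info string differs.
    return {
        "graph": hkdf_sha256(root_key, b"kfdb-graph-key-v1"),
        "vector": hkdf_sha256(root_key, b"kfdb-vector-key-v1"),
        "fts": hkdf_sha256(root_key, b"kfdb-fts-key-v1"),
    }


def derive_property_key(root_key: bytes, name: str) -> bytes:
    """Derive a per-property key for selective disclosure."""
    info = f"kfdb-property-key-v1:{name}".encode("utf-8")
    return hkdf_sha256(root_key, info)
```

Because only the info string changes between calls, each subsystem receives a different key from the same root, and a derived key cannot be used to recover the root or a sibling key.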

Domain Separation

Each derived key is cryptographically independent. Compromising one subsystem key does not leak information about others:

  • Graph Key encrypts node and edge properties stored in ScyllaDB
  • Vector Key encrypts embedding vectors used for semantic search
  • FTS Key encrypts full-text search index tokens
  • Per-Property Keys enable selective disclosure (e.g., reveal name without revealing salary)

Key Zeroization

All key material is zeroized on drop. When a KeyHierarchy struct goes out of scope, root, graph, vector, and FTS keys are overwritten with zeros as a defense-in-depth measure.
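
As a rough illustration of the idea, the following Python sketch holds a key in a mutable buffer and overwrites it when a `with` block exits. The class name is hypothetical, and this is a best-effort analogue only: unlike a drop handler in a systems language, Python gives no guarantee that other copies of the key (for example the immutable bytes object passed in) are erased.

```python
class ZeroizingKey:
    """Hold key bytes in a mutable buffer and wipe it on context exit."""

    def __init__(self, key: bytes):
        # Copy into a bytearray so the buffer can be overwritten in place.
        self._buf = bytearray(key)

    def __enter__(self) -> bytearray:
        return self._buf

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Overwrite every byte with zero; the buffer length is unchanged.
        self._buf[:] = bytes(len(self._buf))
        return False  # never suppress exceptions
```

Usage: `with ZeroizingKey(raw_key) as key: ...` keeps the key readable inside the block and leaves only zeros in the buffer afterwards.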

Root Key Sources

1. Kubernetes Secret (Server-Managed)

The master encryption key is stored as a Kubernetes secret and injected via the ENCRYPTION_MASTER_KEY environment variable. Per-tenant keys are then derived using HKDF:

HKDF-SHA256(
    ikm:  ENCRYPTION_MASTER_KEY,
    salt: ENCRYPTION_SALT + wallet_address,
    info: "kfdb-tenant-encryption-v1"
) --> 32-byte per-wallet key
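
A minimal Python sketch of this derivation follows, using only the standard library. The parameter mapping mirrors the pseudocode above; how `ENCRYPTION_SALT` and the wallet address are encoded and concatenated is an assumption here (raw salt bytes followed by the UTF-8 address string, with no case normalization), and the function names are hypothetical.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract, then expand."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_tenant_key(master_key: bytes, encryption_salt: bytes,
                      wallet_address: str) -> bytes:
    """Derive the 32-byte per-wallet key from the master key."""
    # Assumption: salt is the raw salt bytes followed by the UTF-8 address.
    salt = encryption_salt + wallet_address.encode("utf-8")
    return hkdf_sha256(master_key, salt, b"kfdb-tenant-encryption-v1")
```

Since the wallet address is part of the salt, two tenants sharing the same master key still end up with unrelated keys.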

2. Sign-to-Derive (Wallet-Based)

For wallet-authenticated tenants, the root key can be derived from an ECDSA signature without storing any secrets server-side:

SHA-256(r[32] || s[32] || v[1]) --> Root Key

The user signs a deterministic challenge message with their wallet. The resulting signature components (r, s, recovery id) are hashed to produce the root key. This means:

  • No key storage required on the server
  • The user's wallet is the key material
  • Same wallet + same challenge = same root key (deterministic, provided the wallet produces deterministic signatures, e.g. per RFC 6979)
  • Different wallets = different root keys (cryptographic isolation)
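
The hashing step can be sketched in a few lines of Python. The function name and the input validation are illustrative assumptions; obtaining the signature from the wallet and choosing the challenge message are out of scope here.

```python
import hashlib


def root_key_from_signature(r: bytes, s: bytes, v: int) -> bytes:
    """Compute SHA-256(r[32] || s[32] || v[1]) as the 32-byte root key."""
    if len(r) != 32 or len(s) != 32:
        raise ValueError("r and s must each be exactly 32 bytes")
    if not 0 <= v <= 255:
        raise ValueError("recovery id must fit in a single byte")
    return hashlib.sha256(r + s + bytes([v])).digest()
```

The result is a 32-byte value suitable as the root of the key hierarchy; anyone who obtains the signature can recompute it, so the signature must be treated as secret key material.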

Data Flow

Write Request (plaintext properties)
        |
        v
+------------------------+
| Tenant Middleware      |
| 1. Authenticate        |
| 2. Derive wallet key   |
|    (HKDF from master)  |
+------------------------+
        |
        v
+------------------------+
| Encrypt Properties     |
| For each property:     |
|   AES-256-GCM encrypt  |
|   Random 12-byte nonce |
|   Store as __enc_*     |
+------------------------+
        |
        +-------> ScyllaDB (stores __enc_* ciphertext)
        |
        +-------> ClickHouse CDC (stores __enc_* ciphertext)

Read Request
        |
        v
+------------------------+
| Tenant Middleware      |
| 1. Authenticate        |
| 2. Derive wallet key   |
+------------------------+
        |
        v
+------------------------+
| Decrypt Properties     |
| For each property:     |
|   Detect __enc_ prefix |
|   Extract type tag     |
|   AES-256-GCM decrypt  |
|   Restore typed value  |
+------------------------+
        |
        v
API Response (plaintext properties)
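
The per-property loop on the read path can be sketched as follows. The cipher is deliberately injected as a callable: `decrypt_value` is a hypothetical stand-in for the AES-256-GCM decryption and type restoration steps, not a KnowledgeFlowDB API. Passing values without the prefix through unchanged is an assumption based on the "Detect __enc_ prefix" step.

```python
from typing import Any, Callable, Dict

PREFIX = "__enc_"


def decrypt_properties(
    props: Dict[str, Any],
    decrypt_value: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Return a copy of props with every __enc_* value decrypted.

    decrypt_value(type_tag, base64_payload) is a caller-supplied function
    that performs the actual decryption and restores the typed value.
    """
    result = {}
    for name, value in props.items():
        if isinstance(value, str) and value.startswith(PREFIX):
            # Split "__enc_{type}:{payload}" into tag and payload.
            type_tag, _, payload = value[len(PREFIX):].partition(":")
            result[name] = decrypt_value(type_tag, payload)
        else:
            # Assumption: values without the prefix are returned as stored.
            result[name] = value
    return result
```

Property names are never touched, which matches the table in the next section: only values are encrypted.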

What Is and Is Not Encrypted

| Data | Encrypted | Reason |
| --- | --- | --- |
| Property values | Yes | Contains sensitive tenant data |
| Property keys (names) | No | Required for query filtering and schema |
| Node/Edge labels | No | Required for graph traversal and indexing |
| Node/Edge IDs | No | Required for referential integrity |
| Graph structure (edges) | No | Required for traversal operations |
| Embedding vectors | Yes (with Vector Key) | Contains semantic information about content |

Query Implications

When encryption is active for a tenant:

  • ClickHouse pushdown is disabled -- queries cannot filter on encrypted property values in ClickHouse since it only sees ciphertext
  • Graph traversal works normally -- labels, IDs, and edge structure are plaintext
  • Property filtering happens after decryption in the API layer
  • Full-text search uses the dedicated FTS key for encrypted index tokens

Activation

Encryption is only active when KFDB_TEE_MODE=encrypted is set. Standard pods do not encrypt data, preventing accidental encryption of public datasets.
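
A check for this setting might look like the sketch below. The variable name and value come from the text above; the function name is hypothetical, and the exact, case-sensitive comparison is an assumption.

```python
import os
from typing import Mapping, Optional


def encryption_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when KFDB_TEE_MODE is set to "encrypted"."""
    env = os.environ if env is None else env
    # Assumption: the match is exact; any other value leaves encryption off.
    return env.get("KFDB_TEE_MODE") == "encrypted"
```

Defaulting to off when the variable is missing or has any other value matches the stated goal of never encrypting public datasets by accident.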

Security Properties

| Property | Guarantee |
| --- | --- |
| Confidentiality | AES-256-GCM with random nonce per value |
| Integrity | 16-byte GCM authentication tag per value |
| Key isolation | HKDF domain separation per subsystem |
| Tenant isolation | Per-wallet derived keys (different wallets = different keys) |
| In-memory key hygiene | Key zeroization on struct drop |
| Non-correlation | Random nonce prevents ciphertext comparison |

Source Code