
Query Languages

KnowledgeFlowDB supports two query languages, each routed to a different storage backend. Both must return the same results for the same data; if they don't, CDC sync between the backends is broken.

KQL (Knowledge Query Language)

KQL is a Cypher-inspired graph query language routed to ClickHouse for fast analytics.

Endpoint: POST /api/v1/query

curl -X POST https://api.knowledgedataflow.org/api/v1/query \
-H "X-Wallet-Address: 0xYOUR_WALLET" \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (f:File)-[:DEFINES]->(fn:Function) WHERE fn.visibility = '\''public'\'' RETURN f.path, fn.name LIMIT 10"}'
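The same KQL request can be sent from Python using only the standard library. This is a minimal sketch, assuming the endpoint accepts exactly the JSON body and headers shown in the curl example. `build_kql_request` and `run_kql` are hypothetical helper names, not part of any KnowledgeFlowDB SDK, and `run_kql` returns the parsed JSON without assuming its shape.

```python
import json
import urllib.request

BASE_URL = "https://api.knowledgedataflow.org"


def build_kql_request(query: str, wallet: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the KQL endpoint, mirroring the curl example."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/query",
        data=body,
        headers={
            "X-Wallet-Address": wallet,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_kql(query: str, wallet: str, opener=urllib.request.urlopen):
    """Send the request and decode the JSON response (its structure is not assumed)."""
    with opener(build_kql_request(query, wallet)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Builds the request only; no network call is made here.
    req = build_kql_request(
        "MATCH (f:File)-[:DEFINES]->(fn:Function) "
        "WHERE fn.visibility = 'public' RETURN f.path, fn.name LIMIT 10",
        "0xYOUR_WALLET",
    )
    print(req.get_method(), req.full_url)
```

The `opener` parameter defaults to `urllib.request.urlopen` but can be swapped for a stub in tests.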

KQL Syntax

-- Count nodes by label
MATCH (n:File) RETURN COUNT(n) AS total

-- Pattern matching with relationships
MATCH (f:File)-[r:IMPORTS]->(dep:File)
RETURN f.path AS source, dep.path AS dependency

-- Filtering
MATCH (n:Function)
WHERE n.lines > 100
RETURN n.name, n.lines
ORDER BY n.lines DESC
LIMIT 20

-- Aggregation
MATCH (r:Repository)-[:CONTAINS]->(f:File)
RETURN r.name, COUNT(f) AS file_count
ORDER BY file_count DESC

See KQL Syntax for the complete reference.

SQL

SQL queries are routed to ScyllaDB (the source of truth) and use CQL-compatible syntax.

Endpoint: POST /api/v1/query/sql

curl -X POST https://api.knowledgedataflow.org/api/v1/query/sql \
-H "X-Wallet-Address: 0xYOUR_WALLET" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM nodes_by_label WHERE label='\''File'\'' LIMIT 10"}'
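Hand-escaping single quotes for the shell (`'\''`) is easy to get wrong once a CQL predicate contains string literals. Encoding the body with `json.dumps` sidesteps that entirely. A minimal sketch, assuming the SQL endpoint takes the same `{"query": ...}` body and headers as the curl example; `build_sql_request` is a hypothetical helper name.

```python
import json
import urllib.request

SQL_URL = "https://api.knowledgedataflow.org/api/v1/query/sql"


def build_sql_request(cql: str, wallet: str) -> urllib.request.Request:
    """Build a POST to the ScyllaDB-backed SQL endpoint; json.dumps handles all quoting."""
    return urllib.request.Request(
        SQL_URL,
        data=json.dumps({"query": cql}).encode("utf-8"),
        headers={"X-Wallet-Address": wallet, "Content-Type": "application/json"},
        method="POST",
    )


# String literals inside the CQL need no extra escaping here.
req = build_sql_request(
    "SELECT * FROM nodes_by_label WHERE label='File' LIMIT 10", "0xYOUR_WALLET"
)
print(req.data.decode("utf-8"))
# → {"query": "SELECT * FROM nodes_by_label WHERE label='File' LIMIT 10"}
```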

EXPLAIN

Use the dedicated EXPLAIN endpoint to see the query plan without executing the query:

Endpoint: POST /api/v1/query/explain

curl -X POST https://api.knowledgedataflow.org/api/v1/query/explain \
-H "X-Wallet-Address: 0xYOUR_WALLET" \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (n:File) RETURN COUNT(n)"}'

Returns the query plan, optimizer decisions, and routing information (ScyllaDB vs ClickHouse).
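Because the EXPLAIN endpoint takes the same `{"query": ...}` body as the KQL endpoint in the examples here, a client can switch between a plan-only dry run and real execution by changing only the path. A minimal sketch; `kql_request` and its `explain` flag are hypothetical, and whether EXPLAIN also accepts SQL is not covered.

```python
import json
import urllib.request

BASE_URL = "https://api.knowledgedataflow.org"


def kql_request(query: str, wallet: str, explain: bool = False) -> urllib.request.Request:
    """Target /query/explain for a plan-only dry run, or /query to execute the KQL."""
    path = "/api/v1/query/explain" if explain else "/api/v1/query"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"X-Wallet-Address": wallet, "Content-Type": "application/json"},
        method="POST",
    )


q = "MATCH (n:File) RETURN COUNT(n)"
plan_req = kql_request(q, "0xYOUR_WALLET", explain=True)  # inspect the plan first
run_req = kql_request(q, "0xYOUR_WALLET")                 # then execute it
print(plan_req.full_url)
print(run_req.full_url)
```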

Dual-Storage Architecture

Write API ──> ScyllaDB (ACID source of truth)
                 │
                 │ CDC (Change Data Capture)
                 ▼
              ClickHouse (read-only analytics replica)
  • ScyllaDB = Source of truth. All writes go here. ACID guarantees.
  • ClickHouse = Read-only analytics replica, synced via CDC. KQL queries run here.
  • Both must match — if KQL and SQL return different counts for the same data, CDC sync is broken.
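The "both must match" rule can be turned into a simple drift check: run equivalent counts through each endpoint and compare the totals. This is a sketch under assumptions. The two count fetchers are passed in as callables because the response formats of the endpoints are not documented here, and `check_cdc_sync` plus the paired query strings are hypothetical.

```python
from typing import Callable

# Hypothetical paired queries that should count the same File nodes on each backend.
KQL_COUNT = "MATCH (n:File) RETURN COUNT(n) AS total"
SQL_COUNT = "SELECT COUNT(*) FROM nodes_by_label WHERE label='File'"


def check_cdc_sync(
    kql_count: Callable[[str], int],
    sql_count: Callable[[str], int],
) -> tuple[bool, int, int]:
    """Return (in_sync, clickhouse_total, scylla_total) for the paired File counts."""
    clickhouse_total = kql_count(KQL_COUNT)  # KQL -> ClickHouse replica
    scylla_total = sql_count(SQL_COUNT)      # SQL -> ScyllaDB source of truth
    return clickhouse_total == scylla_total, clickhouse_total, scylla_total


# Stub fetchers stand in for real HTTP calls to /api/v1/query and /api/v1/query/sql.
ok, ch, sc = check_cdc_sync(lambda q: 1200, lambda q: 1200)
print(ok, ch, sc)  # True 1200 1200
```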

Query Routing

The API routes each query to a backend according to the endpoint it is sent to:

| Query Type | Backend | Why |
|---|---|---|
| KQL (/api/v1/query) | ClickHouse | Columnar storage optimized for analytics |
| SQL (/api/v1/query/sql) | ScyllaDB | Source of truth, ACID consistency |
| EXPLAIN (/api/v1/query/explain) | Neither | Returns the plan only; no execution |
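For client-side logging or sanity checks, the routing table can be mirrored as a lookup. This only restates the table; it is a hypothetical client helper, not the server's router, and it ignores any A/B routing experiments.

```python
from typing import Optional

# Endpoint -> backend that serves it, per the Query Routing table (None = no execution).
ROUTES: dict[str, Optional[str]] = {
    "/api/v1/query": "ClickHouse",
    "/api/v1/query/sql": "ScyllaDB",
    "/api/v1/query/explain": None,
}


def backend_for(path: str) -> Optional[str]:
    """Return the backend for a query endpoint, or raise for unknown paths."""
    if path not in ROUTES:
        raise ValueError(f"not a query endpoint: {path}")
    return ROUTES[path]


for p in ROUTES:
    print(p, "->", backend_for(p) or "plan only")
```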

The query router also runs A/B experiments to optimize routing for different query patterns. Use the analyze_query_routing MCP tool to inspect its routing decisions.
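How these experiments assign traffic is not described here. One common technique, shown purely as an illustration and not as KnowledgeFlowDB's implementation, is deterministic hash bucketing: a stable key for the query pattern is hashed so the same pattern always lands in the same arm of a given experiment. `assign_arm`, the arm names, and the default 50/50 split are all hypothetical.

```python
import hashlib


def assign_arm(pattern_key: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a query pattern to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{experiment}:{pattern_key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


# The same pattern always gets the same arm within one experiment.
key = "MATCH (:File)-[:IMPORTS]->(:File)"
assert assign_arm(key, "routing-exp-1") == assign_arm(key, "routing-exp-1")
print(assign_arm(key, "routing-exp-1"))
```

Salting the hash with the experiment name keeps assignments independent across experiments.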