Query Languages
KnowledgeFlowDB supports two query languages, each routed to a different storage backend. Both must return the same data; if they don't, CDC (Change Data Capture) sync is broken.
KQL (Knowledge Query Language)
KQL is a Cypher-inspired graph query language routed to ClickHouse for fast analytics.
Endpoint: POST /api/v1/query
curl -X POST https://api.knowledgedataflow.org/api/v1/query \
-H "X-Wallet-Address: 0xYOUR_WALLET" \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (f:File)-[:DEFINES]->(fn:Function) WHERE fn.visibility = '\''public'\'' RETURN f.path, fn.name LIMIT 10"}'
KQL Syntax
-- Count nodes by label
MATCH (n:File) RETURN COUNT(n) AS total
-- Pattern matching with relationships
MATCH (f:File)-[r:IMPORTS]->(dep:File)
RETURN f.path AS source, dep.path AS dependency
-- Filtering
MATCH (n:Function)
WHERE n.lines > 100
RETURN n.name, n.lines
ORDER BY n.lines DESC
LIMIT 20
-- Aggregation
MATCH (r:Repository)-[:CONTAINS]->(f:File)
RETURN r.name, COUNT(f) AS file_count
ORDER BY file_count DESC
See KQL Syntax for the complete reference.
SQL
SQL queries are routed to ScyllaDB (the source of truth) and use CQL-compatible syntax.
Endpoint: POST /api/v1/query/sql
curl -X POST https://api.knowledgedataflow.org/api/v1/query/sql \
-H "X-Wallet-Address: 0xYOUR_WALLET" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM nodes_by_label WHERE label='\''File'\'' LIMIT 10"}'
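Both endpoints can also be called from a script, which avoids the '\'' shell quoting needed in the curl examples. The following is a minimal sketch using only the Python standard library. The helper names build_query_request and run_query are hypothetical, and run_query assumes the API responds with a JSON body, which this page does not specify.

```python
import json
import urllib.request

BASE_URL = "https://api.knowledgedataflow.org"

# Endpoint paths taken from the curl examples on this page.
ENDPOINTS = {
    "kql": "/api/v1/query",
    "sql": "/api/v1/query/sql",
}


def build_query_request(language, query, wallet):
    """Build (but do not send) a POST request for a KQL or SQL query."""
    # json.dumps takes care of quoting string literals inside the query.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINTS[language],
        data=body,
        headers={
            "X-Wallet-Address": wallet,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(language, query, wallet):
    """Send the request and decode the body (assumes a JSON response)."""
    request = build_query_request(language, query, wallet)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, build_query_request("kql", "MATCH (n:File) RETURN COUNT(n) AS total", "0xYOUR_WALLET") produces a request aimed at /api/v1/query, and passing "sql" instead targets /api/v1/query/sql.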
EXPLAIN
Use the dedicated EXPLAIN endpoint to see the query plan without executing the query:
Endpoint: POST /api/v1/query/explain
curl -X POST https://api.knowledgedataflow.org/api/v1/query/explain \
-H "X-Wallet-Address: 0xYOUR_WALLET" \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (n:File) RETURN COUNT(n)"}'
The response contains the query plan, optimizer decisions, and routing information (ScyllaDB vs ClickHouse).
Dual-Storage Architecture
Write API ──> ScyllaDB (ACID source of truth)
                  │
                  │ CDC (Change Data Capture)
                  ▼
              ClickHouse (read-only analytics replica)
- ScyllaDB = Source of truth. All writes go here. ACID guarantees.
- ClickHouse = Read-only analytics replica, synced via CDC. KQL queries run here.
- Both must match: if KQL and SQL return different counts for the same data, CDC sync is broken.
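The consistency rule in the last bullet can be turned into an automated check. The sketch below compares one count per backend. The name check_cdc_sync and its fetch_count callback are hypothetical; the callback is left to the caller because the response format of both endpoints is not specified here. It is also assumed that the two queries passed in are equivalent, meaning they count the same label with the same filter.

```python
from typing import Callable


def check_cdc_sync(fetch_count: Callable[[str, str], int],
                   kql_query: str,
                   sql_query: str) -> dict:
    """Compare a count from ClickHouse (KQL) with one from ScyllaDB (SQL).

    fetch_count(endpoint_path, query) is supplied by the caller and must
    return an integer. How it extracts that integer from the API response
    is deliberately left open.
    """
    kql_count = fetch_count("/api/v1/query", kql_query)
    sql_count = fetch_count("/api/v1/query/sql", sql_query)
    return {
        "kql_count": kql_count,
        "sql_count": sql_count,
        "in_sync": kql_count == sql_count,
    }
```

A plausible pair of queries would be MATCH (n:File) RETURN COUNT(n) AS total on the KQL side and SELECT COUNT(*) FROM nodes_by_label WHERE label='File' on the SQL side. Because the replica is fed by CDC, a mismatch seen right after a write may only reflect replication lag, so it is probably worth retrying before treating the result as a broken sync.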
Query Routing
The API automatically routes queries to the optimal backend:
| Query Type | Backend | Why |
|---|---|---|
| KQL (/api/v1/query) | ClickHouse | Columnar storage optimized for analytics |
| SQL (/api/v1/query/sql) | ScyllaDB | Source of truth, ACID consistency |
| EXPLAIN (/api/v1/query/explain) | Neither | Returns plan only, no execution |
The query router also uses A/B experiments to optimize routing for different query patterns. Use analyze_query_routing (MCP tool) to see routing decisions.
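For client-side tooling such as logging or test fixtures, the static part of the routing table can be written as a simple lookup. This is only an illustrative sketch: ROUTES and backend_for are hypothetical names, and the mapping does not model the A/B experiments, whose behavior is not described here.

```python
from typing import Optional

# Static endpoint-to-backend mapping, copied from the routing table.
# None means the query is planned but not executed on either backend.
ROUTES = {
    "/api/v1/query": "ClickHouse",
    "/api/v1/query/sql": "ScyllaDB",
    "/api/v1/query/explain": None,
}


def backend_for(path: str) -> Optional[str]:
    """Return the backend that executes queries sent to `path`."""
    if path not in ROUTES:
        raise ValueError(f"unknown query endpoint: {path}")
    return ROUTES[path]
```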