Choosing a query shape
OriginChain answers five kinds of question - SQL, vector, full-text, graph, and natural language - against the same data. Writes commit atomically across all of them, but the query shapes are not interchangeable. Each is built for a specific shape of question.
This page is the decision guide. If your query looks like X, reach for Y. The capability matrix at the bottom is engine-accurate as of today - it shows what actually works, not what we plan to ship.
The shapes, side by side.
| Shape | Endpoint | Reach for it when | Look elsewhere when | The query looks like |
|---|---|---|---|---|
| SQL | POST /v1/tenants/:t/sql | Filtering by exact-match WHERE on indexed columns. JOINs across up to 32 tables. GROUP BY with COUNT / SUM / AVG / MIN / MAX. LIMIT-bounded reads. | Semantic similarity (use vector). Approximate string matching (use full-text). Multi-hop relationship questions (use graph). Natural-language questions from non-engineers (use Ask). | A Postgres-shaped query you'd write in DBeaver. |
| Vector | POST /v1/tenants/:t/vector/:table/topk | Semantic similarity. Cross-language retrieval. 'Find rows like this one' even when the keywords don't match. Recommendations. De-duplication by meaning. RAG retrieval before an LLM call. | Exact identifiers, SKUs, error codes (use SQL or full-text). Structural traversal (use graph). | An embedding vector plus k. |
| Full-text | GET /v1/tenants/:t/fts/:table/:field | Exact phrase matching. Acronyms and product codes. Long-tail queries with unusual terms. Recall on documents containing the literal keyword. | Conceptual queries where the user's wording doesn't match the document's wording (vector wins). Structured-field filtering (SQL is cheaper). | Words a human types into a search box. |
| Graph | GET/POST /v1/tenants/:t/graph/:schema/:algo | Multi-hop relationship questions ('orders from customers I haven't reviewed yet'). Social-graph walks. Dependency chains. Shortest path. PageRank or centrality analytics. Reachability checks. | Single-table lookups (use SQL). Semantic similarity (use vector). Large-result analytics that aren't relationship-shaped (SQL is cheaper). | Multi-hop or path query: 'shortest path through citations'. |
| Hybrid | Run vector topk + FTS in parallel, fuse client-side | Production retrieval. Catches both semantic match and literal keyword match. Generally outperforms either alone on standard benchmarks. | Anything where one mode is structurally enough - don't fuse if vector alone is already perfect. | RAG retrieval before the LLM call. |
| Ask | POST /v1/tenants/:t/ask | Non-technical users asking questions of structured data. Internal dashboards. Customer-support agents. Prototype-grade analytics without writing SQL. | Latency-critical hot paths (a cold compile costs an LLM round-trip). Queries that need to be auditable to a single SQL string. | An English sentence. |
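The endpoint paths listed above can be assembled with a small helper. This is a minimal sketch, not an official client: the function name `endpoint_for`, the placeholder names (`tenant`, `table`, `field`, `schema`, `algo`), and the example values are assumptions made for illustration. Only the path templates and HTTP methods come from this page.

```python
from urllib.parse import quote

# Path templates copied from this page. The {placeholder} names are
# hypothetical; the page itself writes them as :t, :table, :field, etc.
PATHS = {
    "sql": (("POST",), "/v1/tenants/{tenant}/sql"),
    "vector": (("POST",), "/v1/tenants/{tenant}/vector/{table}/topk"),
    "fts": (("GET",), "/v1/tenants/{tenant}/fts/{table}/{field}"),
    "graph": (("GET", "POST"), "/v1/tenants/{tenant}/graph/{schema}/{algo}"),
    "ask": (("POST",), "/v1/tenants/{tenant}/ask"),
}


def endpoint_for(shape: str, **parts: str) -> tuple[tuple[str, ...], str]:
    """Return (allowed HTTP methods, URL path) for one query shape.

    Raises KeyError for an unknown shape or a missing path segment.
    """
    methods, template = PATHS[shape]
    # Percent-encode every segment so names with spaces or slashes
    # cannot change the structure of the path.
    safe = {name: quote(value, safe="") for name, value in parts.items()}
    return methods, template.format(**safe)


if __name__ == "__main__":
    # "acme" and "products" are made-up example values.
    print(endpoint_for("vector", tenant="acme", table="products"))
```

Hybrid has no entry because it is not an endpoint of its own: it is the vector and full-text calls issued side by side, with the fusion done in your application.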
If your query looks like this...
Pattern-match the left column against what you're trying to do; the middle and right columns are the answer.
| If your query looks like this | Use | Why |
|---|---|---|
| WHERE id = 'sku-9281' | SQL | Exact lookup on an indexed primary key. |
| WHERE status = 'pending' AND amount_cents > 100 | SQL | Multi-predicate WHERE with AND on indexed columns. |
| Per-customer totals over paid orders | SQL | GROUP BY customer + SUM(amount_cents). |
| Products similar to this one (no shared keywords) | Vector | Semantic similarity via embedding distance. |
| Products described as 'lightweight running shoes for marathons' | Hybrid (vector + BM25) | Catches the semantic match AND the literal keyword. |
| Find SKU ABC-1234-XL | Full-text or SQL | Exact-token retrieval. Vector would dilute it semantically. |
| Path between paper A and paper Z through citations | Graph (BFS / path) | Multi-hop walk. Cap with max_depth. |
| Shortest commute between two stations | Graph (Dijkstra) | Weighted shortest path. Supply edge weights via the JSON weights map. |
| Most influential nodes in a network | Graph (PageRank) | Iterative influence over a seed node set. |
| Customers in segment X (English question) | Ask | Translates the sentence to a Plan against your schemas. |

Capability matrix.
What's supported today. yes = works · partial = limited shape (see the relevant reference page) · — = not the right tool for this shape, or not yet supported.
"Ask" inherits SQL's surface where the compiler can build the right Plan - if SQL doesn't support a construct (like HAVING or window functions), Ask can't either.
| Feature | SQL | Vector | FTS | Graph | Ask |
|---|---|---|---|---|---|
| Exact-match WHERE on indexed col | yes | — | yes | — | yes |
| AND-combined WHERE conditions | yes | — | — | — | yes |
| OR in WHERE | — | — | — | — | partial |
| IN (literal list) | yes | — | — | — | yes |
| BETWEEN, IS NULL, LIKE | yes | — | — | — | yes |
| GROUP BY + COUNT / SUM / AVG / MIN / MAX | yes | — | — | — | yes |
| HAVING | — | — | — | — | partial |
| ORDER BY | — | yes | yes | — | partial |
| INNER / LEFT / RIGHT / FULL OUTER JOIN | yes | — | — | — | yes |
| LIMIT | yes | yes | yes | yes | yes |
| Uncorrelated IN (SELECT ...) | partial | — | — | — | partial |
| Correlated subqueries / EXISTS | — | — | — | — | — |
| CTEs (WITH) | — | — | — | — | — |
| Window functions | — | — | — | — | — |
| EXPLAIN | yes | — | — | — | yes |
| Transactions (BEGIN/COMMIT/ROLLBACK) | yes | — | — | — | — |
| HNSW · cosine / dot / L2 / Manhattan | — | yes | — | — | — |
| IVF / IVF-PQ for 10M+ corpora | — | yes | — | — | — |
| Metadata equality filter on topk | — | yes | — | — | — |
| fast / high_recall mode selector | — | yes | — | — | — |
| BM25 ranking | — | — | yes | — | — |
| Boolean AND | — | — | yes | — | — |
| Phrase (exact word order) | — | — | yes | — | — |
| Fuzzy / typo tolerance | — | — | yes | — | — |
| 18-language stemming · 9-language lemmas | — | — | yes | — | — |
| Neighbours (forward + reverse) | — | — | — | yes | — |
| BFS, path, all simple paths | — | — | — | yes | — |
| Dijkstra / k-shortest weighted paths | — | — | — | yes | — |
| PageRank, betweenness, eigenvector | — | — | — | yes | — |
| Louvain, label-prop, components | — | — | — | yes | — |
| Node2Vec / GraphSAGE embeddings | — | — | — | yes | — |
| Atomic write (row + indexes + edges) | yes | yes | yes | yes | — |
| Idempotency keys | yes | yes | yes | yes | yes |
Combining shapes in one app.
The shapes aren't mutually exclusive. A single bearer token can hit any of them, and the engine keeps row writes + vector + full-text indexes consistent. Three common patterns:
- RAG retrieval before an LLM call. Run vector + full-text in parallel, fuse with Reciprocal Rank Fusion in your app, pass the top-k rows into your prompt. The rows you retrieve are from the same store as your application data, so authorization is consistent.
- Graph filter + SQL projection. Use a multi-hop traversal to find candidate primary keys, then do a SQL WHERE id IN (...) for the full projection. Graph hop costs ~tens of ms; SQL projection is sub-millisecond.
- Ask with show_plan. Let a non-technical user write the question, return the compiled plan, paste the equivalent SQL into your codebase. Ask becomes the prototype; the SQL becomes the production query.
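
The first pattern, fusing vector and full-text results with Reciprocal Rank Fusion in your app, can be sketched as follows. This is a minimal illustration under stated assumptions: each search is represented by a caller-supplied function that returns row ids in ranked order (best first), the names `reciprocal_rank_fusion` and `hybrid_ids` are hypothetical, and the constant k = 60 is the value commonly used for RRF rather than anything this page specifies. No OriginChain request or response format is assumed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Sequence


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60, top_n: int = 10
) -> list[str]:
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)) over lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [row_id for row_id, _ in fused[:top_n]]


def hybrid_ids(
    vector_search: Callable[[], Sequence[str]],
    fts_search: Callable[[], Sequence[str]],
    top_n: int = 10,
) -> list[str]:
    """Run both searches in parallel threads, then fuse their rankings."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(vector_search)
        fts_future = pool.submit(fts_search)
        return reciprocal_rank_fusion(
            [vector_future.result(), fts_future.result()], top_n=top_n
        )


if __name__ == "__main__":
    # Stand-ins for real calls to the vector topk and FTS endpoints.
    print(hybrid_ids(lambda: ["p7", "p2", "p9"], lambda: ["p2", "p4", "p7"]))
```

RRF only looks at rank positions, so it needs no score normalisation between embedding distance and BM25. Rows that appear in both lists rise to the top; rows found by only one mode still survive further down.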