OriginChain
features

Every capability, one page.

Five query shapes on a single managed database. The tables below list what the engine actually does today - one row per capability, one sentence on what it gives you.

01 · sql

SQL.

Relational queries against the same row store the vector, full-text and graph shapes share - one consistency model, one bearer token, one writer.

Feature What it does
SELECT / INSERT / UPDATE / DELETE The standard four. UPDATE and DELETE address rows by primary key or by a WHERE that resolves to a key range.
GROUP BY + HAVING + aggregates SUM, AVG, MIN, MAX, COUNT, COUNT DISTINCT. HAVING filters post-aggregation. Group keys can be expressions, not just bare columns.
INNER / LEFT / RIGHT / FULL OUTER JOIN All four canonical join types with arbitrary boolean ON clauses. The planner picks hash, merge, or nested-loop based on key cardinality.
JOIN USING / NATURAL JOIN Shorthand joins when both sides share a column name. Useful when junction tables follow a consistent key convention.
CROSS JOIN Cartesian product for generated-data scenarios and small lookup-cross-product reports.
Joins up to 5 tables Single-statement joins across up to five tables. Beyond five, break the query into intermediate views or pipeline the result through /ask.
Correlated subqueries EXISTS, scalar subqueries in SELECT, and IN (subquery) in WHERE. The outer-row binding is resolved without materializing the inner side.
Window functions ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM OVER, AVG OVER, with PARTITION BY, ORDER BY, and ROWS BETWEEN frames.
Recursive CTEs WITH RECURSIVE for hierarchical walks (org charts, comment trees, dependency closures) that don't need the full Cypher surface.
JSON path operators -> and ->> read a field from a JSON column. Indexable when the path is fixed; otherwise scanned.
EXPLAIN ANALYZE Full plan tree with row-count estimates, actual rows, per-node time, and the index choice the planner made. Use ?explain=true on any /query call.
MVCC transactions (snapshot isolation) Begin a transaction with /tx/begin, stage multi-statement writes, then commit or rollback atomically. Snapshot-isolation today; serializable is on the roadmap.
Atomic cross-shape writes A single INSERT can stage a row, its embedding, its full-text shards, and its outbound edges in one transaction. No two-phase glue in the app.
02 · cypher / graph query

Cypher / graph query.

A read-write subset of the Cypher grammar against edges and labelled nodes stored on the same substrate as rows. No second cluster to provision.

Feature What it does
MATCH + WHERE Pattern-match nodes and edges, then filter on properties. The standard entry point for every graph query.
CREATE / SET / DELETE / MERGE Add nodes and edges, update properties, remove them, or upsert with MERGE. Writes commit atomically with the row store.
OPTIONAL MATCH Left-outer-join semantics for graph patterns. The match either binds or returns NULL for the optional part.
WITH Pipeline operator. Project intermediate results, aggregate, then pass into the next match - the way you chain stages in a Cypher query.
Single-hop, multi-hop, undirected Patterns like (a)-[:R]->(b), longer chains, and undirected (a)-[:R]-(b) for symmetric relationships.
Backward hop <-[:R]- hops both as a standalone match and inside chains. The inverse edge index makes 'who points at me' as cheap as forward expansion.
Variable-length paths (a)-[:R*1..5]->(b) for paths between one and five hops. Bounded so the planner can pick BFS depth-limit instead of unbounded expansion.
length(p) / nodes(p) / relationships(p) Built-in functions for path inspection. length(p) returns hop count; nodes(p) and relationships(p) return the path's vertex and edge lists.
UNWIND Turn a list into rows. Works on literals, the output of nodes(p) / relationships(p), and mixed lists from prior WITH clauses.
Parameters ($param) Bind query inputs by name in WHERE, RETURN, and UNWIND. Lets you cache the compiled plan across calls with different bound values.
List comprehensions [x IN coll WHERE pred | expr] over literal lists at compile time and over column-path lists at runtime - the standard Cypher fold.
FOREACH (nested up to 3 levels) Iterate a list and run a write block per element. Nested FOREACH unrolls to a flat write sequence so each element commits atomically.
CALL { subquery } Inline a read-only subquery with CALL { ... } and consume its results via YIELD. Compose larger queries without breaking the single-statement boundary.
shortestPath Shortest path between two nodes by hop count. For weighted shortest path, see the Graph algorithms section.
ORDER BY / SKIP / LIMIT / DISTINCT Sort and paginate Cypher RETURN clauses. DISTINCT deduplicates result rows the same way it does in SQL.
03 · graph algorithms

Graph algorithms.

First-class algorithms exposed at the engine level - call them directly through the API without lifting the data into a separate analytics layer.

Feature What it does
Neighbors / BFS / DFS Forward expansion from a starting node. Pluggable depth cap; results stream as the frontier expands.
Reverse traversal Same expansion against the inverse edge index, so "who points at me" queries are O(degree), not full scan.
Shortest path (unweighted) BFS-based shortest path between two nodes. Returns the path itself, not just the length.
Weighted shortest path (Dijkstra) Dijkstra on a numeric edge property. Useful for cost, latency, or distance routing.
K-shortest paths (Yen's algorithm) Top-K diverse shortest paths between two nodes. Use when one route isn't enough - rerouting, alternative-recommendation, lineage trees.
All simple paths Enumerate every loop-free path between two nodes up to a hop bound. Bounded so the call terminates on dense graphs.
Connected components Partition the graph into reachability classes. Returns a component id per node.
Triangle counting Count triangles globally or per node. The local count is a clustering-coefficient input.
Degree centrality Rank nodes by their in-degree, out-degree, or total degree.
Betweenness centrality Brandes' algorithm. Identifies the bridge nodes whose removal would disconnect or fragment the graph.
Eigenvector centrality Iterative eigenvector solve. Captures recursive importance - the well-connected-to-the-well-connected signal.
PageRank Damped random-walk centrality with tunable damping factor and convergence tolerance. The standard ranking primitive.
Label propagation Community detection by majority-label propagation. Converges quickly on most real graphs.
Louvain community detection Modularity-optimisation community detection. Finds tighter communities than label propagation on graphs with clearer hierarchical structure.
Random walk + biased Node2Vec walks Uniform random walks and Node2Vec p/q-biased walks. The walk stream feeds either downstream analytics or the embedding trainer.
Node2Vec embeddings Skip-gram with negative sampling over Node2Vec walks. Persisted on the engine; topk_node2vec serves 'find similar nodes' without round-tripping through Python.
04 · vector search

Vector search.

Dense and sparse vectors stored next to the row they describe. The vector index is part of the same instance - no separate service to sync.

Feature What it does
Dense vectors + top-K Submit an embedding and receive the top-K nearest documents by the configured distance metric.
HNSW index Hierarchical Navigable Small World graph. Default index for any table over a few thousand vectors. Tunable ef_search per query.
Brute-force fallback For small tables or when you want exact recall, top-K falls back to a linear scan. No code change - the planner picks it.
Filtered top-K (metadata) Restrict the candidate set by an equality predicate on a metadata field before ranking. Stays fast for selective filters.
Pre-filter HNSW walker Filter inside the HNSW walk instead of post-filtering the top-K. Selective filters never make you over-fetch then discard hits.
Sparse vectors BM25-style sparse vectors for lexical retrieval. Same put/topk endpoints; the metric is dot-product.
Sparse IVF Inverted-file index over sparse vectors. Scales sparse retrieval past the brute-force cliff without giving up the BM25-style scoring shape.
Hybrid sparse + dense (RRF) Reciprocal rank fusion of a sparse and a dense query. One call, one merged ranked list - no client-side stitching.
IVF (inverted file index) Voronoi-cell partitioning with k-means-trained centroids. Search visits the top-nprobe cells - the standard scale path past the HNSW comfort zone.
IVF-PQ (IVF + product quantization) IVF partitioning layered on a PQ-compressed posting list. Roughly 32x less memory per resident vector at the same recall - the path to ten-million-plus tables.
Scalar quantization (int8) Per-dimension int8 quantization. About 4x memory reduction with under 2% recall loss on typical embeddings.
Binary quantization (1-bit) 1-bit-per-dimension quantization with asymmetric scoring. About 32x memory reduction; recall stays at or above 0.80 on structured corpora.
Distance metrics: L2, cosine, dot, Manhattan Four metrics, picked per table at the first put. The metric is locked once chosen so every query is comparable.
SIMD-accelerated kernels (f32) AVX2 / NEON float32 kernels for distance computation. Owned in-tree - no math library wrapper to upgrade.
DELETE vector Removes a vector by id. HNSW tombstones the node; IVF / IVF-PQ evict it from its posting list. Put-after-delete cleanly resurrects the slot.
Per-tenant isolation Each tenant runs on its own instance. Your vectors share no memory, no cache, and no scheduler with anyone else's.
05 · full-text search

Full-text search.

Ranked text retrieval against the same row store - no separate search cluster, no eventual-consistency window between writes and search hits.

Feature What it does
Unicode tokenization (UAX #29) Unicode-correct word segmentation. Accented Latin, Arabic, and emoji all break on the expected boundaries.
ICU tokenizer (CJK / Thai / Khmer) ICU segmenter for scripts that don't put spaces between words. Picked per (table, field) when the corpus is CJK or Southeast Asian.
Boolean queries (AND / OR / NOT) Combine clauses with the standard boolean operators. Returns the full matching set, unranked, when ranking is not needed.
BM25 scoring Standard BM25 with the canonical defaults (k1=1.2, b=0.75). Returns top-K ranked hits with the raw score per document.
Phrase queries Match the exact ordered sequence of terms. Uses position lists, so single-word matches are filtered out.
Fuzzy / typo-tolerant matching Levenshtein edit-distance matching with the term~N syntax. Edit-distance cap of 3 keeps the candidate set bounded on large indexes.
Synonyms Customer-managed synonym maps, bidirectional expansion, configurable cap per term. Synonyms apply at query time, so changes take effect without a reindex.
Stemming (9 languages) English Snowball plus stemmers for Spanish, French, German, Italian, Dutch, Swedish, Russian, and Portuguese. Reduces inflections to a shared root.
Multi-language analyzers LocaleAnalyzer with built-in stopword lists for 9 languages. One field per locale or one mixed-locale field - your call.
Per-locale stopwords config Override the built-in stopword list per (table, field). Useful when a domain term ('the', 'a') is meaningful in your corpus.
Highlighting / snippets Returns matched-snippet text per hit with configurable open/close tags. Overlapping matches are merged for clean display.
Faceted search / aggregations Group counts alongside the ranked hits, capped at 1000 buckets per facet. Supports the standard 'narrow by category' UI without a second query.
JSON document indexing Index a JSON column field-by-field. Each addressable path becomes its own searchable field; the dotted path is the search field name.
Multi-field weighted merge Search several fields in one call with per-field weights. Combine title and body, weighting title higher, in a single ranked list.

Everything above, one bearer token.

Every feature on this page runs against the same managed instance - one endpoint, one writer, one consistency model. No glue between shapes.