02 · core concepts

Core concepts.

OriginChain is a single managed engine with a Plan tree on top. SQL, vector, full-text, and graph are not separate engines - they are different query modes and different Plan operators over the same store. Understand the engine, the schemas, and the Plan, and the rest of the surface follows.

01 · the engine

A single managed engine.

The engine is fronted by a write-ahead log. Every write is appended to the log, fsynced, then applied. Reads go through a process-wide page cache. There is no row-store / column-store / vector-engine split - every query mode is a different way to read the same row.
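The append, fsync, apply order can be sketched in a few lines. This is a toy model, not the engine's code: the `TinyWal` class, the JSON-lines frame format, and the `recover` helper are all assumptions made for illustration.

```python
import json
import os
import tempfile


class TinyWal:
    """Toy write-ahead log: append -> fsync -> apply, in that order."""

    def __init__(self, path):
        self.path = path
        self.rows = {}                      # in-memory state the log protects
        self._log = open(path, "ab")

    def put(self, key, value):
        frame = json.dumps({"k": key, "v": value}).encode() + b"\n"
        self._log.write(frame)              # 1. append the frame to the log
        self._log.flush()
        os.fsync(self._log.fileno())        # 2. force it to stable storage
        self.rows[key] = value              # 3. only now apply to the store

    def close(self):
        self._log.close()

    @classmethod
    def recover(cls, path):
        """Rebuild state after a crash by replaying every frame in order."""
        wal = cls(path)
        with open(path, "rb") as f:
            for line in f:
                rec = json.loads(line)
                wal.rows[rec["k"]] = rec["v"]
        return wal


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    wal = TinyWal(path)
    wal.put("order-1", {"status": "paid"})
    wal.put("order-1", {"status": "refunded"})
    wal.close()
    recovered = TinyWal.recover(path)       # simulate a restart
    state = recovered.rows
    recovered.close()
    return state
```

Because a frame reaches disk before the store changes, a restart that replays the log lands on the same state the process had before it stopped.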

Each tenant gets a single, region-isolated managed instance. No shared compute, no noisy neighbour. Writes go to one primary; a sync follower replicates in lockstep on Tier 2 and above for RPO=0. The follower bootstraps from a snapshot transfer and then tails.

Fig. - request path: HTTP → plan → engine → follower + backup

02 · schemas

Declared in TOML.

A schema manifest declares the table's namespace and name, primary key, columns, secondary indexes, graph relations to other tables, and derived JSON extractions. The catalog is itself stored as rows - adding a column is a write, not a downtime migration.

# schemas/orders.toml
namespace   = "shop"
table       = "orders"
primary_key = ["id"]

[[columns]]
name = "id"
ty   = "str"      # ULIDs / UUIDs travel as text
required = true

[[columns]]
name = "customer"
ty   = "str"

[[columns]]
name = "amount_cents"
ty   = "i64"      # money in minor units - never f64

[[columns]]
name = "status"
ty   = "str"

[[columns]]
name = "placed_ms"
ty   = "u64"      # epoch milliseconds

[[indexes]]
name    = "by_status"
columns = ["status"]

[[indexes]]
name    = "by_customer_placed"
columns = ["customer", "placed_ms"]

[[relations]]
name          = "by_customer"
from_col      = "customer"
bidirectional = true

[relations.target]
namespace = "shop"
table     = "customers"
pk        = "id"

Six column types only - str, i64, u64, f64, bool, bytes. Vector and full-text indexes are NOT declared here; they live on their own runtime endpoints (see vector, fts) and link back to rows by primary key. Indexes and relations are honoured at write time - no separate "build index" step. See schemas reference for the full grammar.
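A manifest like the one above is easy to lint before it is submitted. The sketch below is a hypothetical client-side check, not the server's validator: `validate_manifest` and its error strings are invented here, and the dict is the shape a TOML parser such as Python's `tomllib` would produce from `schemas/orders.toml`.

```python
ALLOWED_TYPES = {"str", "i64", "u64", "f64", "bool", "bytes"}


def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest passed."""
    problems = []
    columns = {c["name"]: c for c in manifest.get("columns", [])}

    # Every column must use one of the six declared types.
    for name, col in columns.items():
        if col.get("ty") not in ALLOWED_TYPES:
            problems.append(f"column {name!r}: unknown type {col.get('ty')!r}")

    # Primary key, index, and relation columns must all be declared columns.
    for pk_col in manifest.get("primary_key", []):
        if pk_col not in columns:
            problems.append(f"primary key column {pk_col!r} is not declared")
    for index in manifest.get("indexes", []):
        for col in index["columns"]:
            if col not in columns:
                problems.append(f"index {index['name']!r}: unknown column {col!r}")
    for rel in manifest.get("relations", []):
        if rel["from_col"] not in columns:
            problems.append(
                f"relation {rel['name']!r}: unknown column {rel['from_col']!r}"
            )
    return problems


orders = {
    "namespace": "shop",
    "table": "orders",
    "primary_key": ["id"],
    "columns": [
        {"name": "id", "ty": "str", "required": True},
        {"name": "customer", "ty": "str"},
        {"name": "amount_cents", "ty": "i64"},
        {"name": "status", "ty": "str"},
        {"name": "placed_ms", "ty": "u64"},
    ],
    "indexes": [
        {"name": "by_status", "columns": ["status"]},
        {"name": "by_customer_placed", "columns": ["customer", "placed_ms"]},
    ],
    "relations": [
        {
            "name": "by_customer",
            "from_col": "customer",
            "bidirectional": True,
            "target": {"namespace": "shop", "table": "customers", "pk": "id"},
        },
    ],
}
```

`validate_manifest(orders)` returns an empty list; adding a column with `ty = "f32"` produces one problem entry.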

03 · query modes

How data shapes work.

A single row is reachable through several query modes - SQL, secondary index, relation walk, full-text, and vector - and the engine keeps all of them in lockstep on every write. Each mode is a different way to read the same row, not a different store.

Mode	Purpose
Rows	The primary user-facing record. PK is one or more columns (ULIDs / UUIDs travel as str). Read via the typed /rows API or SQL.
Secondary indexes	Speed up equality filters and left-prefix range scans on declared columns. Maintained automatically on every write.
Relations	Graph edges between rows. Forward and reverse traversal are both O(degree) - declare a relation in the manifest and walk it with neighbors / BFS / Dijkstra.
Full-text	BM25 inverted index. Stored on a separate runtime endpoint - index text under (table, field, doc_id), then search by query string. Boolean, BM25, phrase, fuzzy modes.
Vector	Embeddings indexed with HNSW (default), IVF, or IVF-PQ. Stored on a separate runtime endpoint - put one vector per row by primary key, then top-k by similarity. Optional metadata filter.
Plan cache	Compiled Plan tree for a /ask question template. Skips the rule-grammar and LLM compile on cache hit; replays the tree through the executor.
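"Equality filters and left-prefix range scans" is easiest to see on a composite index such as by_customer_placed. The sketch below models one as a sorted list of (key tuple, primary key) pairs; the `CompositeIndex` class is an illustrative assumption and says nothing about the engine's real on-disk layout.

```python
import bisect


class CompositeIndex:
    """Toy secondary index: sorted (key tuple, pk) pairs over declared columns."""

    def __init__(self, columns):
        self.columns = columns
        self.entries = []

    def insert(self, row, pk):
        key = tuple(row[c] for c in self.columns)
        bisect.insort(self.entries, (key, pk))   # kept sorted on every write

    def prefix_range(self, prefix, lo=None, hi=None):
        """Equality on the leading columns, optional [lo, hi) range on the
        column that follows the prefix. Returns matching primary keys."""
        prefix = tuple(prefix)
        n = len(prefix)
        start = prefix if lo is None else prefix + (lo,)
        i = bisect.bisect_left(self.entries, (start,))  # first key >= start
        pks = []
        while i < len(self.entries):
            key, pk = self.entries[i]
            if key[:n] != prefix:
                break                                   # left the prefix
            if hi is not None and key[n] >= hi:
                break                                   # past the range
            pks.append(pk)
            i += 1
        return pks


idx = CompositeIndex(["customer", "placed_ms"])
for pk, customer, placed_ms in [
    ("o1", "alice", 100),
    ("o2", "alice", 200),
    ("o3", "bob", 150),
    ("o4", "alice", 300),
]:
    idx.insert({"customer": customer, "placed_ms": placed_ms}, pk)

all_alice = idx.prefix_range(("alice",))
alice_window = idx.prefix_range(("alice",), lo=150, hi=300)
```

A filter on placed_ms alone cannot use this index, because placed_ms is not a left prefix of (customer, placed_ms).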

04 · the plan tree

Eleven operators, one tree.

Both /sql and /ask compile to the same Plan tree. The tree is JSON-serialisable, cached by question hash, and replayable. Every shipped query shape is one of these operators or a composition.

Scan

Full scan of a table. The fallback when no index applies.

ColumnScan

Projection-aware scan that decodes only the requested fields.

IndexScan

Indexed lookup. Used when the WHERE clause has an equality on an indexed column.

Filter

Predicate evaluation. Pushed under projection where possible by the optimiser.

Project

Column selection - drops fields the user did not ask for before they reach the wire.

Limit

Truncates the stream. Pushed below sort when the sort key admits a top-K shortcut.

Sort

External-merge sort with spill-to-disk. Exposed via /ask; ORDER BY through the SQL translator is on the roadmap.

Aggregate

GROUP BY with COUNT / SUM / AVG / MIN / MAX.

HashJoin

Build-side hash table on the smaller input, probe with the larger. INNER joins. Up to 32 tables per query.

OuterJoin

LEFT, RIGHT, and FULL variants - emits NULL-filled rows for unmatched entries on the preserved side (both sides for FULL). Up to 32-table left-deep chains.

RelationHop

Walks forward or reverse edges. Powers neighbours, BFS, path, and Dijkstra.
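The HashJoin description (build on the smaller input, probe with the larger) corresponds to the textbook algorithm below. It is a generic in-memory sketch, not the engine's operator; the `hash_join` function and the sample rows are made up for illustration.

```python
def hash_join(left, right, left_key, right_key):
    """INNER hash join over two lists of dict rows."""
    # Build on the smaller input so the hash table stays small.
    if len(left) <= len(right):
        build, build_key, probe, probe_key = left, left_key, right, right_key
    else:
        build, build_key, probe, probe_key = right, right_key, left, left_key

    table = {}
    for row in build:
        table.setdefault(row[build_key], []).append(row)

    out = []
    for row in probe:                       # stream the larger input
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})    # unmatched probe rows are dropped
    return out


customers = [
    {"id": "c1", "name": "Ada"},
    {"id": "c2", "name": "Bo"},
]
orders = [
    {"order_id": "o1", "customer": "c1", "amount_cents": 500},
    {"order_id": "o2", "customer": "c2", "amount_cents": 700},
    {"order_id": "o3", "customer": "c1", "amount_cents": 250},
    {"order_id": "o4", "customer": "c9", "amount_cents": 100},  # no match
]

joined = hash_join(orders, customers, "customer", "id")
```

Here customers is the build side because it is the smaller input; order o4 has no matching customer and does not appear in the result.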

-- SELECT c.name, SUM(o.amount_cents) AS total
--   FROM shop.orders o
--   JOIN shop.customers c ON c.id = o.customer
--  WHERE o.status = 'paid'
--  GROUP BY c.name
--  LIMIT 100;

Limit { 100 }
└── Aggregate { group: [c.name], agg: SUM(o.amount_cents) AS total }
    └── Project { c.name, o.amount_cents }
        └── HashJoin { o.customer = c.id }
            ├── IndexScan { shop.orders, status = "paid" }
            └── Scan { shop.customers }
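A JSON-serialisable tree cached by question hash could look roughly like this. The field names in `plan`, the whitespace and case normalisation, the choice of SHA-256, and the `PlanCache` class are all assumptions; the real wire format and hashing scheme are not specified here.

```python
import hashlib
import json

# The plan above, written as nested dicts (field names are illustrative).
plan = {
    "op": "Limit", "n": 100,
    "input": {
        "op": "Aggregate", "group": ["c.name"], "agg": "SUM(o.amount_cents) AS total",
        "input": {
            "op": "Project", "columns": ["c.name", "o.amount_cents"],
            "input": {
                "op": "HashJoin", "on": "o.customer = c.id",
                "left": {"op": "IndexScan", "table": "shop.orders",
                         "eq": {"status": "paid"}},
                "right": {"op": "Scan", "table": "shop.customers"},
            },
        },
    },
}


def cache_key(question):
    """Hash a question after collapsing whitespace and case (an assumption)."""
    normalised = " ".join(question.lower().split())
    return hashlib.sha256(normalised.encode()).hexdigest()


class PlanCache:
    def __init__(self):
        self._plans = {}                     # key -> serialised plan

    def get_or_compile(self, question, compile_fn):
        """Return (plan, cache_hit). compile_fn runs only on a miss."""
        key = cache_key(question)
        hit = key in self._plans
        if not hit:
            self._plans[key] = json.dumps(compile_fn(question))
        return json.loads(self._plans[key]), hit


compile_calls = []

def fake_compile(question):
    compile_calls.append(question)           # stands in for the expensive compile
    return plan

cache = PlanCache()
first, first_hit = cache.get_or_compile("total paid per customer", fake_compile)
second, second_hit = cache.get_or_compile("Total  paid per customer", fake_compile)
```

The second call differs only in case and spacing, so it hits the cache and replays the stored tree without compiling again.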

05 · replication model

Active-passive, sync.

One primary, plus one or more sync followers depending on tier. The log replicates before the primary returns 200. A follower joining a running cluster bootstraps via a snapshot transfer - a chunked transfer of the full store - then tails the live stream from the snapshot's LSN.
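The bootstrap-then-tail sequence can be modelled with two small classes. Everything here is a simplified assumption: `ToyPrimary`, `ToyFollower`, the chunk size, and the in-memory log stand in for the real snapshot transfer and replication stream.

```python
class ToyPrimary:
    def __init__(self):
        self.store = {}
        self.log = []                        # (lsn, key, value) frames
        self.lsn = 0

    def write(self, key, value):
        self.lsn += 1
        self.log.append((self.lsn, key, value))
        self.store[key] = value

    def snapshot(self):
        """Full copy of the store, tagged with the LSN it reflects."""
        return self.lsn, dict(self.store)

    def frames_after(self, lsn):
        return [f for f in self.log if f[0] > lsn]


class ToyFollower:
    def __init__(self):
        self.store = {}
        self.lsn = 0

    def bootstrap(self, primary, chunk_size=2):
        snap_lsn, snap = primary.snapshot()
        items = list(snap.items())
        for i in range(0, len(items), chunk_size):   # chunked transfer
            self.store.update(items[i:i + chunk_size])
        self.lsn = snap_lsn                  # tailing resumes from here

    def tail(self, primary):
        for lsn, key, value in primary.frames_after(self.lsn):
            self.store[key] = value
            self.lsn = lsn


primary = ToyPrimary()
for n in range(3):
    primary.write(f"o{n}", "new")

follower = ToyFollower()
follower.bootstrap(primary)                  # joins a running primary
primary.write("o1", "paid")                  # writes continue after the snapshot
primary.write("o3", "new")
follower.tail(primary)                       # catch up from the snapshot's LSN
```

Tagging the snapshot with its LSN is what lets the follower pick up the stream without skipping or re-applying frames.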

Mode	Tiers	RPO	RTO	Notes
Primary only	Tier 1	~0.5s (commit fsync)	~5-10 min (archive restore)	Single AZ, no follower. Restore replays the continuous backup archive.
Sync follower	Tier 2, Tier 3, Enterprise	0	~25s (drilled)	Multi-AZ failover. Verified end-to-end with snapshot bootstrap. Tier 2 has 1 follower; Tier 3 has 2.

On Tier 2 and above, active-passive sync replication is the production path. A commit is acknowledged only after the follower has the frame on disk, which gives RPO=0 and an RTO of about 25 s. See ops → failover for the promotion procedure.
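The ack rule can be sketched as follows. This is an assumption-heavy toy: `SyncFollower`, `SyncPrimary`, the 503 returned when the follower is unreachable, and the rollback of the local frame are all invented to show the ordering, not taken from the product.

```python
class SyncFollower:
    def __init__(self):
        self.durable_frames = []
        self.online = True

    def persist(self, frame):
        """Pretend to write the frame to disk; True means it is durable."""
        if not self.online:
            return False
        self.durable_frames.append(frame)
        return True


class SyncPrimary:
    def __init__(self, follower):
        self.follower = follower
        self.log = []
        self.store = {}

    def commit(self, key, value):
        frame = (len(self.log) + 1, key, value)
        self.log.append(frame)                   # local log first
        if not self.follower.persist(frame):     # then wait for the follower
            self.log.pop()                       # assumption: undo, do not ack
            return 503
        self.store[key] = value                  # apply and acknowledge
        return 200


follower = SyncFollower()
primary = SyncPrimary(follower)
ok_status = primary.commit("o1", "paid")

follower.online = False
failed_status = primary.commit("o2", "paid")
```

Every write the client saw acknowledged with 200 is already on the follower's disk, which is why promoting the follower loses nothing.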

06 · versioning

Single-row optimistic CAS.

Every row carries an internal _oc_row_version field. The API exposes put_row_cas, get_row_versioned, and delete_row_cas for optimistic concurrency. A CAS that loses the race fails the entire batch with a deterministic error - no partial application. Idempotency keys make retries safe: the same key with the same body returns the original response; the same key with a different body returns 409.
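The two rules above (all-or-nothing CAS batches, and idempotent retries) can be modelled in memory. `ToyTable` and its method names are deliberately not the real API: the signatures of put_row_cas and friends are not shown here, and treating version 0 as "row absent" is an assumption.

```python
class VersionConflict(Exception):
    pass


class ToyTable:
    def __init__(self):
        self.rows = {}            # pk -> (version, row)
        self.seen = {}            # idempotency key -> (body, response)

    def get_versioned(self, pk):
        return self.rows.get(pk, (0, None))

    def put_batch_cas(self, writes):
        """writes: list of (pk, expected_version, row). All-or-nothing."""
        # Check every expectation before touching anything.
        for pk, expected, _row in writes:
            current, _ = self.get_versioned(pk)
            if current != expected:
                raise VersionConflict(f"{pk}: expected v{expected}, found v{current}")
        for pk, expected, row in writes:
            self.rows[pk] = (expected + 1, row)

    def handle(self, idem_key, body):
        """Apply a batch once per idempotency key."""
        if idem_key in self.seen:
            prev_body, prev_response = self.seen[idem_key]
            if prev_body == body:
                return prev_response          # safe retry: replay the response
            return {"status": 409}            # same key, different body
        self.put_batch_cas(body)
        response = {"status": 200, "written": len(body)}
        self.seen[idem_key] = (body, response)
        return response


table = ToyTable()
table.put_batch_cas([("o1", 0, {"status": "new"})])

try:
    # o1 is at v1 (fine) but o2 does not exist at v5, so nothing is applied.
    table.put_batch_cas([("o1", 1, {"status": "paid"}), ("o2", 5, {"status": "new"})])
    batch_failed = False
except VersionConflict:
    batch_failed = True

body = [["o2", 0, {"status": "new"}]]
first = table.handle("key-1", body)
retry = table.handle("key-1", body)
clash = table.handle("key-1", [["o2", 0, {"status": "paid"}]])
```

Checking every expected version before the first write is what guarantees no partial application when one row in the batch loses the race.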

07 · backups & pitr

Continuous backup archive.

Two streams flow to the archive in parallel: durable checkpoints shipped on roll, and a continuous backup stream that flushes the open log every few hundred milliseconds. Restore-to-timestamp resolves to sub-second precision when the paid precise-PITR add-on is enabled.

# restore an instance to a wall-clock timestamp
oc-pitr restore \
  --tenant acme \
  --target "2026-04-29T18:42:00Z" \
  --into   acme-restore-001

Continuous backups (segment-boundary granularity, ~5–10 min restore window) are included on every tier. Sub-second precise PITR (~0.5–1.5 s data-loss window) is a paid add-on - see pricing.
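One plausible way to resolve a target timestamp from those two streams is to start from the latest checkpoint at or before the target and replay log frames up to it. The sketch below assumes exactly that; `restore_to`, the millisecond timestamps, and the data shapes are hypothetical and do not describe what oc-pitr does internally.

```python
import bisect


def restore_to(target_ms, checkpoints, frames):
    """checkpoints: sorted list of (ts_ms, state dict).
    frames: sorted list of (ts_ms, key, value) from the continuous stream."""
    times = [ts for ts, _ in checkpoints]
    i = bisect.bisect_right(times, target_ms) - 1   # latest checkpoint <= target
    if i < 0:
        raise ValueError("target predates the oldest checkpoint")
    base_ts, base_state = checkpoints[i]

    state = dict(base_state)
    for ts, key, value in frames:
        if base_ts < ts <= target_ms:               # replay only the gap
            state[key] = value
    return state


checkpoints = [
    (1000, {"o1": "new"}),
    (2000, {"o1": "paid", "o2": "new"}),
]
frames = [
    (1200, "o1", "paid"),
    (1800, "o2", "new"),
    (2300, "o2", "paid"),
    (2700, "o1", "refunded"),
]

at_2500 = restore_to(2500, checkpoints, frames)
at_1500 = restore_to(1500, checkpoints, frames)
```

Without the continuous stream, only the checkpoint states are reachable, which matches the coarser segment-boundary granularity of the base offering.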