OriginChain docs

2. Knowledge-base article (row + vector + FTS)

what this does

Save one help-center article as three shapes: the structured row in kb.articles, an embedding of the title and body for semantic similarity, and a BM25 keyword index on the body. There are no graph edges, so this is the simplest recipe that still covers a real RAG or search backend.

when to use it
  • Help-center search backends. Keyword search handles "exact phrase" queries; vector search handles "what does this user mean".
  • RAG retrieval - rank candidates by vector similarity, optionally re-rank with FTS, then fetch the body from the row store.
  • Any corpus where you want both lexical and semantic recall over the same documents.
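The RAG flow in the second bullet can be sketched as a small re-ranking step. This is a minimal illustration, assuming the vector and FTS queries have already returned (id, score) pairs where higher is better. The function name `rerank`, the input shapes, and the `fts_weight` blend are hypothetical and not part of the OriginChain API.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (article id, score); higher is better


def _normalize(hits: List[Hit]) -> Dict[str, float]:
    """Scale scores so the best hit is 1.0, making cosine and BM25 comparable."""
    if not hits:
        return {}
    top = max(score for _, score in hits)
    if top <= 0:
        return {doc_id: 0.0 for doc_id, _ in hits}
    return {doc_id: score / top for doc_id, score in hits}


def rerank(vector_hits: List[Hit], fts_hits: List[Hit], fts_weight: float = 0.3) -> List[str]:
    """Order vector candidates, boosting those that also match by keyword.

    Only ids present in vector_hits are returned: vector similarity picks
    the candidates, FTS only reorders them.
    """
    vec = _normalize(vector_hits)
    fts = _normalize(fts_hits)
    combined = {
        doc_id: score + fts_weight * fts.get(doc_id, 0.0)
        for doc_id, score in vec.items()
    }
    return sorted(combined, key=lambda doc_id: combined[doc_id], reverse=True)


if __name__ == "__main__":
    vector_hits = [("a", 0.9), ("b", 0.8), ("c", 0.5)]
    fts_hits = [("b", 12.0), ("d", 10.0)]
    print(rerank(vector_hits, fts_hits))  # -> ['b', 'a', 'c']
```

The ids that come out of `rerank` are what you would then look up in the row store to fetch each body.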

the schema

Plain row schema - no [[relations]] because there's no graph edge to write.

# kb/articles.toml
namespace   = "kb"
table       = "articles"
primary_key = ["id"]

[[columns]]
name = "id"
ty   = "str"
required = true

[[columns]]
name = "title"
ty   = "str"
required = true

[[columns]]
name = "body"
ty   = "str"
required = true

[[columns]]
name = "url"
ty   = "str"

[[columns]]
name = "created_ms"
ty   = "u64"

[[indexes]]
name    = "by_created"
columns = ["created_ms"]
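A client-side check against this schema can catch a bad row before it goes over the wire. The sketch below mirrors the column list by hand and validates a row dict. It assumes that a column without `required = true` is optional and that `u64` means an integer in the range 0 to 2^64 - 1. The helper `validate_row` is hypothetical, not an SDK function.

```python
from typing import Any, Dict, List

# Column definitions copied by hand from kb/articles.toml.
COLUMNS = [
    {"name": "id", "ty": "str", "required": True},
    {"name": "title", "ty": "str", "required": True},
    {"name": "body", "ty": "str", "required": True},
    {"name": "url", "ty": "str"},
    {"name": "created_ms", "ty": "u64"},
]


def _is_u64(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 2**64


# Assumed mapping from schema type names to Python-side checks.
TYPE_CHECKS = {"str": lambda v: isinstance(v, str), "u64": _is_u64}


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the row looks valid."""
    errors = []
    for col in COLUMNS:
        name, ty = col["name"], col["ty"]
        if name not in row:
            if col.get("required", False):
                errors.append(f"missing required column: {name}")
        elif not TYPE_CHECKS[ty](row[name]):
            errors.append(f"column {name} is not a valid {ty}")
    return errors
```

For example, `validate_row({"id": "kb-2026-001", "title": "T"})` reports the missing `body` column, while a complete row returns an empty list.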

call 1 of 3 - the row
POST /v1/tenants/:t/rows/kb.articles
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/kb.articles" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":         "kb-2026-001",
    "title":      "How atomic multi-shape writes work",
    "body":       "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe.",
    "url":        "/docs/concepts/atomic-multi-shape",
    "created_ms": 1747900000000
  }'

call 2 of 3 - the embedding (title + body concatenated)

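Embed both fields together. A title-only embedding misses everything the body says, and that's where most of the meaningful tokens live. In the request below, the /* ... 768 floats ... */ comment stands in for the rest of the array; replace it with the real values before sending, because JSON has no comment syntax.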

POST /v1/tenants/:t/vector/kb.articles/put
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/vector/kb.articles/put" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":        "kb-2026-001",
    "embedding": [0.0211, -0.0612, 0.0341, /* ... 768 floats ... */],
    "dim":       768,
    "metric":    "cosine"
  }'
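Building the request body in code keeps the concatenation and the dimension check in one place. A minimal sketch, assuming you bring your own embedding model: the `embed` callable and the `build_vector_payload` helper are hypothetical, and the title/body separator is an arbitrary choice. Only the payload fields come from the curl call above.

```python
import json
from typing import Callable, Dict, List


def build_vector_payload(
    article: Dict[str, str],
    embed: Callable[[str], List[float]],
    dim: int = 768,
) -> dict:
    """Embed title + body as one string and wrap it in the put payload."""
    # Title first, then the body, so one vector covers both fields.
    text = f"{article['title']}\n\n{article['body']}"
    embedding = embed(text)
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} floats, got {len(embedding)}")
    return {
        "id": article["id"],
        "embedding": embedding,
        "dim": dim,
        "metric": "cosine",
    }


if __name__ == "__main__":
    # Stand-in embedder for illustration only; swap in a real model.
    fake_embed = lambda text: [0.0] * 768
    payload = build_vector_payload(
        {"id": "kb-2026-001", "title": "How atomic multi-shape writes work", "body": "..."},
        fake_embed,
    )
    body = json.dumps(payload)  # valid JSON, ready to POST
```

Failing fast on a length mismatch is a design choice here: it surfaces a model or configuration mix-up on the client instead of relying on the server to reject the write.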

call 3 of 3 - the keyword index (body only)

Index the body for BM25 keyword search. Most help-center queries are keyword-shaped ("install on Windows"), so this is the workhorse.

POST /v1/tenants/:t/fts/kb.articles/index
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/fts/kb.articles/index" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "field":  "body",
    "doc_id": "kb-2026-001",
    "text":   "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe."
  }'

about atomicity

The three calls are separate; there is no single "write everything" endpoint. Each call is atomic on its own. The SDKs attach an Idempotency-Key to every mutating call automatically, so if the FTS call fails after the row and vector writes succeeded, retry only the FTS call. Repeating the row write would also be harmless, because the row would not be duplicated.
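If you are not using an SDK, the retry behaviour described here can be sketched by hand. The example below plans the three calls and retries only the one that failed, reusing one key across that call's attempts. It assumes the raw HTTP API accepts an `Idempotency-Key` header the way the SDKs send it. `plan_calls`, `write_article`, and the injected `send` transport are hypothetical names.

```python
import uuid
from typing import Callable, Dict, List, Tuple

Call = Tuple[str, dict]                             # (path, JSON payload)
Send = Callable[[str, dict, Dict[str, str]], int]   # returns the HTTP status


def plan_calls(tenant: str, article: dict, embedding: List[float]) -> List[Call]:
    """Build the row, vector, and FTS calls for one article, in write order."""
    base = f"/v1/tenants/{tenant}"
    return [
        (f"{base}/rows/kb.articles", article),
        (f"{base}/vector/kb.articles/put",
         {"id": article["id"], "embedding": embedding,
          "dim": len(embedding), "metric": "cosine"}),
        (f"{base}/fts/kb.articles/index",
         {"field": "body", "doc_id": article["id"], "text": article["body"]}),
    ]


def write_article(calls: List[Call], send: Send, max_attempts: int = 3) -> None:
    """Send each call in turn; retry a failing call without redoing earlier ones."""
    for path, payload in calls:
        # One key per logical call, reused across its retries, so the server
        # can recognise a repeat (assumed header, see lead-in).
        headers = {"Idempotency-Key": str(uuid.uuid4())}
        for attempt in range(1, max_attempts + 1):
            status = send(path, payload, headers)
            if 200 <= status < 300:
                break  # this shape is written; move on to the next one
            # 4xx means the request itself is wrong, so retrying will not help.
            if status < 500 or attempt == max_attempts:
                raise RuntimeError(f"{path} failed with status {status}")
```

The `send` argument is where a real HTTP client would go; a production version would likely also add a backoff delay between attempts.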

common mistakes
  • Embedding only the title. A title carries only a small share of an article's meaning. Concatenate title + body before embedding so that semantic similarity can match on body content.
  • Indexing the title in FTS but not the body. The opposite mistake. The body is where the searchable keywords live.
  • Forgetting to re-index on update. If you edit an article, you have to re-put the row, re-put the vector, and re-index the FTS field. None of the three rides along with the others.
  • Embedding huge bodies as one vector. Past ~1k tokens, semantic similarity gets muddy. For long articles, chunk the body, write one row per chunk with a parent article_id, and embed each chunk separately.
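The chunking advice in the last bullet can be sketched as follows. This is a rough illustration that splits on whitespace and uses a word count as a stand-in for a token count. A real pipeline would use the embedding model's tokenizer. The `#index` chunk id format and the `chunk_article` helper are hypothetical, and the `article_id` column is not in the kb/articles.toml schema shown earlier, so it would have to be added.

```python
from typing import Dict, List


def chunk_article(article: Dict[str, str], max_words: int = 200) -> List[Dict[str, str]]:
    """Split the body into word-bounded chunks, returning one row dict per chunk."""
    words = article["body"].split()
    rows = []
    for start in range(0, len(words), max_words):
        index = start // max_words
        rows.append({
            "id": f"{article['id']}#{index}",   # hypothetical chunk id format
            "article_id": article["id"],        # pointer back to the parent article
            "title": article["title"],
            "body": " ".join(words[start:start + max_words]),
        })
    return rows
```

Each returned row would then go through the same three calls as a whole article: write the row, embed and put the vector for that chunk, and index the chunk body in FTS.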