2. Knowledge-base article (row + vector + FTS)
Save one help-center article as three shapes: the structured row in kb.articles, an embedding of the title and body for semantic similarity, and a BM25 keyword index on the body. There are no graph edges, which makes this the simplest recipe that still covers a real RAG or search backend.
- Help-center search backends. Keyword search handles "exact phrase" queries; vector search handles "what does this user mean".
- RAG retrieval - rank candidates by vector similarity, optionally re-rank with FTS, then fetch the body from the row store.
- Any corpus where you want both lexical and semantic recall over the same documents.
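The RAG retrieval flow in the second bullet can be sketched without any client library. The snippet below is a minimal sketch, not the product's API: it assumes you already have an ordered list of candidate IDs from a vector query and a mapping of BM25 scores from an FTS query, re-ranks the candidates by keyword score, and then reads the bodies from a stand-in row store. The function names, the extra sample IDs, and the scores are hypothetical.

```python
def rerank_with_fts(vector_hits, fts_scores, top_k=3):
    """Re-rank vector candidates by their BM25 score.

    vector_hits: doc IDs ordered by vector similarity (best first).
    fts_scores:  doc ID -> BM25 score; IDs with no keyword hit count as 0.0.
    sorted() is stable, so candidates with equal keyword scores keep
    their vector order.
    """
    ranked = sorted(
        vector_hits,
        key=lambda doc_id: fts_scores.get(doc_id, 0.0),
        reverse=True,
    )
    return ranked[:top_k]


def fetch_bodies(doc_ids, row_store):
    """Look up the article body for each ID, skipping IDs with no row."""
    return [row_store[doc_id]["body"] for doc_id in doc_ids if doc_id in row_store]


# Hypothetical sample data standing in for real query results.
vector_hits = ["kb-2026-001", "kb-2026-007", "kb-2026-003"]
fts_scores = {"kb-2026-003": 4.2, "kb-2026-001": 1.1}
row_store = {
    "kb-2026-001": {"body": "How atomic multi-shape writes work"},
    "kb-2026-003": {"body": "Install on Windows"},
    "kb-2026-007": {"body": "Reset your password"},
}

top_ids = rerank_with_fts(vector_hits, fts_scores, top_k=2)
bodies = fetch_bodies(top_ids, row_store)
print(top_ids)
```

Using the vector list as the candidate set and the keyword score only as the sort key keeps semantic recall while letting exact-term matches rise to the top.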
Plain row schema - no [[relations]] because there's no graph edge to write.
# kb/articles.toml
namespace = "kb"
table = "articles"
primary_key = ["id"]
[[columns]]
name = "id"
ty = "str"
required = true
[[columns]]
name = "title"
ty = "str"
required = true
[[columns]]
name = "body"
ty = "str"
required = true
[[columns]]
name = "url"
ty = "str"
[[columns]]
name = "created_ms"
ty = "u64"
[[indexes]]
name = "by_created"
columns = ["created_ms"]
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/kb.articles" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "kb-2026-001",
"title": "How atomic multi-shape writes work",
"body": "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe.",
"url": "/docs/concepts/atomic-multi-shape",
"created_ms": 1747900000000
}'
db.rows.put("kb.articles", {
"id": "kb-2026-001",
"title": "How atomic multi-shape writes work",
"body": "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe.",
"url": "/docs/concepts/atomic-multi-shape",
"created_ms": 1747900000000,
})
// The TypeScript SDK does not wrap row writes yet
// (shipping in the next release). Use `fetch` for now.
await fetch(`${BASE_URL}/v1/tenants/${TENANT}/rows/kb.articles`, {
method: "POST",
headers: {
"Authorization": `Bearer ${OC_TOKEN}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
id: "kb-2026-001",
title: "How atomic multi-shape writes work",
body: "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe.",
url: "/docs/concepts/atomic-multi-shape",
created_ms: 1747900000000,
}),
});
// The Go SDK does not wrap row writes yet
// (shipping in the next release). Use net/http for now.
// Assumes this runs inside a function that returns error.
body, err := json.Marshal(map[string]any{
    "id": "kb-2026-001",
    "title": "How atomic multi-shape writes work",
    "body": "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe.",
    "url": "/docs/concepts/atomic-multi-shape",
    "created_ms": uint64(1747900000000),
})
if err != nil {
    return err
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost,
    BASE_URL+"/v1/tenants/"+TENANT+"/rows/kb.articles",
    bytes.NewReader(body))
if err != nil {
    return err
}
req.Header.Set("Authorization", "Bearer "+OC_TOKEN)
req.Header.Set("Content-Type", "application/json")
// Check the error before touching resp: it is nil when the request fails.
resp, err := http.DefaultClient.Do(req)
if err != nil {
    return err
}
defer resp.Body.Close()
Embed both fields together. A title-only embedding misses everything the body says, and that's where most of the meaningful tokens live.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/vector/kb.articles/put" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "kb-2026-001",
"embedding": [0.0211, -0.0612, 0.0341, /* ... 768 floats ... */],
"dim": 768,
"metric": "cosine"
}'
# Embed the title and body together so semantic search hits both.
text = f"{title}\n\n{body}"
embedding_768d = embed(text) # any embedding model
db.vector.put(
"kb.articles",
"kb-2026-001",
embedding_768d,
)
// Embed title + body together. embedding768d is your number[] of length 768.
await db.vectorPut("kb.articles", {
id: "kb-2026-001",
embedding: embedding768d,
dim: 768,
metric: "cosine",
});
// Embed title + body together. embedding768d is your []float32 of length 768.
err := db.VectorPut(ctx, "kb.articles", originchain.VectorPutRequest{
ID: "kb-2026-001",
Embedding: embedding768d,
Dim: 768,
Metric: "cosine",
})
Index the body for BM25 keyword search. Most help-center queries are keyword-shaped ("install on Windows"), so this is the workhorse.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/fts/kb.articles/index" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"field": "body",
"doc_id": "kb-2026-001",
"text": "Each shape (row, vector, FTS, graph) has its own endpoint. Every call is atomic individually. Idempotency keys make retries safe."
}'
db.fts.index(
"kb.articles",
"body",
doc_id="kb-2026-001",
text=body,
)
await db.ftsIndex("kb.articles", {
field: "body",
docId: "kb-2026-001",
text: body,
});
err := db.FTSIndex(ctx, "kb.articles", originchain.FTSIndexRequest{
Field: "body",
DocID: "kb-2026-001",
Text: body,
})
The three calls are separate: there is no single "write everything" endpoint, and each call is atomic only by itself. The SDKs auto-attach an Idempotency-Key to every mutating call, which makes retries safe. If the FTS call fails after the row and vector writes succeeded, retry only the FTS call; repeating the row write would not create a duplicate, but it is unnecessary.
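The retry behavior described here can be sketched as a small driver that runs the three writes in order and retries only the step that failed, reusing the same idempotency key for each retry of that step. This is a minimal sketch under stated assumptions: `run_steps` and the step functions are hypothetical stand-ins rather than SDK calls, and passing the key to the server yourself (for example in an Idempotency-Key header on a raw HTTP call) is an assumption, since the text only states that the SDKs attach the header.

```python
import uuid


def run_steps(steps, max_attempts=3):
    """Run write steps in order, retrying only the step that failed.

    steps: list of (name, fn) pairs; each fn takes one idempotency key.
    A key is generated once per step and reused on every retry of that
    step, so a server that honors the key can drop duplicate writes.
    Returns the names of the steps that completed.
    """
    completed = []
    for name, fn in steps:
        key = str(uuid.uuid4())  # one key per logical write
        for attempt in range(1, max_attempts + 1):
            try:
                fn(key)
                completed.append(name)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up; earlier steps stay written
    return completed


# Hypothetical stand-ins for the row, vector, and FTS writes.
calls = []
fts_attempts = {"count": 0}


def write_row(key):
    calls.append(("row", key))


def write_vector(key):
    calls.append(("vector", key))


def write_fts(key):
    calls.append(("fts", key))
    fts_attempts["count"] += 1
    if fts_attempts["count"] == 1:
        raise ConnectionError("simulated transient failure")


done = run_steps([("row", write_row), ("vector", write_vector), ("fts", write_fts)])
print(done, [name for name, _ in calls])
```

In this run the row and vector writes are each issued once, and only the FTS write is issued twice, both times with the same key.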
- Embedding only the title. A title carries only a small fraction of an article's meaning. Concatenate title + body before embedding so that semantic search can match on body content.
- Indexing the title in FTS but not the body. The opposite mistake. The body is where the searchable keywords live.
- Forgetting to re-index on update. If you edit an article, you have to re-put the row, re-put the vector, and re-index the FTS field. None of the three rides along with the others.
- Embedding huge bodies as one vector. Past ~1k tokens, semantic similarity gets muddy. For long articles, chunk the body, write one row per chunk with a parent article_id, and embed each chunk separately.
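The chunking advice in the last bullet can be sketched as follows. This is a minimal sketch, assuming whitespace-separated words are an acceptable stand-in for model tokens (a real tokenizer would place the boundaries differently). The `chunk_article` helper, the `#chunk-N` ID format, the `chunk_index` field, and the 500-word default are hypothetical; a table holding chunk rows would need matching columns in its schema.

```python
def chunk_article(article_id, body, max_words=500):
    """Split an article body into chunk rows of at most max_words words.

    Each chunk row carries the parent article_id so search hits on a
    chunk can be traced back to the full article.
    """
    words = body.split()
    chunks = []
    for start in range(0, len(words), max_words):
        index = len(chunks)
        chunks.append({
            "id": f"{article_id}#chunk-{index}",  # hypothetical ID format
            "article_id": article_id,             # parent reference
            "chunk_index": index,
            "body": " ".join(words[start:start + max_words]),
        })
    return chunks


# Example: a 1,200-word body becomes three chunk rows.
long_body = " ".join(f"word{i}" for i in range(1200))
rows = chunk_article("kb-2026-001", long_body)
print([(row["id"], len(row["body"].split())) for row in rows])
```

Each chunk row would then go through the same three writes as a whole article: put the row, embed and put the chunk text, and index the chunk body for FTS.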