Insert data
Saving data is the first thing you do with a database. This page walks through every way to put data into OriginChain - one row, many rows, vectors, search text, and graph relationships - with the same set of code examples for each one: cURL if you want to see the raw HTTP, Python, TypeScript, and Go.
New to OriginChain? Read Quickstart first - it shows you how to create an instance and get the URL + token used below.
0. Set up your client.
Each language needs three things to talk to OriginChain: your endpoint URL, a bearer token, and your tenant ID. Set them up once, then every example below assumes they are in scope.
# Save your endpoint + token + tenant ID once.
# You get all three from the dashboard after creating an instance.
export ORIGINCHAIN_URL="https://acme.ap-south-1.db.originchain.ai"
export OC_TOKEN="oc_live_xxxxxxxxxxxxxxxx"
export T="acme" # your tenant ID (the part before .ap-south-1...)

# pip install originchain
from originchain import OriginChain
db = OriginChain(
base_url="https://acme.ap-south-1.db.originchain.ai",
bearer="oc_live_xxxxxxxxxxxxxxxx",
tenant="acme",
)
# All examples below assume `db` is in scope.

// npm install @originchain/sdk
import { OriginChainClient } from "@originchain/sdk";
const db = new OriginChainClient({
baseUrl: "https://acme.ap-south-1.db.originchain.ai",
bearer: "oc_live_xxxxxxxxxxxxxxxx",
});
// All examples below assume `db` is in scope.
// For the few endpoints the SDK does not wrap yet
// (row writes, batch writes), we also reuse:
const BASE_URL = "https://acme.ap-south-1.db.originchain.ai";
const TENANT = "acme";
const OC_TOKEN = "oc_live_xxxxxxxxxxxxxxxx";

// go get github.com/originchain/sdk-go
package main
import (
"bytes"
"context"
"encoding/json"
"net/http"
"github.com/originchain/sdk-go"
)
const (
BASE_URL = "https://acme.ap-south-1.db.originchain.ai"
TENANT = "acme"
OC_TOKEN = "oc_live_xxxxxxxxxxxxxxxx"
)
var ctx = context.Background()
var db = originchain.NewClient(originchain.Config{
BaseURL: BASE_URL,
Bearer: OC_TOKEN,
})
// All examples below assume `db` and `ctx` are in scope.

- Endpoint URL: Dashboard → your instance → "Connect". Looks like https://acme.ap-south-1.db.originchain.ai.
- Bearer token: Dashboard → your instance → "API tokens" → "Create token". Starts with oc_live_. Store it in a secret manager - it grants full access to your instance.
- Tenant ID: The first part of the endpoint hostname. For acme.ap-south-1.db... the tenant is acme.
The Python SDK has helpers for every endpoint on this page. The TypeScript and Go SDKs cover vector, full-text, SQL, graph, and ask - but they don't have row-write helpers yet (shipping in the next release). For row writes in TypeScript and Go we show fetch / net/http calls; they hit the exact same endpoint the SDK will use.
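If you want to see the raw HTTP from Python too (for example, to debug what goes over the wire), a row-write request can be assembled with the standard library alone. This is a minimal sketch, assuming the three environment variables from the shell setup above (ORIGINCHAIN_URL, OC_TOKEN, T); the helper name `row_put_request` and its `expect_insert` flag are our own, not part of the OriginChain SDK.

```python
import json
import os
import urllib.request
from typing import Any, Mapping


def row_put_request(
    schema: str,
    row: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
    expect_insert: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) POST /v1/tenants/:t/rows/:schema.

    Hypothetical helper: reads ORIGINCHAIN_URL, OC_TOKEN and T the same
    way the cURL examples on this page do.
    """
    base = env["ORIGINCHAIN_URL"].rstrip("/")
    url = f"{base}/v1/tenants/{env['T']}/rows/{schema}"
    if expect_insert:
        # Ask the server to reject a duplicate id with 409 instead of overwriting.
        url += "?expect=insert"
    return urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {env['OC_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (not run here):
#   with urllib.request.urlopen(row_put_request("shop.customers", row)) as resp:
#       print(json.load(resp))
```

With `expect_insert=True`, a duplicate surfaces as `urllib.error.HTTPError` with `.code == 409`.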
1. Insert one row.
Save one record to a table - like one row in a spreadsheet. The record is a JSON object whose keys match the column names you declared on the schema.
- You are saving one new record (a user signing up, a single order).
- You are updating an existing record - same call. Sending the same id again replaces the row.
- If you have many rows to save, jump to Insert many rows - it is much faster.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "c_1",
"email": "ada@example.com",
"region": "IN"
}'

db.rows.put("shop.customers", {
"id": "c_1",
"email": "ada@example.com",
"region": "IN",
})

// The TypeScript SDK does not wrap row writes yet
// (shipping in the next release). Use `fetch` for now -
// it is exactly the same HTTP call the SDK will make.
await fetch(`${BASE_URL}/v1/tenants/${TENANT}/rows/shop.customers`, {
method: "POST",
headers: {
"Authorization": `Bearer ${OC_TOKEN}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
id: "c_1",
email: "ada@example.com",
region: "IN",
}),
});

// The Go SDK does not wrap row writes yet
// (shipping in the next release). Use net/http for now.
body, _ := json.Marshal(map[string]any{
"id": "c_1",
"email": "ada@example.com",
"region": "IN",
})
req, _ := http.NewRequestWithContext(ctx, "POST",
BASE_URL+"/v1/tenants/"+TENANT+"/rows/shop.customers",
bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+OC_TOKEN)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil { /* handle */ }
defer resp.Body.Close()

| Field | Type | Required | What it is |
|---|---|---|---|
| :t (URL) | string | yes | Your tenant ID. The first part of your endpoint hostname. |
| :schema (URL) | string | yes | The schema name (here, shop.customers). Must already exist - see Define a schema. |
| id | string | yes | The primary key declared in your schema. Re-sending the same id replaces the row. |
| email, region, ... | any | depends | Any other column you declared on the schema. The JSON key matches the column name; the JSON type must match the declared type. |
| Authorization | header | yes | Bearer <your token>. Missing or wrong → 401 unauthorized. |
{ "ok": true, "lsn": { "segment": 4, "offset": 8421007 } }

ok: true means the row is saved and durable. lsn is the position in the write-ahead log where your row landed - useful if you need to wait for a replica to catch up. You can ignore it most of the time.
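Because the lsn has two parts, comparing two positions means comparing segment first and offset second. A minimal sketch, assuming that ordering holds (this page does not state it) and that the response has exactly the shape shown above; `Lsn`, `WriteResult`, and `replica_has_caught_up` are our own names, and how you obtain a replica's position is not covered here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Lsn:
    # Field order matters: order=True compares segment first, then offset.
    segment: int
    offset: int


@dataclass(frozen=True)
class WriteResult:
    ok: bool
    lsn: Lsn


def parse_write_result(body: str) -> WriteResult:
    """Parse a row-write response such as {"ok": true, "lsn": {...}}."""
    data = json.loads(body)
    lsn = data["lsn"]
    return WriteResult(ok=bool(data["ok"]), lsn=Lsn(int(lsn["segment"]), int(lsn["offset"])))


def replica_has_caught_up(write: WriteResult, replica_lsn: Lsn) -> bool:
    # Assumption: a replica has your row once its applied position is at or past your write's LSN.
    return replica_lsn >= write.lsn
```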
- Schema doesn't exist yet. You will see 404 schema_not_found. Create the schema first - see Define a schema.
- Wrong type for a column. If you declared price as a number but sent a string, you will see 400 type_mismatch. The error message names the offending field.
- Forgot the Content-Type header. Without Content-Type: application/json the server can't parse the body and returns 400 invalid_body.
- Did not mean to overwrite. By default a duplicate id overwrites the existing row. If you want the write to fail when the row already exists, add the query string ?expect=insert - it returns 409 conflict on a duplicate.
After setting up your instance and creating the shop.customers schema, paste this into your terminal:
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{"id":"test_1","email":"you@example.com","region":"US"}'
You should see {"ok":true,...}. Run it again - same result. That confirms inserts are idempotent (re-running doesn't break anything).
2. Insert many rows.
Save many rows in one HTTP call. Far faster than calling "insert one row" in a loop, because each call has fixed network overhead.
- Importing a CSV / JSON file - hundreds, thousands, or millions of rows.
- Backfilling a new table from an old one.
- Ingesting a stream of events in micro-batches.
There are two transport shapes. A JSON array body is simple - send up to ~8 MiB of rows per call. For larger imports, NDJSON (one JSON object per line) lifts that cap and streams.
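If you stay on the JSON-array shape, the client has to keep each body under the ~8 MiB cap. A minimal sketch of one way to do that: serialize each row once, then greedily pack rows into arrays while the encoded size stays under a limit. The 8 MiB default mirrors the approximate cap above (the exact server limit may differ, so leave headroom); `split_json_batches` is a hypothetical helper.

```python
import json
from typing import Any, Iterable, Iterator, List, Mapping

MAX_BODY_BYTES = 8 * 1024 * 1024  # approximate cap from this page; keep headroom


def split_json_batches(
    rows: Iterable[Mapping[str, Any]], max_bytes: int = MAX_BODY_BYTES
) -> Iterator[bytes]:
    """Yield JSON-array bodies ('[row,row,...]') of at most max_bytes each."""
    batch: List[bytes] = []
    size = 2  # the enclosing "[" and "]"
    for row in rows:
        encoded = json.dumps(row, separators=(",", ":")).encode("utf-8")
        if 2 + len(encoded) > max_bytes:
            raise ValueError(f"row {row.get('id')!r} alone exceeds {max_bytes} bytes")
        # +1 for the comma separating this row from the previous one.
        extra = len(encoded) + (1 if batch else 0)
        if batch and size + extra > max_bytes:
            yield b"[" + b",".join(batch) + b"]"
            batch, size = [], 2
            extra = len(encoded)
        batch.append(encoded)
        size += extra
    if batch:
        yield b"[" + b",".join(batch) + b"]"
```

Each yielded body can be POSTed to the `_batch` endpoint as-is with Content-Type: application/json.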
# 1) JSON array body - send up to ~8 MiB worth of rows in one call.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers/_batch" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '[
{ "id": "c_1", "email": "ada@example.com", "region": "IN" },
{ "id": "c_2", "email": "hopper@example.com", "region": "US" },
{ "id": "c_3", "email": "lovelace@example.com", "region": "GB" }
]'
# 2) NDJSON stream - for millions of rows. No 8 MiB cap.
# One JSON object per line. The `chunk` query param controls
# how many rows go into each atomic write.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers/_batch?chunk=1000" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary @customers.ndjson
# customers.ndjson - one row per line:
# {"id":"c_1","email":"ada@example.com","region":"IN"}
# {"id":"c_2","email":"hopper@example.com","region":"US"}
# ...

# put_batch handles the chunking, retries, and idempotency keys.
# It accepts any iterable - lists, generators, file-line iterators.
def stream_customers():
for i in range(50_000):
yield {
"id": f"c_{i}",
"email": f"user{i}@example.com",
"region": "IN",
}
inserted = db.rows.put_batch(
"shop.customers",
stream_customers(),
chunk=1000, # rows per atomic write
idempotency_key="bulk-import-2026-06-10",
)
print(f"{inserted} rows accepted")

// Same caveat as a single row insert - use `fetch`.
// We pass a JSON array body.
await fetch(`${BASE_URL}/v1/tenants/${TENANT}/rows/shop.customers/_batch`, {
method: "POST",
headers: {
"Authorization": `Bearer ${OC_TOKEN}`,
"Content-Type": "application/json",
},
body: JSON.stringify([
{ id: "c_1", email: "ada@example.com", region: "IN" },
{ id: "c_2", email: "hopper@example.com", region: "US" },
{ id: "c_3", email: "lovelace@example.com", region: "GB" },
]),
});

rows := []map[string]any{
{"id": "c_1", "email": "ada@example.com", "region": "IN"},
{"id": "c_2", "email": "hopper@example.com", "region": "US"},
{"id": "c_3", "email": "lovelace@example.com", "region": "GB"},
}
body, _ := json.Marshal(rows)
req, _ := http.NewRequestWithContext(ctx, "POST",
BASE_URL+"/v1/tenants/"+TENANT+"/rows/shop.customers/_batch",
bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+OC_TOKEN)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil { /* handle */ }
defer resp.Body.Close()

| Field | Where | Default | What it controls |
|---|---|---|---|
| Body | JSON | - | An array of row objects. Up to ~8 MiB of total request body, roughly 50 000 small rows. |
| Body | NDJSON | - | One JSON object per line. No size cap on the body. Use Content-Type: application/x-ndjson. |
| chunk | query | 1000 | How many rows go into one atomic write. Bigger chunks = fewer fsyncs, more throughput. Smaller chunks = finer-grained retry. |
| expect | query | - | ?expect=insert fails the batch if any row already exists. Useful for initial imports where duplicates are bugs. |
| Idempotency-Key | header | auto | If you retry the same call with the same key, the server returns the original result without re-inserting. The SDKs set this automatically. |
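The NDJSON file used by the second cURL example can be produced from any iterable of rows. A minimal sketch that writes one compact JSON object per line; `ndjson_lines` and `write_ndjson` are our own helpers. The chunk value is not part of the file: it stays on the query string.

```python
import json
from typing import Any, Iterable, Iterator, Mapping


def ndjson_lines(rows: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield one compact JSON object per line, each ending in a newline.

    json.dumps escapes newlines inside strings as \\n, so a row can never
    spill onto a second line.
    """
    for row in rows:
        yield json.dumps(row, ensure_ascii=False, separators=(",", ":")) + "\n"


def write_ndjson(rows: Iterable[Mapping[str, Any]], path: str) -> int:
    """Stream rows to a UTF-8 NDJSON file without holding them all in memory."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for line in ndjson_lines(rows):
            f.write(line)
            count += 1
    return count


# Example (not run here): build the customers.ndjson used by the cURL call.
#   rows = ({"id": f"c_{i}", "email": f"user{i}@example.com", "region": "IN"}
#           for i in range(50_000))
#   write_ndjson(rows, "customers.ndjson")
```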
- Sending NDJSON with the wrong Content-Type. Use application/x-ndjson, not application/json. Otherwise the server tries to parse the whole body as a single JSON document and fails.
- Chunk too big. A single chunk that doesn't fit in memory will OOM the engine's batch buffer. If you're streaming millions of rows, keep chunk at the default 1000.
- No retry strategy. Network blips happen. The SDKs handle this for you; if you call the endpoint directly, retry on 503 and 504 with exponential backoff, and send the same Idempotency-Key header on every attempt so a retried batch is not inserted twice.
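When you call the batch endpoint directly, the retry loop is yours to write. A minimal sketch, assuming only what this page states: 503 and 504 are retryable, and reusing one Idempotency-Key makes a repeated call safe. The `send` callable is a hypothetical stand-in for your HTTP client (it takes headers and returns a status code), and the delays are illustrative.

```python
import random
import time
import uuid
from typing import Callable, Mapping, Optional

RETRYABLE = {503, 504}


def post_with_retries(
    send: Callable[[Mapping[str, str]], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    idempotency_key: Optional[str] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(headers) until it returns a non-retryable status or attempts run out.

    The same Idempotency-Key goes out on every attempt, so the server can
    return the original result instead of inserting the batch twice.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Idempotency-Key": idempotency_key or str(uuid.uuid4())}
    status = 0
    for attempt in range(max_attempts):
        status = send(headers)
        if status not in RETRYABLE:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: up to 0.5 s, 1 s, 2 s, ...
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status  # still 503/504 after the last attempt
```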
Generate 1000 fake rows and import them in one call. Save this as demo.py and run python demo.py:
from originchain import OriginChain
db = OriginChain.from_env()
rows = ({"id": f"c_{i}", "email": f"u{i}@ex.com", "region": "IN"} for i in range(1000))
print(db.rows.put_batch("shop.customers", rows, chunk=500), "rows inserted")

3. Insert a vector.
Save a list of numbers (an embedding) under an ID so you can later find similar embeddings. An embedding is the output of a model that turned some text or an image into numbers - typically 384, 768, 1024, or 1536 of them.
- You are building semantic search ("find products that mean roughly the same thing").
- You are building retrieval-augmented generation (RAG) - finding the most relevant documents to feed to an LLM.
- You are doing recommendations based on similarity.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/vector/shop.products/put" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "sku-9281",
"embedding": [0.0124, -0.0883, 0.0451, /* ... 768 floats ... */],
"dim": 768,
"metric": "cosine",
"metadata": { "category": "running-shoes", "price": 129.0 }
}'

# embedding_768d is your 768-float list, e.g. from an OpenAI/Cohere call.
db.vector.put(
"shop.products",
"sku-9281",
embedding_768d,
metadata={ "category": "running-shoes", "price": 129.0 },
)

await db.vectorPut("shop.products", {
id: "sku-9281",
embedding: embedding768d, // number[] of length 768
dim: 768,
metric: "cosine",
metadata: { category: "running-shoes", price: 129.0 },
});

err := db.VectorPut(ctx, "shop.products", originchain.VectorPutRequest{
ID: "sku-9281",
Embedding: embedding768d, // []float32 of length 768
Dim: 768,
Metric: "cosine",
Metadata: map[string]any{"category": "running-shoes", "price": 129.0},
})
if err != nil { /* handle */ }

| Field | Type | Required | What it is |
|---|---|---|---|
| id | string | yes | A unique ID for this vector. Usually the primary key of the row it was extracted from (here, the SKU). |
| embedding | float[] | yes | The vector itself. The length must match dim exactly. |
| dim | int | yes | The vector's length. Must be the same value for every vector in this table - the first insert locks it in. |
| metric | string | no | How "closeness" is measured: cosine (default) for most text models, l2 (Euclidean distance) for distance-style models, dot for inner product. Locked after the first insert. |
| metadata | object | no | Anything you want to filter on at search time. Example: { "category": "shoes" } lets you later restrict the search to shoes only. |
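To make the three metric values concrete, here is how each one scores a pair of vectors. This is the standard textbook math, not OriginChain's internal code; how the engine turns these numbers into a ranking is not specified on this page.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product: larger means more similar (sensitive to vector length)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dim")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]: the dot product of the normalized vectors."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0 or nb == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot(a, b) / (na * nb)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means closer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dim")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For unit-length embeddings the three agree on ordering: cosine equals dot, and l2² = 2 - 2·cosine, so the nearest neighbour is the same under each metric.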
common mistakes
- Wrong dim. If your embeddings are 1536 floats but the first insert said dim: 768, every later insert fails with 400 dim_mismatch. The first insert sets the lock.
- Mixing metrics. If you start with cosine and later send l2, you get 400 metric_mismatch. Pick one and stay with it.
- Filtering on un-indexed metadata. Filters work on any key, but they are fastest on simple equality (category == "shoes"). Range filters on price are slower.
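Since the first insert locks dim and metric for the table, a cheap client-side check catches dim_mismatch and metric_mismatch before a request leaves your process. A minimal sketch, assuming you record the table's dim and metric yourself (this page does not describe an endpoint to read them back); `build_vector_put` is a hypothetical helper that returns the JSON body used by the cURL example.

```python
import math
from typing import Any, Mapping, Optional, Sequence

ALLOWED_METRICS = {"cosine", "l2", "dot"}


def build_vector_put(
    vec_id: str,
    embedding: Sequence[float],
    dim: int,
    metric: str = "cosine",
    metadata: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Validate an embedding locally and return the /vector/:table/put body."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"metric must be one of {sorted(ALLOWED_METRICS)}, got {metric!r}")
    if len(embedding) != dim:
        # The same condition the server reports as 400 dim_mismatch.
        raise ValueError(f"embedding has {len(embedding)} floats, table dim is {dim}")
    if not all(math.isfinite(x) for x in embedding):
        # JSON has no encoding for NaN or Infinity.
        raise ValueError("embedding contains NaN or infinity")
    body = {"id": vec_id, "embedding": [float(x) for x in embedding], "dim": dim, "metric": metric}
    if metadata:
        body["metadata"] = dict(metadata)
    return body
```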
note · vectors are a separate call
Vectors are stored on their own endpoint, not as a row column. The id you pass here is what links the vector back to a row (typically the row's primary key). See Vector tables for the full reference.
4. Insert a search document.
what this does
Index a piece of text so it shows up in keyword search. OriginChain breaks the text into tokens, applies the analyzer you declared on the schema (lowercase, stemming, etc.), and adds the tokens to an inverted index. Search results are ranked with BM25, the standard relevance algorithm used by Elasticsearch and Lucene.
when to use it
- You want users to find rows by typing keywords ("carbon plate marathon shoes").
- You want phrase search ("exact phrase in quotes").
- You want fuzzy matching that tolerates typos.
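For intuition about how BM25 ranks results, here is the textbook Okapi BM25 score on a toy corpus. It is a sketch only: it uses a naive lowercase-and-split tokenizer instead of a real analyzer, the common defaults k1 = 1.2 and b = 0.75, and the IDF variant Lucene uses (which stays positive). OriginChain's actual tokenization and parameters may differ.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Naive stand-in for a real analyzer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs: Dict[str, str], query: str, k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return {}
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avgdl = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, terms in tokens.items():
        tf = Counter(terms)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(terms) / avgdl)
        score = 0.0
        for q in tokenize(query):
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        scores[doc_id] = score
    return scores
```

Rarer terms (here, "carbon") carry a higher IDF, so a document matching them outranks one that only matches a common term.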
the code
POST /v1/tenants/:t/fts/:table/:field
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/fts/shop.products/description" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"doc_id": "sku-9281",
"text": "Lightweight road runner with a carbon plate, designed for marathon pace."
}'
db.fts.index(
"shop.products",
"description",
doc_id="sku-9281",
text="Lightweight road runner with a carbon plate, designed for marathon pace.",
)
await db.ftsIndex("shop.products", "description", {
doc_id: "sku-9281",
text: "Lightweight road runner with a carbon plate, designed for marathon pace.",
});
err := db.FTSIndex(ctx, "shop.products", "description", originchain.FTSIndexRequest{
DocID: "sku-9281",
Text: "Lightweight road runner with a carbon plate, designed for marathon pace.",
})
if err != nil { /* handle */ }
what each field means
| Field | Where | Required | What it is |
|---|---|---|---|
| :table | URL | yes | The schema name. Same as in row writes. |
| :field | URL | yes | The column name to index this text under. You can index many documents under the same column. |
| doc_id | body | yes | A unique ID for this document. Usually the row's primary key. Re-indexing with the same doc_id replaces the previous text - no stale matches. |
| text | body | yes | The actual text to index. No size limit on this endpoint, but very large documents (~MBs) are better split into multiple doc_ids. |
common mistakes
- Indexing a doc_id that doesn't match a row. The FTS index is independent of the row store - nothing stops you from indexing a doc_id that doesn't exist in the table. You'll get search hits pointing at nothing. Treat doc_id as "the primary key of the row this text describes" and you'll be fine.
- Wrong analyzer for your language. The default English analyzer doesn't stem German or Chinese well. Declare the right analyzer on the schema (Snowball stemmers in 18 languages plus CJK / Thai / Khmer tokenizers).
- Indexing the wrong text. The text you put here is what gets searched - if you index only the product name, users can't search by description. Concatenate every searchable field into one string before indexing.
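One way to follow the last point is to build the indexed string from the row itself, so the text and the row can't drift apart. A minimal sketch; which fields count as searchable is your choice, and `searchable_text` and `fts_body` are our own helpers. The result is the request body for the full-text endpoint, with the row's primary key as doc_id.

```python
from typing import Any, Mapping, Sequence


def searchable_text(row: Mapping[str, Any], fields: Sequence[str]) -> str:
    """Join the searchable fields of a row into one string for full-text indexing.

    Missing or empty fields are skipped so they don't leave stray separators.
    """
    parts = [str(row[f]).strip() for f in fields if row.get(f) not in (None, "")]
    return "\n".join(p for p in parts if p)


def fts_body(row: Mapping[str, Any], fields: Sequence[str], pk: str = "id") -> dict:
    # doc_id is the row's primary key, so search hits point back at a real row.
    return {"doc_id": str(row[pk]), "text": searchable_text(row, fields)}
```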
note · full-text is a separate call
Like vectors, full-text indexes live on their own runtime endpoint - they're not declared on the row schema. Re-indexing the same doc_id replaces the previous text in the same write, so there are no stale postings.
5. Insert a graph relationship.
what this does
Create a link (an edge) between two rows that you can later walk. Examples: a product is supplied by a supplier, a user follows another user, an order belongs to a customer.
Here is the important thing: an edge is not a separate write. You declare which columns are relations on the schema, then the engine creates and maintains the forward + reverse edges automatically whenever you write the row.
when to use it
- You want to query things like "all products from supplier X" or "all orders placed by user Y" without writing a JOIN.
- You want to do multi-hop walks like "friends of friends" or "products bought by users who bought this one".
- You want shortest-path queries.
the code
Assuming the schema declares supplier_id as a relation pointing at shop.suppliers (a [[relations]] entry with from_col = "supplier_id"; see the full manifest in section 6), this row write creates the edge automatically:
POST /v1/tenants/:t/rows/:schema · edge written atomically
# A graph edge is NOT a separate write.
# Declare `[[relations]]` on the schema (see the manifest in section 6),
# then write the row - the engine creates the forward and reverse
# edges automatically because `supplier_id` is declared as a relation.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.products" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "sku-9281",
"name": "Carbon Marathon",
"supplier_id": "sup-44",
"price_cents": 12900
}'
# Same put as Section 1. The schema declares supplier_id as a
# relation pointing at shop.suppliers, so the edge is created
# automatically when the row is saved.
db.rows.put("shop.products", {
"id": "sku-9281",
"name": "Carbon Marathon",
"supplier_id": "sup-44",
"price_cents": 12900,
})
// Same row write as Section 1 - the edge is implicit.
await fetch(`${BASE_URL}/v1/tenants/${TENANT}/rows/shop.products`, {
method: "POST",
headers: {
"Authorization": `Bearer ${OC_TOKEN}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
id: "sku-9281",
name: "Carbon Marathon",
supplier_id: "sup-44",
price_cents: 12900,
}),
});
body, _ := json.Marshal(map[string]any{
"id": "sku-9281",
"name": "Carbon Marathon",
"supplier_id": "sup-44",
"price_cents": 12900,
})
req, _ := http.NewRequestWithContext(ctx, "POST",
BASE_URL+"/v1/tenants/"+TENANT+"/rows/shop.products",
bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+OC_TOKEN)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil { /* handle */ }
defer resp.Body.Close()
common mistakes
- Target row doesn't exist yet. If sup-44 doesn't exist in shop.suppliers, the edge is still stored - it just points at a non-existent row. Decide whether you want this. To enforce existence, add a foreign-key constraint on the schema.
- Forgetting that updates retire the old edge. If a product's supplier_id changes from sup-44 to sup-77, the old edge is removed in the same write. Good for accuracy, surprising if you expected history.
- Trying to add an edge without a column for it. Edges piggy-back on columns. If you want a many-to-many relationship without a column, create a join table (e.g., shop.product_tags) and put relations on its columns.
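To make the bookkeeping above concrete, here is a toy in-memory model of what a relation column implies: a forward edge from the row to the target key, a reverse edge back, and the old pair retired when the column changes. It illustrates the behaviour described on this page; it is not how the engine stores edges, and `RelationIndex` is a made-up name.

```python
from collections import defaultdict
from typing import Any, Dict, Mapping, Optional, Set


class RelationIndex:
    """Forward/reverse edges for one relation column (e.g. supplier_id)."""

    def __init__(self, from_col: str, pk: str = "id") -> None:
        self.from_col = from_col
        self.pk = pk
        self.forward: Dict[str, str] = {}  # product id -> supplier id
        self.reverse: Dict[str, Set[str]] = defaultdict(set)  # supplier id -> product ids

    def put_row(self, row: Mapping[str, Any]) -> None:
        """Apply a row write: retire the old edge pair, then add the new one."""
        src = row[self.pk]
        old: Optional[str] = self.forward.pop(src, None)
        if old is not None:
            self.reverse[old].discard(src)
            if not self.reverse[old]:
                del self.reverse[old]
        new = row.get(self.from_col)
        if new is not None:
            # No existence check: without a foreign key, the edge may
            # point at a target row that does not exist.
            self.forward[src] = new
            self.reverse[new].add(src)

    def neighbors_in(self, target: str) -> Set[str]:
        """All rows pointing at target, e.g. every product from one supplier."""
        return set(self.reverse.get(target, set()))
```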
next · walk the edges
Once edges are written, see Graph queries for how to walk them (neighbors, BFS, shortest path, PageRank).
6. All four at once.
what this does
Save the same product as a row, a vector embedding, a search document, and a graph edge - in three coordinated calls, all backed by the same record. This is the pattern most real apps end up with.
In a typical stack you'd write the row to Postgres, push the embedding to a vector database, push the text to Elasticsearch, and trust three systems to stay in sync. Here, all four projections live in the same instance and share the same write-ahead log.
when to use it
- You are building a product catalog that needs to be searchable by exact filter, by similarity, by keyword, and by relationship - all at once.
- You are building a RAG pipeline that also needs structured filtering.
- You want to stop maintaining three separate databases.
the schema
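One schema declares the columns, a secondary index, and the graph relation; vectors and full-text live on their own endpoints and link back by primary key. Register this with POST /v1/tenants/$T/schemas: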
# manifest.toml - the row schema. Defines columns + a graph edge.
# Vector and full-text indexes are NOT declared here - they live on
# their own runtime endpoints (see /docs/vector, /docs/fts) and link
# back to rows by primary key.
namespace = "shop"
table = "products"
primary_key = ["id"]
[[columns]]
name = "id"
ty = "str"
required = true
[[columns]]
name = "name"
ty = "str"
[[columns]]
name = "supplier_id"
ty = "str"
[[columns]]
name = "price_cents"
ty = "i64" # money in minor units - never f64
[[columns]]
name = "description"
ty = "str"
# Secondary index on supplier_id so neighbor lookups are fast.
[[indexes]]
name = "by_supplier"
columns = ["supplier_id"]
# Turn supplier_id into a graph edge: the row write creates the edge
# automatically because [[relations]] is declared.
[[relations]]
name = "supplied_by"
from_col = "supplier_id"
bidirectional = true
[relations.target]
namespace = "shop"
table = "suppliers"
pk = "id"
the code
Save one product. Each call lands as one atomic write on the engine - if any of the three calls fails, you can safely retry the failing one because the calls are idempotent.
row + vector + full-text + supplier edge
# One product, written four ways in three calls. Each call is its own
# atomic write; because every call is idempotent, a failed one can be
# retried on its own.
product_id = "sku-9281"
description = "Lightweight road runner with a carbon plate, designed for marathon pace."
# 1. The row itself. The graph edge to `shop.suppliers` is created
# automatically because `supplier_id` is a declared relation.
db.rows.put("shop.products", {
"id": product_id,
"name": "Carbon Marathon",
"supplier_id": "sup-44",
"price_cents": 12900,
"description": description,
})
# 2. The vector embedding. Your app computes the float[] (here with your
# own `embed` function); the engine stores it.
db.vector.put(
"shop.products",
product_id,
embed(description), # 768-float list
metadata={ "category": "running-shoes", "price": 129.0 },
)
# 3. The BM25 full-text index. Re-indexing the same doc_id replaces
# the old postings - no ghost matches.
db.fts.index(
"shop.products",
"description",
doc_id=product_id,
text=description,
)
const productId = "sku-9281";
const description = "Lightweight road runner with a carbon plate, designed for marathon pace.";
// 1. The row + graph edge (raw fetch until row helpers ship).
await fetch(`${BASE_URL}/v1/tenants/${TENANT}/rows/shop.products`, {
method: "POST",
headers: {
"Authorization": `Bearer ${OC_TOKEN}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
id: productId,
name: "Carbon Marathon",
supplier_id: "sup-44",
price_cents: 12900,
description,
}),
});
// 2. The vector embedding.
await db.vectorPut("shop.products", {
id: productId,
embedding: await embed(description), // number[] of length 768
dim: 768,
metric: "cosine",
metadata: { category: "running-shoes", price: 129.0 },
});
// 3. The BM25 full-text index.
await db.ftsIndex("shop.products", "description", {
doc_id: productId,
text: description,
});
productId := "sku-9281"
description := "Lightweight road runner with a carbon plate, designed for marathon pace."
// 1. The row + graph edge (raw http until row helpers ship).
rowBody, _ := json.Marshal(map[string]any{
"id": productId,
"name": "Carbon Marathon",
"supplier_id": "sup-44",
"price_cents": 12900,
"description": description,
})
req, _ := http.NewRequestWithContext(ctx, "POST",
BASE_URL+"/v1/tenants/"+TENANT+"/rows/shop.products",
bytes.NewReader(rowBody))
req.Header.Set("Authorization", "Bearer "+OC_TOKEN)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil { /* handle */ }
defer resp.Body.Close()
// 2. The vector embedding.
if err := db.VectorPut(ctx, "shop.products", originchain.VectorPutRequest{
ID: productId,
Embedding: embed(description), // []float32 of length 768
Dim: 768,
Metric: "cosine",
Metadata: map[string]any{"category": "running-shoes", "price": 129.0},
}); err != nil { /* handle or retry */ }
// 3. The BM25 full-text index.
if err := db.FTSIndex(ctx, "shop.products", "description", originchain.FTSIndexRequest{
DocID: productId,
Text: description,
}); err != nil { /* handle or retry */ }
what just happened
After those three calls, the same product is visible to four kinds of query:
- SQL: SELECT * FROM shop.products WHERE price_cents < 15000
- Vector: find the 10 products most similar to a query embedding
- Full-text: find products whose description matches "marathon carbon"
- Graph: find all products supplied by sup-44
See Querying your data for each of these.
common mistakes
- Forgetting one of the three calls. Row, vector, and full-text are stored independently. If you insert the row but skip the embedding, the product won't show up in vector search. Wrap the three calls in your own helper function so they always go together.
- Embedding the wrong text. Embed what you want users to search by - usually a title + description, not the SKU.
- Ignoring failures. Each call returns success or failure. If only the vector call fails, the row and FTS index are still written. Decide whether you want to roll back manually (delete the row) or just retry the failing call.
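Following the advice above, the three calls can live in one helper that retries whichever call failed and reports anything still failing. A minimal sketch around the Python SDK calls shown in this section (`db.rows.put`, `db.vector.put`, `db.fts.index`); the retry count, the broad `except Exception`, and the `embed` argument (your own embedding function) are assumptions, since this page doesn't document which exceptions the SDK raises.

```python
import time
from typing import Any, Callable, Dict, List, Mapping, Sequence


def save_product(
    db: Any,
    row: Mapping[str, Any],
    embed: Callable[[str], Sequence[float]],
    metadata: Mapping[str, Any],
    attempts: int = 3,
    delay: float = 0.5,
) -> List[str]:
    """Write row, vector, and full-text entry; return the names of steps that still failed.

    Each step is idempotent (same id / doc_id), so retrying one step alone is safe.
    """
    product_id = row["id"]
    description = row["description"]
    steps: Dict[str, Callable[[], Any]] = {
        "row": lambda: db.rows.put("shop.products", dict(row)),
        "vector": lambda: db.vector.put(
            "shop.products", product_id, embed(description), metadata=dict(metadata)
        ),
        "fts": lambda: db.fts.index(
            "shop.products", "description", doc_id=product_id, text=description
        ),
    }
    failed: List[str] = []
    for name, step in steps.items():
        for attempt in range(attempts):
            try:
                step()
                break
            except Exception:  # SDK exception types are not documented on this page
                if attempt < attempts - 1:
                    time.sleep(delay * 2 ** attempt)
        else:
            # The loop never hit `break`: every attempt failed.
            failed.append(name)
    return failed
```

An empty list means all three projections were written; otherwise retry the named steps later or delete the row, as described above.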