OriginChain docs

Insert data

Saving data is the first thing you do with a database. This page walks through every way to put data into OriginChain - one row, many rows, vectors, search text, and graph relationships - with the same set of code examples for each one: cURL if you want to see the raw HTTP, Python, TypeScript, and Go.

New to OriginChain? Read Quickstart first - it shows you how to create an instance and get the URL + token used below.

0. Set up your client.

Each language needs three things to talk to OriginChain: your endpoint URL, a bearer token, and your tenant ID. Set them up once, then every example below assumes they are in scope.

setup
# Save your endpoint + token + tenant ID once.
# You get all three from the dashboard after creating an instance.

export ORIGINCHAIN_URL="https://acme.ap-south-1.db.originchain.ai"
export OC_TOKEN="oc_live_xxxxxxxxxxxxxxxx"
export T="acme"   # your tenant ID (the part before .ap-south-1...)
where do these come from?
  • Endpoint URL: Dashboard → your instance → "Connect". Looks like https://acme.ap-south-1.db.originchain.ai.
  • Bearer token: Dashboard → your instance → "API tokens" → "Create token". Starts with oc_live_. Store it in a secret manager - it grants full access to your instance.
  • Tenant ID: The first part of the endpoint hostname. For acme.ap-south-1.db... the tenant is acme.
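
The same three values can be loaded in Python with only the standard library. This is a minimal sketch, not the SDK: Settings, tenant_from_url, load_settings, and auth_headers are hypothetical names. Deriving the tenant from the hostname follows the rule described above.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Settings:
    url: str      # endpoint URL, no trailing slash
    token: str    # bearer token
    tenant: str   # tenant ID


def tenant_from_url(url: str) -> str:
    """Return the first label of the endpoint hostname (the tenant ID)."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"not a valid endpoint URL: {url!r}")
    return host.split(".")[0]


def load_settings(env=os.environ) -> Settings:
    """Read ORIGINCHAIN_URL and OC_TOKEN; fall back to the hostname for T."""
    url = env["ORIGINCHAIN_URL"].rstrip("/")
    token = env["OC_TOKEN"]
    tenant = env.get("T") or tenant_from_url(url)
    return Settings(url=url, token=token, tenant=tenant)


def auth_headers(settings: Settings) -> dict:
    """Headers that every JSON request on this page sends."""
    return {
        "Authorization": f"Bearer {settings.token}",
        "Content-Type": "application/json",
    }
```

Passing a plain dict instead of os.environ makes the loader easy to exercise without touching real credentials.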
SDK status

The Python SDK has helpers for every endpoint on this page. The TypeScript and Go SDKs cover vector, full-text, SQL, graph, and ask - but they don't have row-write helpers yet (shipping in the next release). For row writes in TypeScript and Go we show fetch / net/http calls; they hit the exact same endpoint the SDK will use.

1. Insert one row.

what this does

Save one record to a table - like one row in a spreadsheet. The record is a JSON object whose keys match the column names you declared on the schema.

when to use it
  • You are saving one new record (a user signing up, a single order).
  • You are updating an existing record - same call. Sending the same id again replaces the row.
  • If you have many rows to save, jump to Insert many rows - it is much faster.
the code
POST /v1/tenants/:t/rows/:schema
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":     "c_1",
    "email":  "ada@example.com",
    "region": "IN"
  }'
what each field means
Field | Type | Required | What it is
URL :t | string | yes | Your tenant ID. The first part of your endpoint hostname.
URL :schema | string | yes | The schema name (here, shop.customers). Must already exist - see Define a schema.
id | string | yes | The primary key declared in your schema. Re-sending the same id replaces the row.
email, region, ... | any | depends | Any other column you declared on the schema. The JSON key matches the column name; the JSON type must match the declared type.
Authorization | header | yes | Bearer <your token>. Missing or wrong → 401 unauthorized.
what you get back
{ "ok": true, "lsn": { "segment": 4, "offset": 8421007 } }

ok: true means the row is saved and durable. lsn is the position in the write-ahead log where your row landed - useful if you need to wait for a replica to catch up. You can ignore it most of the time.
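
For readers who want the same call from Python without the SDK, here is a hedged sketch using only the standard library. build_put_row and put_row are illustrative names, not SDK functions. The endpoint path, headers, and response shape are the ones shown on this page.

```python
import json
import urllib.request
from urllib.parse import quote


def build_put_row(base_url, tenant, schema, token, row):
    """Build (but do not send) the POST request that writes one row."""
    url = f"{base_url}/v1/tenants/{quote(tenant)}/rows/{quote(schema)}"
    return urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def put_row(base_url, tenant, schema, token, row):
    """Send the row and return the parsed JSON response."""
    req = build_put_row(base_url, tenant, schema, token, row)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Splitting "build" from "send" keeps the request inspectable in tests. On success, put_row(...)["lsn"] holds the segment and offset described above.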

common mistakes
  • Schema doesn't exist yet. You will see 404 schema_not_found. Create the schema first - see Define a schema.
  • Wrong type for a column. If you declared price as a number but sent a string, you will see 400 type_mismatch. The error message names the offending field.
  • Forgot the Content-Type header. Without Content-Type: application/json the server can't parse the body and returns 400 invalid_body.
  • Did not mean to overwrite. By default a duplicate id overwrites the existing row. If you want the write to fail when the row already exists, add the query string ?expect=insert - it returns 409 conflict on duplicate.
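
The mistakes above map to a small amount of client-side code. The sketch below is an assumption-laden illustration: row_url and hint_for are hypothetical helpers, and it assumes the caller has already extracted the error code string (for example type_mismatch) from the response, since the exact error body format is not shown on this page.

```python
from urllib.parse import urlencode

# Status/code pairs listed on this page, mapped to a short hint.
HINTS = {
    (401, "unauthorized"): "check the bearer token",
    (404, "schema_not_found"): "create the schema first",
    (400, "type_mismatch"): "a field's JSON type differs from the schema",
    (400, "invalid_body"): "send Content-Type: application/json",
    (409, "conflict"): "row already exists and ?expect=insert was set",
}


def row_url(base_url, tenant, schema, insert_only=False):
    """Row-write URL; insert_only=True appends ?expect=insert."""
    url = f"{base_url}/v1/tenants/{tenant}/rows/{schema}"
    if insert_only:
        url += "?" + urlencode({"expect": "insert"})
    return url


def hint_for(status, code):
    """Return a human-readable hint, or a generic fallback."""
    return HINTS.get((status, code), f"unexpected error {status} {code}")
```

Use insert_only=True for writes where a duplicate id should be treated as a bug rather than an update.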
try it yourself · 30 seconds

After setting up your instance and creating the shop.customers schema, paste this into your terminal:

curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"id":"test_1","email":"you@example.com","region":"US"}'

You should see {"ok":true,...}. Run it again - same result. That confirms inserts are idempotent (re-running doesn't break anything).

2. Insert many rows.

what this does

Save many rows in one HTTP call. Far faster than calling "insert one row" in a loop, because each call has fixed network overhead.

when to use it
  • Importing a CSV / JSON file - hundreds, thousands, or millions of rows.
  • Backfilling a new table from an old one.
  • Ingesting a stream of events in micro-batches.

There are two transport shapes. A JSON array body is simple - send up to ~8 MiB of rows per call. For larger imports, NDJSON (one JSON object per line) lifts that cap and streams.

the code
POST /v1/tenants/:t/rows/:schema/_batch
# 1) JSON array body - send up to ~8 MiB worth of rows in one call.
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers/_batch" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    { "id": "c_1", "email": "ada@example.com",      "region": "IN" },
    { "id": "c_2", "email": "hopper@example.com",   "region": "US" },
    { "id": "c_3", "email": "lovelace@example.com", "region": "GB" }
  ]'

# 2) NDJSON stream - for millions of rows. No 8 MiB cap.
#    One JSON object per line. The `chunk` query param controls
#    how many rows go into each atomic write.

curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.customers/_batch?chunk=1000" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @customers.ndjson

# customers.ndjson - one row per line:
# {"id":"c_1","email":"ada@example.com","region":"IN"}
# {"id":"c_2","email":"hopper@example.com","region":"US"}
# ...
what each field means
Field | Where | Default | What it controls
Body | JSON | - | An array of row objects. Up to ~8 MiB of total request body, roughly 50 000 small rows.
Body | NDJSON | - | One JSON object per line. No size cap on the body. Use Content-Type: application/x-ndjson.
chunk | query | 1000 | How many rows go into one atomic write. Bigger chunks = fewer fsyncs, more throughput. Smaller chunks = finer-grained retry.
expect | query | - | ?expect=insert fails the batch if any row already exists. Useful for initial imports where duplicates are bugs.
Idempotency-Key | header | auto | If you retry the same call with the same key, the server returns the original result without re-inserting. The SDKs set this automatically.
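
To make the two body shapes concrete, here is a small standard-library sketch that encodes the same rows both ways and picks a Content-Type by size. The helper names are hypothetical, and the 8 MiB threshold is only the approximate cap quoted above, not an exact server limit.

```python
import json

MAX_JSON_BODY = 8 * 1024 * 1024  # the ~8 MiB cap on JSON array bodies


def to_json_array(rows):
    """Encode rows as one JSON array body."""
    return json.dumps(list(rows)).encode("utf-8")


def to_ndjson(rows):
    """Yield one newline-terminated JSON object per row (lazily)."""
    for row in rows:
        # json.dumps without indent never emits a raw newline,
        # so each row stays on exactly one line.
        yield (json.dumps(row, separators=(",", ":")) + "\n").encode("utf-8")


def pick_content_type(body_size):
    """Choose the transport from the encoded body size in bytes."""
    if body_size <= MAX_JSON_BODY:
        return "application/json"
    return "application/x-ndjson"
```

Because to_ndjson is a generator, a large import can be written to a file or streamed line by line without holding every row in memory.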
common mistakes
  • Sending NDJSON with the wrong Content-Type. Use application/x-ndjson, not application/json. Otherwise the server tries to parse the whole body as one JSON object and fails.
  • Chunk too big. A single chunk that doesn't fit in memory will OOM the engine's batch buffer. If you're streaming millions of rows, keep chunk at the default 1000.
  • No retry strategy. Network blips happen. The SDKs handle this for you, including setting the Idempotency-Key. If you call the endpoint directly, retry on 503 and 504 with exponential backoff and send the same Idempotency-Key on every attempt so the retry is safe.
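
If you call the endpoint directly, the retry advice above could be sketched as follows. This is an illustration under assumptions, not the SDK's retry logic: send_with_retry is a hypothetical helper, and send is a function you supply that performs the HTTP request with the given Idempotency-Key header and returns (status, body).

```python
import time
import uuid

RETRYABLE = {503, 504}


def send_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(idempotency_key) until it returns a non-retryable status.

    The same key is reused for every attempt so the server can
    recognise a retry of the same batch.
    """
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        status, body = send(key)
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    # Still failing after the last attempt: hand the last result back.
    return status, body
```

The sleep parameter exists so tests can replace the real delay. Production code may also want random jitter added to each delay.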
try it yourself · 1 minute

Generate 1000 fake rows and import them in one call. Save this as demo.py and run python demo.py:

from originchain import OriginChain
db = OriginChain.from_env()
rows = ({"id": f"c_{i}", "email": f"u{i}@ex.com", "region": "IN"} for i in range(1000))
print(db.rows.put_batch("shop.customers", rows, chunk=500), "rows inserted")

3. Insert a vector.

what this does

Save a list of numbers (an embedding) under an ID so you can later find similar embeddings. An embedding is the output of a model that turned some text or an image into numbers - typically 384, 768, 1024, or 1536 of them.

when to use it
  • You are building semantic search ("find products that mean roughly the same thing").
  • You are building retrieval-augmented generation (RAG) - finding the most relevant documents to feed to an LLM.
  • You are doing recommendations based on similarity.
the code
POST /v1/tenants/:t/vector/:table/put
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/vector/shop.products/put" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":        "sku-9281",
    "embedding": [0.0124, -0.0883, 0.0451, /* ... 768 floats ... */],
    "dim":       768,
    "metric":    "cosine",
    "metadata":  { "category": "running-shoes", "price": 129.0 }
  }'
what each field means
Field | Type | Required | What it is
id | string | yes | A unique ID for this vector. Usually the primary key of the row it was extracted from (here, the SKU).
embedding | float[] | yes | The vector itself. The length must match dim exactly.
dim | int | yes | The vector's length. Must be the same value for every vector in this table - the first insert locks it in.
metric | string | no | How "closeness" is measured. cosine (default) for most text models. l2 for distance-style models. dot for inner-product. Locked after the first insert.
metadata | object | no | Anything you want to filter on at search time. Example: { "category": "shoes" } lets you later restrict the search to shoes only.
common mistakes
  • Wrong dim. If your embeddings are 1536 floats but the first insert said dim: 768, every later insert fails with 400 dim_mismatch. The first insert sets the lock.
  • Mixing metrics. If you start with cosine and later send l2, you get 400 metric_mismatch. Pick one and stay with it.
  • Filtering on un-indexed metadata. Filters work on any key, but they are fastest on simple equality (category == "shoes"). Range filters on price are slower.
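
The first two mistakes can be caught before the request leaves your process. The sketch below is a hypothetical client-side check (vector_payload is not an SDK function). It validates the fields described above and returns the JSON body for the put endpoint.

```python
import math

METRICS = {"cosine", "l2", "dot"}


def vector_payload(vec_id, embedding, dim, metric="cosine", metadata=None):
    """Validate locally, then return the JSON body for the vector put call."""
    if len(embedding) != dim:
        raise ValueError(f"embedding has {len(embedding)} floats, dim is {dim}")
    if metric not in METRICS:
        raise ValueError(f"unknown metric {metric!r}")
    # Strict JSON cannot represent NaN or infinity, so reject them early.
    if not all(math.isfinite(x) for x in embedding):
        raise ValueError("embedding contains NaN or infinity")
    body = {"id": vec_id, "embedding": list(embedding), "dim": dim, "metric": metric}
    if metadata:
        body["metadata"] = metadata
    return body
```

This only checks that one payload is self-consistent. Whether dim and metric match what the table locked in on its first insert is still decided by the server.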
tip · vectors live on their own endpoint

Vectors are stored on their own endpoint, not as a row column. The id you pass here is what links the vector back to a row (typically the row's primary key). See Vector tables for the full reference.

4. Insert a search document.

what this does

Index a piece of text so it shows up in keyword search. OriginChain breaks the text into tokens, applies the analyzer you declared on the schema (lowercase, stemming, etc.), and builds an inverted index that's ranked by BM25 - the standard relevance algorithm used by Elasticsearch and Lucene.

when to use it
  • You want users to find rows by typing keywords ("carbon plate marathon shoes").
  • You want phrase search ("exact phrase in quotes").
  • You want fuzzy matching that tolerates typos.
the code
POST /v1/tenants/:t/fts/:table/:field
curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/fts/shop.products/description" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "doc_id": "sku-9281",
    "text":   "Lightweight road runner with a carbon plate, designed for marathon pace."
  }'
what each field means
Field | Where | Required | What it is
:table | URL | yes | The schema name. Same as in row writes.
:field | URL | yes | The column name to index this text under. Many documents can be indexed under the same field, each with its own doc_id.
doc_id | body | yes | A unique ID for this document. Usually the row's primary key. Re-indexing with the same doc_id replaces the previous text - no stale matches.
text | body | yes | The actual text to index. No size limit on this endpoint, but very large documents (several MB) are better split into multiple doc_ids.
common mistakes
  • Indexing a doc_id that doesn't match a row. The FTS index is independent of the row store - nothing stops you from indexing a doc_id that doesn't exist in the table. You'll get search hits pointing at nothing. Treat doc_id as "the primary key of the row this text describes" and you'll be fine.
  • Wrong analyzer for your language. The default English analyzer doesn't stem German or Chinese well. Declare the right analyzer on the schema (Snowball stemmers in 18 languages plus CJK / Thai / Khmer tokenizers).
  • Indexing the wrong text. The text you put here is what gets searched - if you index only the product name, users can't search by description. Concatenate every searchable field into one string before indexing.
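
The last point, concatenating every searchable field before indexing, might look like this. It is a minimal sketch with hypothetical helper names. Which fields to include, and joining them with a newline, are choices for your app rather than anything the endpoint requires.

```python
def searchable_text(row, fields):
    """Join the non-empty values of `fields` into one string, in order."""
    parts = [str(row[f]).strip() for f in fields if row.get(f)]
    return "\n".join(parts)


def fts_body(row, fields, pk="id"):
    """Build the request body for POST /v1/tenants/:t/fts/:table/:field."""
    return {"doc_id": row[pk], "text": searchable_text(row, fields)}
```

Using the row's primary key as doc_id also guards against the first mistake, because every indexed document then points at a real row.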
tip · full-text indexes live on their own endpoint

Like vectors, full-text indexes live on their own runtime endpoint - they're not declared on the row schema. Re-indexing the same doc_id replaces the previous text in the same write, so there are no stale postings.

5. Insert a graph relationship.

what this does

Create a link (an edge) between two rows that you can later walk. Examples: a product is supplied by a supplier, a user follows another user, an order belongs to a customer.

Here is the important thing: an edge is not a separate write. You declare which columns are relations on the schema, then the engine creates and maintains the forward + reverse edges automatically whenever you write the row.

when to use it
  • You want to query things like "all products from supplier X" or "all orders placed by user Y" without writing a JOIN.
  • You want to do multi-hop walks like "friends of friends" or "products bought by users who bought this one".
  • You want shortest-path queries.
the code

Assuming the schema declares a [[relations]] entry with from_col = "supplier_id" that targets shop.suppliers (the full manifest is shown in section 6), this row write creates the edge automatically:

POST /v1/tenants/:t/rows/:schema · edge written atomically
# A graph edge is NOT a separate write.
# Declare `[[relations]]` on the schema (see the manifest in section 6),
# then write the row - the engine creates the forward and reverse
# edges automatically because `supplier_id` is declared as a relation.

curl -X POST "$ORIGINCHAIN_URL/v1/tenants/$T/rows/shop.products" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":          "sku-9281",
    "name":        "Carbon Marathon",
    "supplier_id": "sup-44",
    "price_cents": 12900
  }'
common mistakes
  • Target row doesn't exist yet. If sup-44 doesn't exist in shop.suppliers, the edge is still stored - it just points at a non-existent row. Decide whether you want this. To enforce existence, add a foreign-key constraint on the schema.
  • Forgetting that updates retire the old edge. If a product's supplier_id changes from sup-44 to sup-77, the old edge is removed in the same write. Good for accuracy, surprising if you expected history.
  • Trying to add an edge without a column for it. Edges piggy-back on columns. If you want a many-to-many relationship without a column, create a join table (e.g., shop.product_tags) and put relations on its columns.
next · walk the edges

Once edges are written, see Graph queries for how to walk them (neighbors, BFS, shortest path, PageRank).

6. All four at once (atomic).

what this does

Save the same product as a row, a vector embedding, a search document, and a graph edge - in three coordinated calls, all backed by the same record. This is the pattern most real apps end up with.

In a typical stack you'd write the row to Postgres, push the embedding to a vector database, push the text to Elasticsearch, and trust three systems to stay in sync. Here, all four projections live in the same instance and share the same write-ahead log.

when to use it
  • You are building a product catalog that needs to be searchable by exact filter, by similarity, by keyword, and by relationship - all at once.
  • You are building a RAG pipeline that also needs structured filtering.
  • You want to stop maintaining three separate databases.
the schema

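One schema declares the row columns and the graph edge. Vector and full-text indexes are not declared here; they live on their own endpoints and link back to rows by primary key. Register this with POST /v1/tenants/$T/schemas: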

# manifest.toml - the row schema. Defines columns + a graph edge.
# Vector and full-text indexes are NOT declared here - they live on
# their own runtime endpoints (see /docs/vector, /docs/fts) and link
# back to rows by primary key.

namespace   = "shop"
table       = "products"
primary_key = ["id"]

[[columns]]
name = "id"
ty   = "str"
required = true

[[columns]]
name = "name"
ty   = "str"

[[columns]]
name = "supplier_id"
ty   = "str"

[[columns]]
name = "price_cents"
ty   = "i64"            # money in minor units - never f64

[[columns]]
name = "description"
ty   = "str"

# Secondary index on supplier_id so neighbor lookups are fast.
[[indexes]]
name    = "by_supplier"
columns = ["supplier_id"]

# Turn supplier_id into a graph edge: the row write creates the edge
# automatically because [[relations]] is declared.
[[relations]]
name          = "supplied_by"
from_col      = "supplier_id"
bidirectional = true

[relations.target]
namespace = "shop"
table     = "suppliers"
pk        = "id"
the code

Save one product. Each call lands as one atomic write on the engine - if any of the three calls fails, you can safely retry the failing one because the calls are idempotent.

row + vector + full-text + supplier edge
# One product, written four ways in three calls. Each call is its own
# atomic write; a failure in one call does not undo the others.

product_id  = "sku-9281"
description = "Lightweight road runner with a carbon plate, designed for marathon pace."

# 1. The row itself. The graph edge to `shop.suppliers` is created
#    automatically because `supplier_id` is a declared relation.
db.rows.put("shop.products", {
    "id":          product_id,
    "name":        "Carbon Marathon",
    "supplier_id": "sup-44",
    "price_cents": 12900,
    "description": description,
})

# 2. The vector embedding. Your app computes the float[]; the engine stores it.
db.vector.put(
    "shop.products",
    product_id,
    embed(description),    # 768-float list
    metadata={ "category": "running-shoes", "price": 129.0 },
)

# 3. The BM25 full-text index. Re-indexing the same doc_id replaces
#    the old postings - no ghost matches.
db.fts.index(
    "shop.products",
    "description",
    doc_id=product_id,
    text=description,
)
what just happened

After those three calls, the same product is visible to four kinds of query:

  • SQL: SELECT * FROM shop.products WHERE price_cents < 15000
  • Vector: find the 10 products most similar to a query embedding
  • Full-text: find products whose description matches "marathon carbon"
  • Graph: find all products supplied by sup-44

See Querying your data for each of these.

common mistakes
  • Forgetting one of the three calls. Row, vector, and full-text are stored independently. If you insert the row but skip the embedding, the product won't show up in vector search. Wrap the three calls in your own helper function so they always go together.
  • Embedding the wrong text. Embed what you want users to search by - usually a title + description, not the SKU.
  • Ignoring failures. Each call returns success or failure. If only the vector call fails, the row and FTS index are still written. Decide whether you want to roll back manually (delete the row) or just retry the failing call.
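
One way to keep the three calls together is a small wrapper that retries only the step that failed. This is a sketch under assumptions: save_product is a hypothetical helper, db is any object exposing the rows.put, vector.put, and fts.index calls used in the example above, and catching Exception stands in for whatever error types the SDK actually raises.

```python
def save_product(db, product, embedding, metadata=None, attempts=3):
    """Write row, vector, and full-text entry; retry only the failing call.

    Returns the list of step names that completed. Raises RuntimeError
    naming the failed step once its retries are exhausted.
    """
    pid, text = product["id"], product["description"]
    steps = [
        ("row", lambda: db.rows.put("shop.products", product)),
        ("vector", lambda: db.vector.put("shop.products", pid, embedding,
                                         metadata=metadata or {})),
        ("fts", lambda: db.fts.index("shop.products", "description",
                                     doc_id=pid, text=text)),
    ]
    done = []
    for name, call in steps:
        for attempt in range(attempts):
            try:
                call()
                done.append(name)
                break
            except Exception as exc:  # assumption: SDK error types unknown
                if attempt == attempts - 1:
                    raise RuntimeError(
                        f"{name} failed; completed: {done}") from exc
    return done
```

Retrying a single step relies on the calls being idempotent, as stated above. The error message lists the steps that did complete, so the caller can decide between a manual rollback and a later retry.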