
IVF vector index.

IVF (inverted file) partitions your vectors into cells by nearest centroid. At query time, the engine picks the nprobe cells whose centroids are closest to the query vector and scans only those. This is the path to 10M+ vectors per tenant on the standard tiers. Since the bulk-load gate landed, a 1M-vector IVF build completes in 80 s; it previously timed out at over 18 min.

The IVF pipeline.

Train: k-means picks nlist centroids from a training sample. sqrt(N), where N is the number of vectors, is a reasonable starting point for nlist.
Assign: every vector is assigned to its nearest centroid. Bulk-load runs this step in parallel and lands 1M vectors in 80 s after the Gate #4 work.
Query: the query vector is scored against the centroids, and only the nprobe closest cells are scanned.
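
The three stages above can be sketched in a few lines of plain Python. This is an illustrative toy, not the engine's implementation: the function names (train, assign, query) are hypothetical, k-means is the basic Lloyd iteration with random initialization, and distance is squared Euclidean as a stand-in for the configured metric (on unit-normalized vectors it ranks results the same way cosine does).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))

def train(sample, nlist, iters=25, seed=0):
    """Train: Lloyd's k-means over a training sample, returning nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(sample, nlist)]
    for _ in range(iters):
        groups = [[] for _ in range(nlist)]
        for v in sample:
            groups[nearest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # an empty cell keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def assign(vectors, centroids):
    """Assign: build the inverted file, one list of vector ids per cell."""
    cells = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        cells[nearest(v, centroids)].append(i)
    return cells

def query(q, vectors, centroids, cells, k=10, nprobe=4):
    """Query: rank centroids, scan only the nprobe closest cells, return top-k ids."""
    by_distance = sorted(range(len(centroids)), key=lambda c: sq_dist(q, centroids[c]))
    candidates = [i for c in by_distance[:nprobe] for i in cells[c]]
    return sorted(candidates, key=lambda i: sq_dist(q, vectors[i]))[:k]
```

Setting nprobe equal to nlist scans every cell and reduces to an exhaustive search, which is a convenient way to sanity-check results from a smaller nprobe.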

Declare IVF in the manifest.

# manifest.toml — IVF on 768-dim embeddings
[[vectors]]
name        = "embeddings"
dim         = 768
metric      = "cosine"
index       = "ivf"
nlist       = 1024        # number of inverted-file cells. sqrt(N) is a good start.
# IVF is independent of quantization. Add quantization = "pq" for IVF-PQ.
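
For picking a starting nlist, the sqrt(N) rule of thumb can be turned into a small helper. The sketch below is hypothetical (suggest_nlist is not part of any SDK), and rounding to the nearest power of two is an assumption made here for tidy values; the source only states sqrt(N) as a starting point.

```python
import math

def suggest_nlist(n_vectors, round_to_power_of_two=True):
    """Starting nlist from the sqrt(N) rule of thumb.

    Rounding to a power of two is an assumption of this sketch,
    not a documented requirement.
    """
    if n_vectors < 1:
        raise ValueError("n_vectors must be at least 1")
    root = max(1, math.isqrt(n_vectors))  # floor of sqrt(N)
    if not round_to_power_of_two:
        return root
    lower = 1 << (root.bit_length() - 1)  # largest power of two <= root
    upper = lower << 1                    # smallest power of two > root
    return lower if root - lower <= upper - root else upper

if __name__ == "__main__":
    for n in (100_000, 1_000_000, 10_000_000):
        print(n, suggest_nlist(n))
```

For a 1M-vector corpus this yields 1024, the value used in the manifest above. Treat the result as a first guess to validate against your own recall and latency measurements.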

Install centroids (admin).

Training is an HTTP admin call. Run it once when the corpus reaches training scale; re-run only if the data distribution shifts materially.

POST /v1/tenants/:t/vector/:table/install-centroids

curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/embeddings/install-centroids" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "nlist":        1024,
    "training_set": "sample",
    "iters":        25
  }'
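
The same call can be made from Python's standard library. The sketch below only builds the request; the path and body fields come from the curl example above, while the function name and its parameters are hypothetical. The response format is not described here, so the sketch does not parse it.

```python
import json
import urllib.request

def install_centroids_request(base_url, tenant, table, token,
                              nlist=1024, training_set="sample", iters=25):
    """Build (but do not send) the install-centroids admin request."""
    url = f"{base_url}/v1/tenants/{tenant}/vector/{table}/install-centroids"
    body = json.dumps({
        "nlist": nlist,
        "training_set": training_set,
        "iters": iters,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen and handle urllib.error.HTTPError for non-2xx responses.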

The nprobe knob.

nprobe is the recall/latency dial: higher values scan more cells, which improves recall and raises latency. The bands below come from a synthetic 1M-vector eval at D=128, M=16, nlist=1024; your numbers will move with corpus geometry.

nprobe   recall@10   latency band
1        0.71        p99 lowest. Cheap scan.
4        0.93        Default. Production sweet spot.
16       0.98        Recall-critical reads. 4× slower than nprobe=4.
64       0.995       Near-exhaustive. Quality benchmarks only.
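
A rough way to reason about the latency side of these bands is to estimate how many vectors a query has to scan. The sketch below assumes perfectly balanced cells, which real k-means partitions only approximate, and ignores the cost of scoring the centroids themselves; the function names are hypothetical.

```python
def scanned_fraction(nprobe, nlist):
    """Fraction of the corpus scanned per query, assuming balanced cells."""
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be between 1 and nlist")
    return nprobe / nlist

def expected_candidates(n_vectors, nprobe, nlist):
    """Approximate number of vectors scored per query under the same assumption."""
    return round(n_vectors * scanned_fraction(nprobe, nlist))

if __name__ == "__main__":
    # Eval setup from the text: 1M vectors, nlist=1024.
    for nprobe in (1, 4, 16, 64):
        print(nprobe, expected_candidates(1_000_000, nprobe, 1024))
```

Under these assumptions nprobe=4 scores about 3,906 vectors per query and nprobe=16 about 15,625, four times as many, which is in line with the 4× latency note for nprobe=16 if scan cost dominates.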

POST /v1/tenants/:t/vector/:table/topk

curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/embeddings/topk" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query":  [/* 768 floats */],
    "k":      10,
    "nprobe": 4
  }'
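
The query array in the curl example is a placeholder for 768 numeric values. A small helper can produce a valid JSON body from a real embedding; the sketch below is hypothetical (topk_payload is not an SDK function), with field names taken from the request above and the dimension check matching dim = 768 in the manifest.

```python
import json
import math

def topk_payload(query, k=10, nprobe=4, dim=768):
    """Serialize a topk request body, validating the query vector first."""
    if len(query) != dim:
        raise ValueError(f"expected {dim} values, got {len(query)}")
    values = [float(x) for x in query]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("query contains NaN or infinite values")
    if k < 1 or nprobe < 1:
        raise ValueError("k and nprobe must be positive")
    return json.dumps({"query": values, "k": k, "nprobe": nprobe})

if __name__ == "__main__":
    # Dummy all-zero vector, for illustration only.
    print(topk_payload([0.0] * 768))
```

Saved as a script, the output can be piped into the curl command above by replacing the inline body with -d @-, which tells curl to read the request body from standard input.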

When to pick IVF over HNSW.

Working set is above ~5M vectors and HNSW memory use is becoming the constraint.
You want to combine IVF with PQ (see IVF-PQ) for a 64–768× memory reduction.
Workload is bulk-load heavy. The IVF bulk-load path lands 1M vectors in 80 s.
Recall budget can tolerate the nprobe trade-off: the default nprobe=4 reaches recall@10 ≈ 0.93 on the synthetic 1M-vector eval.