
Vector quantization.

Quantization trades a controlled amount of recall for a large reduction in memory. OriginChain ships three variants: binary (32× smaller than f32), PQ (64× smaller at D=128), and IVF-PQ (residualised per cell, 768× smaller at D=1536). All three are GA. Pick one per collection at install time; the topk surface stays the same.

The trade table.

| Kind | Memory | Recall band | When to pick |
| --- | --- | --- | --- |
| Scalar (f32) | 1× (baseline) | Reference: the recall every other variant trades against. | Small indexes, recall-critical workloads, anything under ~1M vectors where memory isn't the binding constraint. |
| Binary quantization | 32× | recall@10 typically 0.85–0.93 vs f32; rerank top-100 to recover. | Memory-bound workloads. Pair with HNSW or IVF. Cheapest scoring kernel: Hamming distance on packed bits. |
| PQ (8-bit codebook) | 64× at D=128 | recall@10 typically 0.92–0.96 with M=16 sub-vectors. | When you want compressed storage but don't need IVF partitioning. LUT scoring at query time keeps latency similar to f32. |
| IVF-PQ (residualised) | 64× at D=128, 768× at D=1536 | recall@10 = 0.948 at nprobe=4 on the synthetic floor. | 10M+ vectors per tenant. The scale ceiling story. IVF partitions + per-cell PQ codebook (Jégou et al., 2011). |
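
To make the binary row concrete, here is a minimal sketch of the shortlist-then-rerank pattern it describes: score by Hamming distance on packed bits, keep the top 100, then rerank those with full-precision cosine. The sign-based binarization rule and every function name here are assumptions for illustration; the source does not say how the engine derives its bits.

```python
import random

def binarize(vec):
    """Pack one bit per dimension into a Python int (assumed rule: bit set if x > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between two packed bit codes."""
    return bin(a ^ b).count("1")

def cosine(a, b):
    """Full-precision cosine similarity, used only on the shortlist."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def search(query, vectors, codes, k=10, shortlist=100):
    """Shortlist by Hamming distance on the codes, then rerank by cosine."""
    q_code = binarize(query)
    candidates = sorted(range(len(codes)),
                        key=lambda i: hamming(q_code, codes[i]))[:shortlist]
    reranked = sorted(candidates,
                      key=lambda i: cosine(query, vectors[i]), reverse=True)
    return reranked[:k]

# Synthetic data: 1,000 random vectors at D=128.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(128)] for _ in range(1000)]
codes = [binarize(v) for v in vectors]
top = search(vectors[42], vectors, codes)
```

The rerank step needs the original f32 vectors for the shortlist only, so they can stay in slower storage while the packed codes stay in memory.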

Pick one at install.

Quantization is a per-collection property in the manifest. The topk endpoint does not change; the engine picks the scoring kernel for you. Pair binary or PQ with HNSW for default recall, or with IVF for the scale story.

# manifest.toml — attach binary quantization to an HNSW index
[[vectors]]
name        = "embeddings"
dim         = 768
metric      = "cosine"
index       = "hnsw"
quantization = "binary"        # or "pq" | "none"
# Per-collection. Quantization is a memory/recall trade, picked once at install.

Train a PQ codebook.

PQ needs a codebook trained on a sample of your data. The training_set field names a row label, and the engine sub-samples the rows that carry it. The two knobs are the number of sub-vectors (m) and the number of codebook entries per sub-vector (ksub); ksub=256 makes each code fit in 8 bits. Defaults are m=16, ksub=256 for D=128.

POST /v1/tenants/:t/vector/:table/install-pq

curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/embeddings/install-pq" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "m":            16,
    "ksub":         256,
    "training_set": "sample"
  }'
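
For intuition about what m and ksub control, the following is a textbook product-quantization sketch in plain Python: k-means per sub-vector position to train the codebooks, nearest-centroid encoding, and a per-query lookup table (LUT) for scoring. It is not the engine's trainer, which the source does not describe; all function names are hypothetical, and the demo uses D=16, m=4, ksub=8 so it runs quickly rather than the documented defaults.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(vec, m):
    """Cut a vector into m contiguous sub-vectors of equal length."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_pq(sample, m, ksub):
    """One codebook of ksub centroids per sub-vector position."""
    return [kmeans([split(v, m)[s] for v in sample], ksub, seed=s)
            for s in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def build_lut(query, codebooks):
    """Per-query table: squared distance from each query sub-vector to every centroid."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(code, lut):
    """Approximate squared L2 distance: one table lookup per sub-vector."""
    return sum(lut[s][c] for s, c in enumerate(code))

# Small synthetic demo (not the documented defaults).
rng = random.Random(1)
sample = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
codebooks = train_pq(sample, m=4, ksub=8)
codes = [encode(v, codebooks) for v in sample]
lut = build_lut(sample[0], codebooks)
approx = adc_distance(codes[0], lut)
```

Raising m shrinks each sub-vector and lowers quantization error at the cost of more bytes per code; raising ksub does the same at the cost of a larger codebook and LUT.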

Memory math.

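At D=128 with f32, one vector costs 512 bytes (128 × 4). Binary collapses that to 16 bytes (1 bit × 128 dims), a 32× reduction. PQ with M=16 sub-vectors stores 16 codes per vector at 8 bits each = 16 bytes, which is also 32×; a 64× ratio at D=128 implies 8 bytes per vector, or 8 one-byte codes. IVF-PQ quantizes each vector's residual against its cell centroid and stores the PQ code plus the cell id. At D=1536 an f32 vector costs 6,144 bytes, so the 768× figure quoted on the marketing surface corresponds to 8 bytes of code per vector.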
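
The arithmetic can be checked with a small calculator. This is a sketch under stated assumptions: it counts per-vector code bytes only, ignoring the IVF cell id, the codebooks, and any index overhead, and the helper names are hypothetical.

```python
def f32_bytes(dim):
    """Uncompressed size: 4 bytes per dimension."""
    return dim * 4

def binary_bytes(dim):
    """1 bit per dimension, rounded up to whole bytes."""
    return (dim + 7) // 8

def pq_bytes(m, ksub=256):
    """m codes per vector; each code needs enough bits to index ksub entries."""
    bits_per_code = (ksub - 1).bit_length()
    return (m * bits_per_code + 7) // 8

def ratio(dim, compressed_bytes):
    """Compression ratio relative to f32."""
    return f32_bytes(dim) / compressed_bytes

binary_128 = ratio(128, binary_bytes(128))   # 512 / 16
pq16_128 = ratio(128, pq_bytes(16))          # 512 / 16
pq8_128 = ratio(128, pq_bytes(8))            # 512 / 8
pq8_1536 = ratio(1536, pq_bytes(8))          # 6144 / 8
```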