Vector quantization.
Quantization trades a controlled amount of recall for a large reduction in memory. OriginChain ships three variants: binary (32× memory), PQ (64× memory at D=128), and IVF-PQ (residualised per-cell, 768× memory at D=1536). All three are GA. Pick one per collection at install time; the topk surface stays the same.
The trade table.
Scalar (f32) 1× (baseline) Reference - the recall every other variant trades against. Small indexes, recall-critical workloads, anything under ~1M vectors where memory isn't the binding constraint. Binary quantization 32× recall@10 typically 0.85–0.93 vs f32; rerank top-100 to recover. Memory-bound workloads. Pair with HNSW or IVF. Cheapest scoring kernel - hamming distance on packed bits. PQ (8-bit codebook) 64× at D=128 recall@10 typically 0.92–0.96 with M=16 sub-vectors. When you want compressed storage but don't need IVF partitioning. LUT scoring at query time keeps latency similar to f32. IVF-PQ (residualised) 64× at D=128, 768× at D=1536 recall@10 = 0.948 at nprobe=4 on the synthetic floor. 10M+ vectors per tenant. The scale ceiling story. IVF partitions + per-cell PQ codebook (Jegou 2011). Pick one at install.
Quantization is a per-collection property in the manifest. The topk endpoint does not change; the engine picks the scoring kernel for you. Pair binary or PQ with HNSW for default recall, or with IVF for the scale story.
# manifest.toml — attach binary quantization to an HNSW index
[[vectors]]
name = "embeddings"
dim = 768
metric = "cosine"
index = "hnsw"
quantization = "binary" # or "pq" | "none"
# Per-collection. Quantization is a memory/recall trade, picked once at install. Train a PQ codebook.
PQ needs a codebook trained on a sample of your data. The training set is
a row label that the engine sub-samples; sub-vectors (m) and
codebook entries per sub-vector (ksub)
are the two knobs. Defaults are m=16, ksub=256 for D=128.
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/embeddings/install-pq" \
-H "Authorization: Bearer $OC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"m": 16,
"ksub": 256,
"training_set": "sample"
}' Memory math.
At D=128 with f32, one vector costs 512 bytes. Binary collapses that to 16 bytes (1 bit × 128 dims). PQ with M=16 sub-vectors stores 16 codes per vector at 8 bits each = 16 bytes. IVF-PQ residualises the per-cell vector and stores 16 bytes plus the cell id; at D=1536 the residual structure lands the 768× figure quoted on the marketing surface.