Vector Database Design
A practical guide to schema design, index selection, and query patterns for production vector stores.
Dimensionality & Precision
The embedding dimension is fixed by your model: 384 for all-MiniLM, 768 for BERT-base, 1536 for OpenAI ada-002. Prefer float32 for recall-critical workloads; use int8 quantization, which stores each component in one byte instead of four, when throughput matters more than a sub-percent accuracy loss.
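To make the float32 versus int8 tradeoff concrete, the sketch below estimates raw vector storage and applies a simple symmetric scalar quantizer. The function names are hypothetical, and per-vector scaling is only one of several int8 schemes; production stores may calibrate scales differently.

```python
from typing import List, Tuple


def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Storage for the vectors alone; ignores index overhead such as graph links."""
    return num_vectors * dim * bytes_per_component


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector quantization into [-127, 127] (an assumed scheme)."""
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vec], scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Approximate reconstruction; each component is off by at most scale / 2."""
    return [c * scale for c in codes]


# 1M vectors at 768 dimensions: float32 needs four times the bytes of int8.
float32_bytes = raw_vector_bytes(1_000_000, 768, 4)
int8_bytes = raw_vector_bytes(1_000_000, 768, 1)

original = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(original)
restored = dequantize_int8(codes, scale)
```

The reconstruction error is bounded by half the scale per component, which is why the recall loss is usually small for well-normalized embeddings.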
Index Strategy
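HNSW delivers the best latency-recall tradeoff for datasets under 10M vectors. Start with M=16 and efConstruction=200 at build time, and efSearch=64–128 at query time; raising efSearch improves recall at the cost of latency. IVF-PQ shines above 100M vectors, where the partition count should target ~10K vectors per cluster (for example, 20,000 partitions for 200M vectors). Between 10M and 100M vectors, benchmark both against your memory budget.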
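The thresholds above can be captured in a small helper. This is a sketch with hypothetical function names, not any particular library's API; the handling of the 10M to 100M range is an assumption, since the guidance above only covers the two ends.

```python
from typing import Any, Dict


def ivf_partition_count(num_vectors: int, vectors_per_cluster: int = 10_000) -> int:
    """Partition count that keeps each cluster at or below ~10K vectors."""
    return max(1, -(-num_vectors // vectors_per_cluster))  # ceiling division


def suggest_index(num_vectors: int) -> Dict[str, Any]:
    """Map corpus size to the starting parameters described in the text."""
    if num_vectors < 10_000_000:
        return {"type": "HNSW", "M": 16, "efConstruction": 200, "efSearch": (64, 128)}
    if num_vectors > 100_000_000:
        return {"type": "IVF-PQ", "nlist": ivf_partition_count(num_vectors)}
    # Assumption: the middle range is left to measurement.
    return {"type": "benchmark HNSW and IVF-PQ"}


small = suggest_index(5_000_000)
large = suggest_index(200_000_000)
```

Treat the returned values as starting points for a parameter sweep rather than final settings.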
Metadata Filtering
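Pre-filter before vector search when selectivity is high, that is, when the metadata clause matches only a small fraction of the corpus. Post-filter when the metadata clause matches >80% of the corpus, since few of the nearest neighbors will then be discarded. Hybrid approaches that pair the vector index with scalar indexes on tenant_id, timestamp, or tags prevent full-scan disasters at scale.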
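The difference between the two strategies shows up even in a brute-force toy. The sketch below uses exact search over an in-memory list in place of a real vector index; all names are hypothetical, and sending every clause at or below the 80% mark to pre-filtering is an assumption of this example.

```python
import math
from typing import Callable, Dict, List, Tuple

Record = Tuple[List[float], Dict[str, str]]  # (vector, metadata)
Predicate = Callable[[Dict[str, str]], bool]


def pre_filter_search(records: List[Record], query: List[float], k: int,
                      predicate: Predicate) -> List[Record]:
    """Apply the metadata clause first, then rank only the survivors."""
    candidates = [r for r in records if predicate(r[1])]
    candidates.sort(key=lambda r: math.dist(r[0], query))
    return candidates[:k]


def post_filter_search(records: List[Record], query: List[float], k: int,
                       predicate: Predicate, overfetch: int = 1) -> List[Record]:
    """Rank everything, keep the top k * overfetch, then apply the clause."""
    nearest = sorted(records, key=lambda r: math.dist(r[0], query))[: k * overfetch]
    return [r for r in nearest if predicate(r[1])][:k]


def choose_strategy(match_fraction: float) -> str:
    """Post-filter only when the clause matches more than 80% of the corpus."""
    return "post-filter" if match_fraction > 0.8 else "pre-filter"


records: List[Record] = [
    ([0.0], {"tenant": "a"}),
    ([1.0], {"tenant": "b"}),
    ([2.0], {"tenant": "b"}),
    ([3.0], {"tenant": "a"}),
]
is_b: Predicate = lambda meta: meta["tenant"] == "b"

pre = pre_filter_search(records, [0.0], 1, is_b)
post = post_filter_search(records, [0.0], 1, is_b)                 # nearest hit is filtered out
post_wide = post_filter_search(records, [0.0], 1, is_b, overfetch=2)
```

Here the plain post-filter returns nothing because the single nearest neighbor belongs to the wrong tenant, which is the shortfall that over-fetching or pre-filtering avoids.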
Sharding & Multi-Tenancy
Partition by tenant for strict isolation; for small fleets, a collection per tenant is the simplest way to do so. For SaaS with thousands of tenants, a single collection with tenant_id pre-filtering avoids index sprawl. Monitor shard size and repartition when any shard exceeds 50M vectors.
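A monitoring job for these rules might look like the sketch below. The names are hypothetical, and the cutoff of 100 tenants for a "small fleet" is purely an assumption, since the text does not give a number.

```python
from typing import Dict, List

SHARD_LIMIT = 50_000_000  # repartition threshold from the text


def shards_to_repartition(shard_sizes: Dict[str, int],
                          limit: int = SHARD_LIMIT) -> List[str]:
    """Names of shards whose vector count exceeds the limit, sorted."""
    return sorted(name for name, count in shard_sizes.items() if count > limit)


def tenancy_layout(num_tenants: int, small_fleet_max: int = 100) -> str:
    """small_fleet_max is an assumed cutoff, not a value from the guide."""
    if num_tenants <= small_fleet_max:
        return "collection-per-tenant"
    return "single collection, pre-filtered by tenant_id"


oversized = shards_to_repartition(
    {"shard-0": 12_000_000, "shard-1": 50_000_000, "shard-2": 61_000_000}
)
layout_small = tenancy_layout(20)
layout_saas = tenancy_layout(5_000)
```

A shard sitting exactly at 50M is not flagged, matching the "exceeds" wording of the rule.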
Consistency & Durability
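WAL-backed inserts survive a crash, provided the log is flushed to disk before the write is acknowledged. Set replication factor ≥3 for production. Tune commit intervals: 1s for freshness, 10s for throughput. Snapshot every 10K mutations to bound recovery time, since only the log entries written after the last snapshot need to be replayed.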
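The interplay between the write-ahead log and snapshots can be illustrated with a toy store. This is a minimal single-process sketch under stated assumptions: the class name, file names, and JSON encoding are all hypothetical, every insert is synced individually, and replication is out of scope.

```python
import json
import os
import tempfile
from typing import Dict, List


class TinyWalStore:
    """Toy key-to-vector store: fsynced WAL plus periodic snapshots."""

    def __init__(self, directory: str, snapshot_every: int = 10_000) -> None:
        self.wal_path = os.path.join(directory, "wal.log")
        self.snap_path = os.path.join(directory, "snapshot.json")
        self.snapshot_every = snapshot_every
        self.vectors: Dict[str, List[float]] = {}
        self.pending = 0  # mutations recorded in the WAL since the last snapshot
        self._recover()
        self.wal = open(self.wal_path, "a", encoding="utf-8")

    def _recover(self) -> None:
        """Load the last snapshot, then replay the WAL on top of it."""
        if os.path.exists(self.snap_path):
            with open(self.snap_path, encoding="utf-8") as f:
                self.vectors = json.load(f)
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        entry = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-write
                    self.vectors[entry["id"]] = entry["vector"]
                    self.pending += 1

    def insert(self, vec_id: str, vector: List[float]) -> None:
        """Log and sync first, then apply in memory."""
        self.wal.write(json.dumps({"id": vec_id, "vector": vector}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.vectors[vec_id] = vector
        self.pending += 1
        if self.pending >= self.snapshot_every:
            self._snapshot()

    def _snapshot(self) -> None:
        """Write the snapshot atomically, then truncate the WAL."""
        tmp_path = self.snap_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.vectors, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snap_path)
        self.wal.close()
        self.wal = open(self.wal_path, "w", encoding="utf-8")
        self.pending = 0

    def close(self) -> None:
        self.wal.close()


with tempfile.TemporaryDirectory() as directory:
    store = TinyWalStore(directory, snapshot_every=2)
    store.insert("a", [1.0])
    store.insert("b", [2.0])  # second mutation triggers a snapshot
    store.insert("c", [3.0])  # exists only in the WAL
    store.close()             # stand-in for a crash; no extra flush happens here

    reopened = TinyWalStore(directory, snapshot_every=2)
    recovered = dict(reopened.vectors)
    replayed = reopened.pending
    reopened.close()
```

Recovery replays only the one entry logged after the snapshot, which is how the snapshot interval bounds restart time. Syncing on every insert favors durability; batching syncs on a commit interval trades a small window of exposure for throughput.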