Recipe: Vector store architecture
Embedding pipeline, chunking strategy, and nearest-neighbor retrieval for semantic search at scale.
Ingestion pipeline
Documents flow through a three-stage pipeline: raw text extraction, recursive character-text splitting with 512-token chunks and 64-token overlap, then batch embedding via an OpenAI-compatible endpoint. Embeddings land in a persistent vector index keyed by document hash to enable incremental upserts without full re-indexing.
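The hash-keyed incremental upsert described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual implementation: `InMemoryIndex`, `ingest`, `toy_split`, and `toy_embed` are hypothetical names, the index is an in-memory dict standing in for a persistent store, and the splitter and embedder are placeholders for the real 512/64 splitter and the OpenAI-compatible endpoint.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

Vector = List[float]


def document_hash(text: str) -> str:
    """Stable key for a document: SHA-256 of its UTF-8 text (an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class InMemoryIndex:
    """Hypothetical stand-in for the persistent vector index."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[Vector]] = {}

    def upsert(self, doc_hash: str, vectors: List[Vector]) -> None:
        # Writing the same key twice simply overwrites the earlier entry.
        self.entries[doc_hash] = vectors


def toy_split(text: str) -> List[str]:
    """Placeholder splitter: one chunk per whitespace-separated word."""
    return text.split()


def toy_embed(chunks: List[str]) -> List[Vector]:
    """Placeholder batch embedder: a 1-dimensional vector per chunk."""
    return [[float(len(chunk))] for chunk in chunks]


def ingest(
    docs: Iterable[str],
    index: InMemoryIndex,
    split: Callable[[str], List[str]],
    embed_batch: Callable[[List[str]], List[Vector]],
) -> int:
    """Embed only documents whose hash is not yet indexed; return how many were embedded."""
    embedded = 0
    for text in docs:
        key = document_hash(text)
        if key in index.entries:
            continue  # unchanged document: skip splitting and embedding
        chunks = split(text)
        index.upsert(key, embed_batch(chunks))  # one batch call per document
        embedded += 1
    return embedded
```

Re-running `ingest` over a corpus where only some documents are new embeds just those documents, which is the point of keying by hash. Note that in this sketch an edited document gets a new key and its old entry stays in the index until something else removes it.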
Chunking strategy
Recursive splitting respects paragraph and sentence boundaries before falling back to token-level cuts. Each chunk carries metadata: source path, heading hierarchy, and a rolling content hash for deduplication. Overlap windows preserve cross-boundary context, so a concept that straddles a chunk edge is still likely to appear intact in at least one chunk.
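As a rough illustration of the overlap window and per-chunk metadata, the sketch below uses a plain sliding window over tokens. It deliberately omits the recursive paragraph and sentence boundary logic, treats whitespace-separated words as tokens, and uses a simple per-chunk SHA-256 in place of the rolling hash; all of those are simplifying assumptions. `Chunk`, `window_chunks`, and `make_chunks` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    text: str
    source_path: str
    headings: List[str]   # heading hierarchy, outermost first
    content_hash: str     # used here for exact-duplicate detection


def window_chunks(tokens: Sequence[str], size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with their neighbor."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows: List[List[str]] = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return windows


def make_chunks(
    text: str,
    source_path: str,
    headings: Sequence[str],
    size: int = 512,
    overlap: int = 64,
) -> List[Chunk]:
    """Build chunks with metadata, dropping windows whose text was already seen."""
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    seen = set()
    chunks: List[Chunk] = []
    for window in window_chunks(tokens, size, overlap):
        body = " ".join(window)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier chunk
        seen.add(digest)
        chunks.append(Chunk(body, source_path, list(headings), digest))
    return chunks
```

With `size=4` and `overlap=1`, ten tokens produce three windows, and each window begins with the last token of the one before it. A production splitter would count tokens with the embedding model's tokenizer rather than by whitespace.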
Index & retrieval
Cosine-similarity search over an HNSW graph delivers sub-10ms latency for top-k queries. A two-phase retrieval pipeline first fetches candidate chunks via vector distance, then re-ranks them with a lightweight cross-encoder for relevance. Results stream back with source citations and confidence scores.
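The two-phase flow can be sketched as below. To stay self-contained, the sketch swaps the HNSW graph for a brute-force cosine scan and the cross-encoder for a toy word-overlap scorer, so it shows the shape of the pipeline rather than its performance. `retrieve`, `overlap_score`, and the `candidates` and `top_k` parameters are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def retrieve(
    query_vec: Sequence[float],
    query_text: str,
    corpus: Sequence[Tuple[str, Sequence[float]]],
    rerank_score: Callable[[str, str], float] = overlap_score,
    candidates: int = 20,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Phase 1: shortlist by vector similarity. Phase 2: re-rank the shortlist by text relevance."""
    # Phase 1: brute-force scan (an ANN index such as HNSW would replace this step).
    by_similarity = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    shortlist = by_similarity[:candidates]
    # Phase 2: score each shortlisted chunk against the query text and sort again.
    reranked = sorted(
        ((text, rerank_score(query_text, text)) for text, _ in shortlist),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:top_k]
```

The shortlist size trades recall for re-ranking cost: the second phase can only promote chunks that survived the first, so `candidates` is usually set several times larger than `top_k`.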
Operational notes
Embedding dimensions are fixed at 1536. The index is sharded by document namespace for horizontal scaling. Stale chunks expire after a configurable TTL, with lazy eviction on read. All vector operations are idempotent: replaying the same document converges the index to the same state.
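A minimal sketch of these operational properties, assuming an in-memory store: `NamespacedStore`, its methods, and the injectable `clock` are hypothetical names chosen for illustration, and one dict per namespace stands in for a real shard.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

DIM = 1536  # fixed embedding dimension from the notes above


class NamespacedStore:
    """Hypothetical store: one dict per namespace, TTL checked lazily on read."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.shards: Dict[str, Dict[str, Tuple[List[float], float]]] = {}

    def upsert(self, namespace: str, chunk_id: str, vector: Sequence[float]) -> None:
        """Idempotent write: repeating the same call leaves the same stored vector."""
        if len(vector) != DIM:
            raise ValueError(f"expected {DIM} dimensions, got {len(vector)}")
        shard = self.shards.setdefault(namespace, {})
        shard[chunk_id] = (list(vector), self.clock())

    def get(self, namespace: str, chunk_id: str) -> Optional[List[float]]:
        """Return the vector, or None if missing or expired; expired entries are evicted here."""
        shard = self.shards.get(namespace, {})
        entry = shard.get(chunk_id)
        if entry is None:
            return None
        vector, written_at = entry
        if self.clock() - written_at > self.ttl:
            del shard[chunk_id]  # lazy eviction: cleanup happens only when the entry is read
            return None
        return vector
```

In this sketch a replayed upsert refreshes the write timestamp, so the stored vectors converge while the expiry window restarts. Whether a replay should extend the TTL is a design choice the notes above leave open.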