Recipe: HyDE
Hypothetical Document Embeddings
What It Is
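HyDE has an LLM generate a synthetic “ideal” document that answers the user query, then retrieves real documents whose embeddings are nearest to the embedding of that hypothetical document. Because the hypothetical document resembles stored content in length and vocabulary far more than the raw query does, this bridges the gap between short queries and the richer language of stored content. The hypothetical document is used only as a search key; it may contain factual errors and is never shown to the user.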
When to Use
- Short, ambiguous user queries that lack context.
- Corpora where documents are long-form and dense (research papers, legal texts, medical literature).
- Zero-shot retrieval accuracy that is too low, with no option to fine-tune an embedding model.
High-Level Flow
1. User submits a query, e.g. “What are the side effects of drug X?”
2. LLM generates a hypothetical document answering that question.
3. Embed the hypothetical document.
4. Run a vector search with that embedding against your real document store.
5. Return top-k real documents to the user or downstream LLM.
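The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `fake_llm` is a hypothetical stand-in for a real LLM call, `embed` is a toy bag-of-words embedding standing in for a real embedding model, and the three-document corpus is made up for the example.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(query: str) -> str:
    """Hypothetical LLM stub: returns a canned passage instead of calling a model."""
    return (
        "Common adverse reactions to drug X include nausea, headache, "
        "and dizziness. Rare reactions include liver toxicity."
    )


def hyde_search(
    query: str,
    corpus: list[str],
    generate: Callable[[str], str],
    k: int = 2,
) -> list[str]:
    hypothetical = generate(query)       # step 2: generate the hypothetical document
    search_vec = embed(hypothetical)     # step 3: embed it (not the raw query)
    ranked = sorted(                     # step 4: rank real documents by similarity
        corpus,
        key=lambda doc: cosine(search_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]                    # step 5: return the top-k real documents


# Made-up corpus for illustration only.
corpus = [
    "Clinical trials of drug X reported nausea, headache, and dizziness as adverse reactions.",
    "Drug X is sold in tablet form and taken once daily.",
    "Storage guidance: keep tablets in a cool, dry place.",
]

results = hyde_search("What are the side effects of drug X?", corpus, fake_llm, k=2)
for doc in results:
    print(doc)
```

Note that the raw query never mentions “nausea” or “adverse reactions”; the match with the first document comes from vocabulary introduced by the hypothetical passage. In a real system, `embed` would call an embedding model and the sort would be replaced by a query against a vector index.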
Tradeoffs
HyDE adds latency and LLM cost because every query requires an extra generation step. The quality of retrieval depends heavily on the LLM’s ability to produce a plausible hypothetical document. For high-throughput systems, consider caching frequent query patterns or using a smaller model for the generation step.
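One way to realize the caching suggestion is to memoize the generation step on a normalized form of the query. The sketch below uses Python's standard `functools.lru_cache`; `generate_hypothetical` is a hypothetical stand-in for the LLM call, and the normalization rule (lowercasing and collapsing whitespace) is an assumption chosen for illustration.

```python
from functools import lru_cache

# Counts how often the (simulated) LLM is actually invoked.
llm_calls = {"count": 0}


def generate_hypothetical(query: str) -> str:
    """Hypothetical stand-in for the expensive LLM generation step."""
    llm_calls["count"] += 1
    return f"A passage answering: {query}"


def normalize(query: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(query.lower().split())


@lru_cache(maxsize=1024)
def _cached_generate(normalized_query: str) -> str:
    return generate_hypothetical(normalized_query)


def cached_hypothetical(query: str) -> str:
    """Return a cached hypothetical document when the normalized query repeats."""
    return _cached_generate(normalize(query))


first = cached_hypothetical("What are the side effects of drug X?")
second = cached_hypothetical("  what are the side effects of DRUG X? ")
print(first == second, llm_calls["count"])
```

This only helps when queries repeat after normalization. Matching looser “query patterns” (paraphrases, for instance) would need a different key, such as a nearest-neighbor lookup over query embeddings, which is beyond this sketch.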