Insight Pipeline
How raw recipe data flows through Meridian's extraction, enrichment, and indexing layers to power semantic search.
Ingestion
Recipes arrive via user submission, URL import, or bulk upload. Each source is normalized to a canonical JSON schema before entering the pipeline. Duplicate detection runs on title + ingredient fingerprint to prevent re-processing.
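As an illustration of the duplicate check, here is a minimal sketch. It assumes the fingerprint is a SHA-256 hash over the normalized title plus the sorted ingredient names; the actual normalization rules and hash are not specified in this document, and the names recipe_fingerprint and is_duplicate are hypothetical.

```python
import hashlib
import re
import unicodedata


def _normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def recipe_fingerprint(title: str, ingredients: list[str]) -> str:
    """Hash the normalized title plus sorted, de-duplicated ingredient names.

    Assumption: ingredient order and casing should not affect identity.
    """
    names = sorted({_normalize(i) for i in ingredients} - {""})
    payload = _normalize(title) + "\n" + "\n".join(names)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_duplicate(title: str, ingredients: list[str], seen: set[str]) -> bool:
    """Return True if this recipe was already seen; otherwise record it."""
    fingerprint = recipe_fingerprint(title, ingredients)
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False
```

In this sketch, "Crème Brûlée" with ["Sugar", "cream"] and "creme brulee!" with ["Cream", "sugar"] produce the same fingerprint, so the second submission would be skipped. An exact hash only catches near-identical recipes; fuzzy matching would need a different technique.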
Extraction
A structured LLM call extracts ingredients, quantities, units, steps, timing, and equipment from free-text instructions. Confidence scores are attached to every field. Low-confidence extractions are queued for human review.
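The review routing described above could look roughly like the following sketch. The 0.8 threshold, the ExtractedField type, and the route_extraction function are all assumptions made for illustration; the document does not state the real cutoff or data model.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real threshold is not given in this document.
REVIEW_THRESHOLD = 0.8


@dataclass
class ExtractedField:
    name: str          # e.g. "ingredients", "timing"
    value: object      # the extracted payload
    confidence: float  # score in [0, 1] attached by the extraction step


def route_extraction(
    fields: list[ExtractedField], threshold: float = REVIEW_THRESHOLD
) -> tuple[str, list[str]]:
    """Send a recipe to human review if any field falls below the threshold.

    Returns the destination and the names of the low-confidence fields.
    """
    low = [f.name for f in fields if f.confidence < threshold]
    if low:
        return "human_review", low
    return "accepted", []
```

Returning the names of the low-confidence fields lets a reviewer focus on the uncertain parts instead of re-checking the whole recipe.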
Enrichment
Ingredients are linked to a nutritional database for macro estimates. Dietary tags (vegan, gluten-free, keto) are inferred from ingredient sets. Cuisine classification uses a fine-tuned embedding model. All enrichments are stored as versioned annotations so upstream data corrections propagate cleanly.
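To make tag inference and versioned annotations concrete, here is a small sketch. It assumes tags are inferred by exclusion (a recipe gets a tag when none of its ingredients appear on that tag's banned list) and that each annotation records the version of the source data it was derived from. The exclusion lists, the Annotation type, and the function names are hypothetical, and keto is omitted because it depends on carbohydrate amounts rather than ingredient membership alone.

```python
from dataclasses import dataclass

# Hypothetical, heavily abbreviated exclusion lists.
EXCLUDED_BY_TAG: dict[str, set[str]] = {
    "vegan": {"butter", "egg", "milk", "honey", "chicken", "beef"},
    "gluten-free": {"wheat flour", "barley", "rye", "soy sauce"},
}


@dataclass(frozen=True)
class Annotation:
    recipe_id: str
    kind: str
    value: tuple[str, ...]
    source_version: int  # version of the recipe data this was derived from


def infer_dietary_tags(ingredients: set[str]) -> tuple[str, ...]:
    """A tag applies when no ingredient is on that tag's exclusion list."""
    return tuple(
        sorted(
            tag
            for tag, banned in EXCLUDED_BY_TAG.items()
            if ingredients.isdisjoint(banned)
        )
    )


def annotate(recipe_id: str, ingredients: set[str], source_version: int) -> Annotation:
    """Build a dietary-tag annotation pinned to a source data version."""
    tags = infer_dietary_tags(ingredients)
    return Annotation(recipe_id, "dietary_tags", tags, source_version)


def is_stale(annotation: Annotation, current_version: int) -> bool:
    """An annotation needs recomputing once its source data has moved on."""
    return annotation.source_version < current_version
```

Pinning each annotation to a source version is one way a correction to the underlying recipe can trigger recomputation of only the annotations derived from the older data.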
Indexing
Finalized documents are embedded and written to a vector store. Hybrid indexes combine dense embeddings with sparse keyword inverted lists. Real-time updates are pushed via a change-data-capture stream so search results reflect edits within seconds.
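This document does not say how the dense and sparse results are merged at query time. One common option is reciprocal rank fusion, sketched below purely as an illustration; the function name, the recipe IDs, and the choice of k = 60 are assumptions rather than a description of Meridian's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one.

    Each document scores 1 / (k + rank) per list it appears in, with ranks
    starting at 1. Ties are broken by document ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense (vector) and sparse (keyword) indexes.
dense_hits = ["r1", "r2", "r3"]
sparse_hits = ["r2", "r4", "r1"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Rank-based fusion avoids having to put cosine similarities and keyword scores on a common scale, which is why it is often a reasonable default when the two scoring systems are unrelated.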
Observability
Every pipeline stage emits structured logs and latency metrics. A dead-letter queue catches permanent failures for operator triage. Throughput is gated by a token-bucket rate limiter to keep LLM costs predictable under burst load.
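The token-bucket limiter mentioned above can be sketched as follows. The class name, parameters, and injectable clock are assumptions for illustration; the real refill rate and bucket capacity are not stated in this document.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` while sustaining `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if cost <= self._tokens:
            self._tokens -= cost
            return True
        return False
```

A worker would call try_acquire() before each LLM request and back off or requeue the recipe when it returns False. Passing a larger cost for longer inputs is one way to budget by estimated token usage rather than by request count.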
The pipeline processes approximately 12,000 recipes per hour at steady state. During bulk imports, peak throughput reaches 40,000 recipes per hour, with extraction workers scaling horizontally automatically.