Overview
This recipe walks through standing up a multi-language translation pipeline that handles content ingestion, language detection, model routing, post-translation quality checks, and optional human review. The architecture is designed for high throughput and graceful degradation when downstream translation services are unavailable.
Prerequisites
- Meridian project initialized with at least one pipeline worker
- API keys for your translation backends (DeepL, Google Cloud Translation, or Azure Translator)
- Source content that either carries language metadata or has a language detectable with CLD3 or fastText
Step 1 — Ingest and classify
Configure an ingestion webhook that accepts JSON payloads with a source_lang field and raw text. If the source language is not provided, attach a pre-processing step that runs language detection and stamps the detected locale before the payload enters the translation queue.
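The pre-processing step could look roughly like the sketch below. It is plain Python rather than Meridian's own API, and it rests on assumptions: the raw text arrives under a `text` key, and `toy_detector` is a hypothetical stand-in for a real CLD3 or fastText call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

# Hypothetical detector signature: text -> locale code such as "en".
Detector = Callable[[str], str]


@dataclass(frozen=True)
class IngestedPayload:
    text: str
    source_lang: str
    lang_detected: bool  # True when the locale was stamped by detection


def classify(payload: Mapping[str, Any], detect: Detector) -> IngestedPayload:
    """Validate a webhook payload and make sure it carries a source locale."""
    text = payload.get("text")  # "text" is an assumed field name
    if not isinstance(text, str) or not text.strip():
        raise ValueError("payload must contain non-empty 'text'")
    source_lang = payload.get("source_lang")
    if isinstance(source_lang, str) and source_lang.strip():
        # The caller supplied the language, so detection is skipped.
        return IngestedPayload(text, source_lang.strip().lower(), False)
    # No language supplied: run detection and stamp the result.
    return IngestedPayload(text, detect(text), True)


def toy_detector(text: str) -> str:
    """Stand-in for CLD3/fastText: guesses Japanese if any kana is present."""
    return "ja" if any("\u3040" <= ch <= "\u30ff" for ch in text) else "en"
```

Keeping the `lang_detected` flag on the payload lets later steps treat detected locales with less trust than declared ones.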
Step 2 — Route to translation models
Define a routing table mapping language pairs to specific backends. For example, route en→ja through DeepL while en→ar uses Google Cloud Translation. Each route specifies a timeout, retry policy, and fallback backend.
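One way to picture the routing table is the sketch below. It is not Meridian's configuration format. The `Route` fields, the timeout and retry values, the fallback choices, and the backend call signature are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical backend signature: (text, source, target, timeout_s) -> translation.
Backend = Callable[[str, str, str, float], str]


@dataclass(frozen=True)
class Route:
    backend: str
    timeout_s: float
    max_retries: int
    fallback: Optional[str] = None


# Illustrative values only; tune timeouts and retries per backend.
ROUTES: Dict[Tuple[str, str], Route] = {
    ("en", "ja"): Route("deepl", timeout_s=10.0, max_retries=2, fallback="google"),
    ("en", "ar"): Route("google", timeout_s=10.0, max_retries=2, fallback="azure"),
}


def translate(
    text: str,
    source: str,
    target: str,
    backends: Mapping[str, Backend],
    routes: Mapping[Tuple[str, str], Route] = ROUTES,
) -> Tuple[str, str]:
    """Try the primary backend with retries, then the fallback.

    Returns (backend_name, translation).
    """
    route = routes.get((source, target))
    if route is None:
        raise LookupError(f"no route for {source}->{target}")
    candidates = [route.backend] + ([route.fallback] if route.fallback else [])
    last_error: Optional[Exception] = None
    for name in candidates:
        for _attempt in range(route.max_retries + 1):
            try:
                return name, backends[name](text, source, target, route.timeout_s)
            except Exception as exc:  # a real worker would catch narrower errors
                last_error = exc
    raise RuntimeError(f"all backends failed for {source}->{target}") from last_error
```

Returning the backend name alongside the translation makes it possible to record which service produced each artifact.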
Step 3 — Quality scoring
After translation, run the output through a quality scoring step. When reference translations are available, compute BLEU or COMET scores against them. Flag any output that scores below a configurable threshold for human review, and store the score as structured metadata on the translated artifact.
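The thresholding logic can be sketched as follows. The scorer is pluggable, and `unigram_precision` is only a toy stand-in so the example runs without dependencies. It is not BLEU or COMET. The metadata layout and the choice not to flag when no reference exists are assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical scorer signature: (hypothesis, reference) -> score, higher is better.
Scorer = Callable[[str, str], float]


@dataclass
class TranslatedArtifact:
    content_id: str
    locale: str
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Toy metric: share of hypothesis tokens that also occur in the reference."""
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    matched = sum(min(n, ref_counts[tok]) for tok, n in Counter(hyp_tokens).items())
    return matched / len(hyp_tokens)


def score_and_flag(
    artifact: TranslatedArtifact,
    reference: Optional[str],
    scorer: Scorer,
    metric: str,
    threshold: float,
) -> bool:
    """Store the score on the artifact; return True if it needs human review."""
    if reference is None:
        # Assumed policy: without a reference, record no score and do not flag.
        artifact.metadata["quality"] = {
            "metric": metric, "score": None, "needs_review": False,
        }
        return False
    score = scorer(artifact.text, reference)
    needs_review = score < threshold
    artifact.metadata["quality"] = {
        "metric": metric,
        "score": score,
        "threshold": threshold,
        "needs_review": needs_review,
    }
    return needs_review
```

Swapping in a real metric means passing a different `scorer` and picking a threshold on that metric's scale.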
Step 4 — Human review gate
For flagged translations, enqueue a review task. The reviewer sees the source text, the machine translation, and the quality score. They can approve, edit, or reject. Approved translations are published; rejected ones are re-queued with an alternate backend.
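The three reviewer decisions map onto two outcomes, which a minimal sketch can make concrete. All names here are hypothetical. The sketch also assumes that an edited translation is published with the reviewer's text, which the step above implies but does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    EDIT = "edit"
    REJECT = "reject"


@dataclass(frozen=True)
class ReviewTask:
    content_id: str
    locale: str
    source_text: str
    machine_translation: str
    quality_score: float
    backend: str  # backend that produced the flagged translation


@dataclass(frozen=True)
class Outcome:
    action: str  # "publish" or "requeue"
    text: Optional[str] = None
    backend: Optional[str] = None


def resolve(
    task: ReviewTask,
    decision: Decision,
    edited_text: Optional[str] = None,
    alternate_backend: Optional[str] = None,
) -> Outcome:
    """Turn a reviewer decision into the next pipeline action."""
    if decision is Decision.APPROVE:
        return Outcome("publish", text=task.machine_translation)
    if decision is Decision.EDIT:
        if not edited_text:
            raise ValueError("an edit decision needs the edited text")
        # Assumption: edits are published directly, without re-scoring.
        return Outcome("publish", text=edited_text)
    # Rejection: retry with a backend other than the one that just failed review.
    if alternate_backend is None or alternate_backend == task.backend:
        raise ValueError("a rejection needs a different backend to retry with")
    return Outcome("requeue", backend=alternate_backend)
```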
Step 5 — Delivery and caching
Deliver translated content via a CDN-backed endpoint keyed by content ID and locale. Cache translations aggressively, and invalidate an entry only when the source content changes or a reviewer updates the translation. Emit a webhook event on delivery so downstream systems can react.
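The caching rule can be illustrated with an in-memory sketch. A real deployment would sit behind the CDN instead. Detecting source changes through a SHA-256 fingerprint is an assumption of this example, as are the class and method names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def source_fingerprint(source_text: str) -> str:
    """Hash the source so a change in content can be detected cheaply."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


class TranslationCache:
    """Cache keyed by (content_id, locale), invalidated when the source changes."""

    def __init__(self) -> None:
        # (content_id, locale) -> (source fingerprint, translation)
        self._entries: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def get(self, content_id: str, locale: str, source_text: str) -> Optional[str]:
        key = (content_id, locale)
        entry = self._entries.get(key)
        if entry is None:
            return None
        fingerprint, translation = entry
        if fingerprint != source_fingerprint(source_text):
            # The source changed since this translation was stored.
            del self._entries[key]
            return None
        return translation

    def put(self, content_id: str, locale: str, source_text: str, translation: str) -> None:
        """Store a translation; a reviewer update overwrites the old entry."""
        self._entries[(content_id, locale)] = (
            source_fingerprint(source_text),
            translation,
        )
```

Because `put` overwrites by key, a reviewer edit replaces the cached machine translation without a separate invalidation call.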
Pro tip
Use a dead-letter queue for translations that fail after all retries. Periodically drain it with a manual reconciliation job rather than letting failures silently disappear.
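A dead-letter queue with a drain step might be sketched like this. It is an in-memory illustration with hypothetical names. A production queue would be durable, and the `handler` callback stands in for whatever the reconciliation job actually does.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class DeadLetter:
    content_id: str
    source: str
    target: str
    error: str  # last error seen before retries were exhausted


class DeadLetterQueue:
    def __init__(self) -> None:
        self._items: Deque[DeadLetter] = deque()

    def push(self, letter: DeadLetter) -> None:
        self._items.append(letter)

    def __len__(self) -> int:
        return len(self._items)

    def drain(self, handler: Callable[[DeadLetter], bool]) -> List[DeadLetter]:
        """Offer each entry to the handler once.

        Entries the handler reports as unresolved (returns False) go back on
        the queue, so nothing is dropped silently.
        """
        unresolved: List[DeadLetter] = []
        for _ in range(len(self._items)):
            letter = self._items.popleft()
            if not handler(letter):
                unresolved.append(letter)
        self._items.extend(unresolved)
        return unresolved
```

Returning the unresolved entries gives the reconciliation job something concrete to report or escalate.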