Recipe: Multi-language translation pipeline

Build a production-grade translation workflow that ingests source content, routes it through language-specific models, and delivers localized output with quality scoring and human-in-the-loop review gates.

Difficulty: Intermediate | Time: ~45 min | Updated 2 weeks ago

Overview

This recipe walks through standing up a multi-language translation pipeline that handles content ingestion, language detection, model routing, post-translation quality checks, and optional human review. The architecture is designed for high throughput and graceful degradation when downstream translation services are unavailable.

Prerequisites

Meridian project initialized with at least one pipeline worker
API keys for your translation backends (DeepL, Google Cloud Translation, or Azure Translator)
Source content structured with language metadata or detectable via CLD3/fastText

Step 1 — Ingest and classify

Configure an ingestion webhook that accepts JSON payloads with a source_lang field and raw text. If the source language is not provided, attach a pre-processing step that runs language detection and stamps the detected locale before the payload enters the translation queue.
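The classification step can be sketched in plain Python as follows. This is a minimal sketch, not Meridian's API: the `text` field name, the `lang_detected` flag, and `toy_detector` are assumptions made for illustration, and a real deployment would plug a CLD3 or fastText wrapper in where the toy detector sits.

```python
from typing import Callable, Optional


def classify_payload(payload: dict,
                     detect: Optional[Callable[[str], str]] = None) -> dict:
    """Return a copy of the payload with source_lang guaranteed to be set."""
    text = payload.get("text")  # assumed field name for the raw text
    if not isinstance(text, str) or not text.strip():
        raise ValueError("payload must carry non-empty 'text'")

    stamped = dict(payload)
    if stamped.get("source_lang"):
        # The sender supplied the language, so detection is skipped.
        stamped["lang_detected"] = False
        return stamped

    if detect is None:
        raise ValueError("source_lang missing and no detector configured")

    # Stamp the detected locale before the payload enters the queue.
    stamped["source_lang"] = detect(text)
    stamped["lang_detected"] = True
    return stamped


def toy_detector(text: str) -> str:
    """Stand-in detector: reports 'ja' if any kana is present, else 'en'."""
    return "ja" if any("\u3040" <= ch <= "\u30ff" for ch in text) else "en"
```

Recording whether the language was detected or supplied (the hypothetical `lang_detected` flag) makes it easier to audit misrouted content later.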

Step 2 — Route to translation models

Define a routing table mapping language pairs to specific backends. For example, route en→ja through DeepL while en→ar uses Google Cloud Translation. Each route specifies a timeout, retry policy, and fallback backend.
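One way to express such a routing table is a dictionary keyed by language pair. The sketch below is illustrative only: the `Route` fields, the timeout and retry values, and the fallback choices are assumptions, and each backend is represented by a plain callable rather than a real vendor client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Route:
    backend: str
    timeout_s: float
    max_retries: int
    fallback: Optional[str] = None


# Illustrative values; tune timeouts, retries, and fallbacks per deployment.
ROUTES: Dict[Tuple[str, str], Route] = {
    ("en", "ja"): Route("deepl", timeout_s=5.0, max_retries=2, fallback="google"),
    ("en", "ar"): Route("google", timeout_s=8.0, max_retries=3, fallback="azure"),
}

# A backend is any callable: (text, src, tgt, timeout_s) -> translated text.
Translator = Callable[[str, str, str, float], str]


def translate(text: str, src: str, tgt: str,
              backends: Dict[str, Translator],
              routes: Dict[Tuple[str, str], Route] = ROUTES) -> Tuple[str, str]:
    """Try the primary backend with retries, then the fallback.

    Returns (backend_name, translation).
    """
    route = routes.get((src, tgt))
    if route is None:
        raise KeyError(f"no route for {src}->{tgt}")

    candidates = [route.backend] + ([route.fallback] if route.fallback else [])
    last_error: Optional[Exception] = None
    for name in candidates:
        for _attempt in range(route.max_retries + 1):
            try:
                return name, backends[name](text, src, tgt, route.timeout_s)
            except Exception as exc:  # timeouts, HTTP errors, missing backend
                last_error = exc
    raise RuntimeError(f"all backends failed for {src}->{tgt}") from last_error
```

Returning the backend name alongside the translation lets later steps record which service produced each output.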

Step 3 — Quality scoring

After translation, run the output through a quality scoring step. Compute BLEU or COMET scores against reference translations if available. Flag outputs below a configurable threshold for human review. Store scores as structured metadata on the translated artifact.
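The thresholding logic might look like the sketch below. The metric is passed in as a callable so that a BLEU or COMET scorer can be substituted; `token_overlap` is a toy stand-in, not BLEU. The artifact field names, the default threshold of 0.6, and the choice to send unscored outputs to review are all assumptions.

```python
from typing import Callable


def token_overlap(hypothesis: str, reference: str) -> float:
    """Toy metric: fraction of reference tokens that appear in the hypothesis."""
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    hyp_tokens = set(hypothesis.split())
    return sum(tok in hyp_tokens for tok in ref_tokens) / len(ref_tokens)


def score_translation(artifact: dict,
                      scorer: Callable[[str, str], float],
                      threshold: float = 0.6,
                      review_when_unscored: bool = True) -> dict:
    """Return a copy of the artifact with quality metadata attached."""
    scored = dict(artifact)
    reference = artifact.get("reference")
    if reference is None:
        # No reference translation is available, so nothing can be scored.
        score = None
        needs_review = review_when_unscored
    else:
        score = scorer(artifact["translation"], reference)
        needs_review = score < threshold

    # Store the score as structured metadata on the translated artifact.
    scored["quality"] = {"score": score, "threshold": threshold}
    scored["needs_review"] = needs_review
    return scored
```

Storing the threshold next to the score keeps old artifacts interpretable after the threshold is retuned.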

Step 4 — Human review gate

For flagged translations, enqueue a review task. The reviewer sees the source text, the machine translation, and the quality score. They can approve, edit, or reject. Approved translations are published; rejected ones are re-queued with an alternate backend.
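The reviewer's three outcomes can be modeled as a small decision handler. This is a sketch under stated assumptions: `ReviewTask`, `apply_decision`, and the shape of the retry queue entries are hypothetical names, and publishing an edited translation directly is an assumption, since the recipe only specifies what happens to approved and rejected items.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class ReviewTask:
    content_id: str
    source_text: str
    machine_translation: str
    quality_score: Optional[float]
    backend: str


def apply_decision(task: ReviewTask,
                   decision: str,
                   publish: Callable[[str, str], None],
                   retry_queue: Deque[dict],
                   alternate_backend: str,
                   edited_text: Optional[str] = None) -> str:
    """Apply a reviewer decision; returns 'published' or 'requeued'."""
    if decision == "approve":
        publish(task.content_id, task.machine_translation)
        return "published"
    if decision == "edit":
        if not edited_text:
            raise ValueError("an 'edit' decision needs edited_text")
        # Assumption: an edited translation is published as-is.
        publish(task.content_id, edited_text)
        return "published"
    if decision == "reject":
        # Re-queue the source text for a different backend.
        retry_queue.append({
            "content_id": task.content_id,
            "source_text": task.source_text,
            "backend": alternate_backend,
        })
        return "requeued"
    raise ValueError(f"unknown decision: {decision!r}")
```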

Step 5 — Delivery and caching

Deliver translated content via a CDN-backed endpoint keyed by content ID and locale. Cache translations aggressively, invalidating an entry only when the source content changes or a reviewer updates the translation. Emit a webhook event on delivery so downstream systems can react.
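The caching rule can be sketched with an in-memory store keyed by content ID and locale. Everything here is illustrative: `TranslationCache`, `deliver`, and the `translation.delivered` event name are hypothetical, the dictionary stands in for whatever sits behind the CDN, and a source-text hash is just one possible way to detect that the source changed.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


class TranslationCache:
    """Cache keyed by (content_id, locale); stale when the source changes."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Tuple[str, str]] = {}

    @staticmethod
    def _fingerprint(source_text: str) -> str:
        return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

    def get(self, content_id: str, locale: str, source_text: str) -> Optional[str]:
        key = (content_id, locale)
        entry = self._entries.get(key)
        if entry is None:
            return None
        fingerprint, translation = entry
        if fingerprint != self._fingerprint(source_text):
            del self._entries[key]  # source changed: invalidate the entry
            return None
        return translation

    def put(self, content_id: str, locale: str,
            source_text: str, translation: str) -> None:
        # Also used when a reviewer updates a translation: it overwrites.
        self._entries[(content_id, locale)] = (
            self._fingerprint(source_text), translation)


def deliver(cache: TranslationCache, content_id: str, locale: str,
            source_text: str,
            translate: Callable[[str, str], str],
            emit: Callable[[dict], None]) -> str:
    """Serve from cache when possible, then emit a delivery event."""
    translation = cache.get(content_id, locale, source_text)
    cache_hit = translation is not None
    if translation is None:
        translation = translate(source_text, locale)
        cache.put(content_id, locale, source_text, translation)
    emit({"event": "translation.delivered", "content_id": content_id,
          "locale": locale, "cache_hit": cache_hit})
    return translation
```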

Pro tip

Use a dead-letter queue for translations that fail after all retries. Periodically drain it with a manual reconciliation job rather than letting failures silently disappear.
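A minimal sketch of that pattern, using an in-process `deque` as the dead-letter queue. The function names, the entry fields, and the default retry count are assumptions; a production setup would more likely use the dead-letter facility of its message broker.

```python
from collections import deque
from typing import Any, Callable, Deque, Optional


def process_with_dlq(job: dict,
                     handler: Callable[[dict], Any],
                     dead_letters: Deque[dict],
                     max_retries: int = 3) -> Any:
    """Run handler with retries; park the job in the DLQ if all attempts fail."""
    attempts = max_retries + 1  # one initial attempt plus the retries
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return handler(job)
        except Exception as exc:
            last_error = exc
    # Record why the job failed so reconciliation has context.
    dead_letters.append({"job": job, "error": repr(last_error),
                         "attempts": attempts})
    return None


def drain(dead_letters: Deque[dict],
          reconcile: Callable[[dict], bool]) -> int:
    """Pass each dead letter to reconcile; keep those it cannot resolve.

    Returns the number of entries still unresolved.
    """
    unresolved = 0
    for _ in range(len(dead_letters)):
        entry = dead_letters.popleft()
        if not reconcile(entry):
            dead_letters.append(entry)
            unresolved += 1
    return unresolved
```

Keeping unresolved entries in the queue, rather than dropping them after one reconciliation pass, preserves the property that no failure disappears silently.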