Recipe

Translation quality evaluator

Build a side-by-side comparison tool that scores translation fidelity using semantic similarity and human-readable metrics.

Overview

This recipe builds an evaluation pipeline that takes source text, a candidate translation, and a reference translation, and emits a composite quality score. The output reports lexical overlap, embedding similarity, and fluency heuristics, all surfaced in a single dashboard panel.

Ingredients

  • Source text corpus (plaintext or JSONL)
  • Candidate translations from your model
  • Reference translations (human or gold-standard)
  • Sentence-transformers embedding model
  • Scoring module: BLEU, chrF, cosine similarity
  • Results table with sortable columns

Steps

  1. Load data. Parse source, candidate, and reference files into aligned records keyed by segment ID.
  2. Embed. Encode all three text columns with a multilingual sentence-transformer. Store the vectors in memory.
  3. Score. Compute BLEU and chrF against the reference. Derive cosine similarity between candidate and reference embeddings.
  4. Aggregate. Normalize each score to a 0–100 scale. Weight the lexical (BLEU, chrF) and semantic (cosine similarity) components equally in the composite metric.
  5. Render. Display a sortable table with per-segment scores and a summary bar chart of the distribution.
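
Steps 1, 3, and 4 can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation. It makes several assumptions: the JSONL field names "id" and "text" are hypothetical, BLEU and chrF are assumed to arrive already on a 0–100 scale from whatever scoring library you use, the lexical component is taken as the plain average of BLEU and chrF, and negative cosine values are clipped to zero before scaling. The function names are illustrative only.

```python
import json
import math


def load_aligned(source_lines, candidate_lines, reference_lines):
    """Join three JSONL streams into records keyed by segment ID.

    Assumes each line is a JSON object with hypothetical "id" and "text"
    fields. Segments missing from any of the three inputs are dropped.
    """
    def index(lines):
        rows = (json.loads(line) for line in lines if line.strip())
        return {row["id"]: row["text"] for row in rows}

    src = index(source_lines)
    cand = index(candidate_lines)
    ref = index(reference_lines)
    shared = sorted(src.keys() & cand.keys() & ref.keys())
    return {
        sid: {"source": src[sid], "candidate": cand[sid], "reference": ref[sid]}
        for sid in shared
    }


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def composite_score(bleu, chrf, cosine):
    """Combine lexical and semantic components with equal weight.

    bleu and chrf are assumed to be on a 0-100 scale; cosine is in [-1, 1].
    Averaging BLEU with chrF and clipping negative cosine values are
    assumptions of this sketch, not choices stated by the recipe.
    """
    lexical = (bleu + chrf) / 2
    semantic = max(0.0, cosine) * 100
    return 0.5 * lexical + 0.5 * semantic
```

For example, a segment with BLEU 40, chrF 60, and cosine similarity 0.8 gets a lexical component of 50 and a semantic component of 80, so its composite score is 65.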

Expected output

The dashboard panel shows the mean composite score, a histogram of score buckets, and a searchable segment table. Rows for low-scoring translations are highlighted in pink for quick triage.
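
The data behind such a panel could be prepared roughly as follows. This sketch only computes the summary values; rendering is left to whatever dashboard layer you use. The bucket width of 10 and the low-score threshold of 50 are assumptions chosen for illustration, since the recipe does not specify either, and `summarize` is a hypothetical helper name.

```python
from statistics import mean


def summarize(scores, bucket_width=10, low_threshold=50.0):
    """Summarize composite scores for display.

    scores maps segment ID to a composite score on a 0-100 scale.
    bucket_width and low_threshold are illustrative defaults, not values
    taken from the recipe.
    """
    if not scores:
        return {"mean": None, "histogram": {}, "low": []}

    histogram = {}
    for value in scores.values():
        # Clamp a perfect 100 into the top bucket so the last range is inclusive.
        start = min(int(value // bucket_width) * bucket_width, 100 - bucket_width)
        label = f"{start}-{start + bucket_width}"
        histogram[label] = histogram.get(label, 0) + 1

    # Segment IDs to highlight for triage.
    low = sorted(sid for sid, value in scores.items() if value < low_threshold)
    return {"mean": mean(scores.values()), "histogram": histogram, "low": low}
```

The `low` list holds the segment IDs whose rows would be highlighted; the `histogram` dictionary maps bucket labels such as "60-70" to segment counts.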

Need the full implementation with embedding calls and scoring math? Browse the recipes index for the complete notebook.

Meridian Docs · Recipe reference · Last updated June 2026