Evaluation

Chatbot Eval Dashboard

Measure recipe chatbot quality across accuracy, relevance, and safety dimensions with automated scoring pipelines.

Accuracy: 94.2% (ingredient substitution correctness)
Relevance: 89.7% (context-aware recipe matching)
Safety: 98.1% (allergen warning compliance)

Recent Eval Runs

eval-042   2026-05-26   91.3%   pass
eval-041   2026-05-25   90.8%   pass
eval-040   2026-05-24   88.2%   warn
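
The dashboard does not say how a run's score maps to its pass or warn status. As a rough sketch only, the mapping could be a single cutoff. The 90.0 threshold and the `run_status` name below are assumptions, chosen because they are consistent with the three runs listed above; they are not documented values.

```python
# Hypothetical cutoff: NOT stated by the dashboard. 90.0 is merely
# consistent with the three runs shown (91.3 pass, 90.8 pass, 88.2 warn).
PASS_THRESHOLD = 90.0


def run_status(score_pct: float, pass_threshold: float = PASS_THRESHOLD) -> str:
    """Label a run 'pass' when its score meets the cutoff, else 'warn'."""
    return "pass" if score_pct >= pass_threshold else "warn"


# The three runs from the list above, as (run id, score in percent).
runs = [("eval-042", 91.3), ("eval-041", 90.8), ("eval-040", 88.2)]
statuses = {run_id: run_status(score) for run_id, score in runs}
```

Any cutoff above 88.2 and at or below 90.8 would reproduce the same three labels, so the real rule may differ.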

Metrics Breakdown

BLEU Score: 0.87
ROUGE-L: 0.91
BERTScore F1: 0.89
Human Eval Agreement: 96%
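
For readers unfamiliar with two of these metrics, here is a minimal sketch of how a ROUGE-L score and a human agreement percentage are commonly computed. It assumes lowercase whitespace tokenization, the balanced F1 form of ROUGE-L, and agreement as a simple match rate between automated and human labels. The dashboard does not state which variants its pipeline uses, and the function names are illustrative.

```python
from typing import List, Sequence


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall.

    Assumption: lowercase whitespace tokenization and equal weighting
    of precision and recall.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def percent_agreement(auto_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Share of items, in percent, where automated and human labels match."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not auto_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return 100.0 * matches / len(auto_labels)
```

For example, `rouge_l_f1("use butter instead of oil", "use oil instead of butter")` shares the subsequence "use instead of" (3 of 5 tokens on each side), giving 0.6. A raw match rate does not correct for chance agreement; a statistic such as Cohen's kappa would be needed for that.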