Evaluation
Chatbot Eval Dashboard
Measure recipe chatbot quality across accuracy, relevance, and safety dimensions with automated scoring pipelines.
Accuracy: 94.2% (ingredient substitution correctness)
Relevance: 89.7% (context-aware recipe matching)
Safety: 98.1% (allergen warning compliance)
Recent Eval Runs
Run       Date        Score  Status
eval-042  2026-05-26  91.3%  pass
eval-041  2026-05-25  90.8%  pass
eval-040  2026-05-24  88.2%  warn
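The dashboard does not state the cutoff that separates "pass" from "warn". As a minimal sketch, assuming a single threshold is applied to the run score, the labels above could be derived as follows. The 90.0 cutoff and the `run_status` helper are hypothetical; the cutoff was chosen only because it is consistent with the three runs listed.

```python
# Hypothetical cutoff: the dashboard does not publish the real value.
# 90.0 is merely consistent with the three runs shown (91.3, 90.8, 88.2).
PASS_THRESHOLD = 90.0


def run_status(score_percent: float, pass_threshold: float = PASS_THRESHOLD) -> str:
    """Label an eval run 'pass' at or above the threshold, otherwise 'warn'."""
    return "pass" if score_percent >= pass_threshold else "warn"


if __name__ == "__main__":
    # Scores copied from the Recent Eval Runs list.
    runs = [("eval-042", 91.3), ("eval-041", 90.8), ("eval-040", 88.2)]
    for run_id, score in runs:
        print(run_id, f"{score}%", run_status(score))
```

A real pipeline may use per-dimension thresholds (for example, a stricter one for safety) instead of one overall cutoff.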
Metrics Breakdown
BLEU Score: 0.87
ROUGE-L: 0.91
BERTScore F1: 0.89
Human Eval Agreement: 96%
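For readers unfamiliar with ROUGE-L, the sketch below shows one common way to compute it: the F1 score of precision and recall over the longest common subsequence (LCS) of tokens. It assumes lowercase whitespace tokenization and equal weighting of precision and recall; the dashboard does not say which tokenizer or weighting its pipeline uses, and the function names here are illustrative only.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as an F1 score (assumption: whitespace tokens, lowercased)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative strings, not taken from the dashboard's eval set.
    reference = "replace butter with olive oil"
    candidate = "replace the butter with oil"
    print(round(rouge_l_f1(reference, candidate), 3))
```

A dashboard-level figure such as the one above would typically be the mean of this score over all reference and response pairs in an eval run.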