Recipe: RAG evaluation metrics (faithfulness/relevance)
Measure retrieval-augmented generation quality with RAGAS — faithfulness, answer relevancy, and context precision.
Prerequisites
- Python 3.10+ with ragas installed
- OpenAI API key exported as OPENAI_API_KEY
- Dataset with question, answer, contexts columns
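The prerequisites above can be checked before spending API calls. The following is a minimal sketch, not part of RAGAS itself: `check_prerequisites` and `REQUIRED_COLUMNS` are hypothetical names introduced here for illustration, and the check only covers the environment variable and the column names listed above.

```python
import os

# Column names taken from the prerequisites list; the constant name is hypothetical.
REQUIRED_COLUMNS = ("question", "answer", "contexts")


def check_prerequisites(columns, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        problems.append("dataset is missing columns: " + ", ".join(missing))
    return problems
```

With a Hugging Face dataset, the column names should be available as `data.column_names`, so a call such as `check_prerequisites(data.column_names)` would report any gaps before evaluation starts.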
Step 1 — Load dataset
from datasets import Dataset
data = Dataset.from_dict({
    "question": ["What is Meridian?"],
    "answer": ["Meridian is a commercial DRM loader."],
    # contexts holds one list of retrieved passages per question
    "contexts": [["Meridian docs: DRM loader with Ed25519."]],
    "ground_truth": ["Meridian is a DRM loader."],
})

Step 2 — Run evaluation
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy
result = evaluate(data, metrics=[faithfulness, answer_relevancy])
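print(result)

Expected output
{'faithfulness': 0.9500, 'answer_relevancy': 0.8721}

As a rule of thumb, faithfulness ≥ 0.90 and answer relevancy ≥ 0.80 indicate production-ready RAG quality. Exact scores vary between runs because the metrics are computed by an LLM.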
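The thresholds above can be turned into an automated gate, for example in a CI job. This is a minimal sketch under stated assumptions: `THRESHOLDS` and `failing_metrics` are hypothetical names, and the scores are assumed to have been copied into a plain dict, since how to extract them from the object returned by `evaluate` depends on the installed ragas version.

```python
# Minimum acceptable scores, taken from the recipe text (names are hypothetical).
THRESHOLDS = {"faithfulness": 0.90, "answer_relevancy": 0.80}


def failing_metrics(scores, thresholds=THRESHOLDS):
    """Return {metric: (score, minimum)} for every metric below its minimum."""
    failures = {}
    for name, minimum in thresholds.items():
        score = scores.get(name)
        # "not score >= minimum" also flags NaN, which a plain "<" would let through.
        if score is None or not score >= minimum:
            failures[name] = (score, minimum)
    return failures


# Scores copied from the expected output shown in this recipe.
scores = {"faithfulness": 0.95, "answer_relevancy": 0.8721}
print(failing_metrics(scores))  # → {}
```

An empty result means every metric met its minimum. A missing or NaN score is treated as a failure, so a metric that could not be computed does not pass silently.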