Definition
The systematic process of evaluating RAG pipeline performance and AI agent outputs using metrics such as faithfulness, relevance, and precision, with the goal of minimizing hallucinations and ensuring that answers are grounded in the retrieved context. It involves a trade-off: 'LLM-as-a-judge' and human evaluation are more accurate but slower and more expensive, while heuristic-based metrics are fast and cheap but less precise.
- RAG Triad (Component)
- LLM-as-a-judge (Component)
- Ground Truth (Prerequisite)
- Faithfulness (Component)
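Precision, one of the metrics named in the definition, depends on the Ground Truth prerequisite listed above. The following is a minimal sketch, assuming precision is measured over retrieved chunks and that the ground truth is a set of chunk IDs labeled as relevant. The function name `precision_at_k` and the ID format are illustrative assumptions, not part of any specific evaluation library.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved chunks that are in the ground-truth set.

    retrieved_ids: ranked list of chunk IDs returned by the retriever (assumed format).
    relevant_ids:  iterable of chunk IDs labeled relevant (the ground truth).
    k:             cutoff; the score is divided by k even if fewer chunks were returned.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant_ids)
    top_k = retrieved_ids[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    return hits / k
```

This kind of metric is cheap to compute but only as good as the labels behind it, which is why ground truth appears as a prerequisite and not as a component.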
Disambiguation
This entry refers to 'Quality Assurance' as a testing methodology, not to 'Question Answering' as a functional task.
Visual Analog
A food safety inspector using a checklist to verify that a chef (LLM) used only the provided ingredients (retrieved context) without adding any unauthorized fillers.
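The inspector's checklist can be sketched as a heuristic faithfulness check: each sentence of the answer is an item, and it passes only if most of its words also appear in the retrieved context. This is a rough sketch of the fast, lower-precision end of the trade-off, not an LLM-as-a-judge implementation. The function names, the token-overlap rule, and the 0.7 threshold are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens as a set; deliberately crude normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def faithfulness_score(answer, context, threshold=0.7):
    """Return (share of supported sentences, list of unsupported sentences).

    A sentence counts as supported when at least `threshold` of its tokens
    occur somewhere in the context. The threshold is an assumed value.
    """
    context_tokens = tokenize(context)
    unsupported = []
    checked = 0
    for sentence in split_sentences(answer):
        tokens = tokenize(sentence)
        if not tokens:
            continue  # nothing to verify in this sentence
        checked += 1
        overlap = len(tokens & context_tokens) / len(tokens)
        if overlap < threshold:
            unsupported.append(sentence)
    if checked == 0:
        return 0.0, []
    return (checked - len(unsupported)) / checked, unsupported
```

Token overlap cannot tell a paraphrase from a fabrication, so a check like this is best treated as a cheap first filter, with flagged sentences passed to a more accurate LLM or human judge.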