Definition
A metric in RAG evaluation that quantifies the semantic equivalence between a generated response and a reference ground truth, using embedding models to capture meaning rather than simple keyword matching. It assesses how well the AI agent's output aligns with the intended factual answer, accounting for architectural trade-offs where higher-precision cross-encoders provide better accuracy at the cost of increased evaluation latency.
Semantic meaning vs. lexical word overlap (like BLEU or ROUGE).
"Two different keys that both successfully open the exact same lock."
- Ground Truth(Prerequisite)
- BERTScore(Component)
- Faithfulness(Related Metric)
Conceptual Overview
A metric in RAG evaluation that quantifies the semantic equivalence between a generated response and a reference ground truth, using embedding models to capture meaning rather than simple keyword matching. It assesses how well the AI agent's output aligns with the intended factual answer, accounting for architectural trade-offs where higher-precision cross-encoders provide better accuracy at the cost of increased evaluation latency.
Disambiguation
Semantic meaning vs. lexical word overlap (like BLEU or ROUGE).
Visual Analog
Two different keys that both successfully open the exact same lock.