Definition
A strict evaluation metric in RAG systems that calculates the percentage of model-generated responses that are character-for-character identical to the ground-truth reference, usually after both strings have been normalized. It is primarily used for extractive QA and fact-based retrieval, where any deviation from the target string counts as a failure.
- F1 Score (complementary metric that gives partial credit for overlap)
- Ground Truth (prerequisite reference data)
- Normalization (prerequisite step that standardizes whitespace and punctuation before comparison)
- Semantic Similarity (alternative soft-matching approach)
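The definition and the normalization step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`normalize`, `exact_match`, `exact_match_score`) are hypothetical, and the choice to lowercase text in addition to handling punctuation and whitespace is an assumption that may not match every benchmark's rules.

```python
import string


def normalize(text: str) -> str:
    """Lowercase (an assumption), strip punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, otherwise 0."""
    return int(normalize(prediction) == normalize(reference))


def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    predictions = ["Paris", "  eiffel tower. ", "It was built in 1889"]
    references = ["Paris", "Eiffel Tower", "1889"]
    # The third prediction contains the right fact but is not the same string.
    print(f"{exact_match_score(predictions, references):.1f}")  # 66.7
```

The third pair shows the all-or-nothing behavior: the answer contains the correct year, yet it scores zero because the strings differ after normalization.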
Disambiguation
Exact match is binary string equality, not semantic similarity or vector distance. Two answers that mean the same thing but are worded differently score zero.
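To make the distinction concrete, the sketch below compares a binary match with a token-overlap F1 that gives partial credit. Both helpers (`strict_match`, `token_f1`) are hypothetical names, and the whitespace tokenization and lowercasing are simplifying assumptions. Neither function measures meaning: an abbreviation and its expansion score zero on both.

```python
from collections import Counter


def strict_match(prediction: str, reference: str) -> int:
    """Binary equality after trimming and lowercasing (assumed normalization)."""
    return int(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1: harmonic mean of token precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Partial overlap: exact match fails, F1 gives partial credit.
    print(strict_match("built in 1889", "1889"), token_f1("built in 1889", "1889"))
    # Same meaning, no shared tokens: both metrics give zero.
    print(strict_match("NYC", "New York City"), token_f1("NYC", "New York City"))
```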
Visual Analog
A Key and a Deadbolt: The key either fits the lock perfectly and turns, or it is completely useless; there is no 'partial' opening.