Definition
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric suite used to evaluate the quality of LLM-generated summaries or RAG responses by measuring n-gram overlap against a reference ground truth. While computationally efficient, it presents an architectural trade-off by prioritizing lexical recall over semantic meaning, meaning it can penalize accurate responses that use different phrasing than the reference.
Refers to n-gram overlap metrics for text evaluation, not the color or cosmetic term.
"A Translucent Stencil: Placing a reference template over a generated answer to see how much of the original 'shape' of the text shows through the gaps."
- BLEU(Precision-focused counterpart frequently used in tandem with ROUGE.)
- N-gram(The contiguous sequence of items used as the base unit for overlap calculation.)
- Ground Truth(The prerequisite reference text required to calculate a ROUGE score.)
- METEOR(An alternative metric that addresses ROUGE's lack of synonym matching.)
Conceptual Overview
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric suite used to evaluate the quality of LLM-generated summaries or RAG responses by measuring n-gram overlap against a reference ground truth. While computationally efficient, it presents an architectural trade-off by prioritizing lexical recall over semantic meaning, meaning it can penalize accurate responses that use different phrasing than the reference.
Disambiguation
Refers to n-gram overlap metrics for text evaluation, not the color or cosmetic term.
Visual Analog
A Translucent Stencil: Placing a reference template over a generated answer to see how much of the original 'shape' of the text shows through the gaps.