Definition
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric suite used to evaluate the quality of LLM-generated summaries or RAG responses by measuring n-gram overlap against a reference ground truth. While computationally efficient, it presents an architectural trade-off by prioritizing lexical recall over semantic meaning, meaning it can penalize accurate responses that use different phrasing than the reference.
Refers to n-gram overlap metrics for text evaluation, not the color or cosmetic term.
"A Translucent Stencil: Placing a reference template over a generated answer to see how much of the original 'shape' of the text shows through the gaps."
Conceptual Overview
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric suite used to evaluate the quality of LLM-generated summaries or RAG responses by measuring n-gram overlap against a reference ground truth. While computationally efficient, it presents an architectural trade-off by prioritizing lexical recall over semantic meaning, meaning it can penalize accurate responses that use different phrasing than the reference.
Disambiguation
Refers to n-gram overlap metrics for text evaluation, not the color or cosmetic term.
Visual Analog
A Translucent Stencil: Placing a reference template over a generated answer to see how much of the original 'shape' of the text shows through the gaps.