Intermediate

ROUGE

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a suite of metrics for evaluating LLM-generated summaries or RAG responses by measuring n-gram overlap with a ground-truth reference text. It is cheap to compute, but it trades semantic understanding for lexical recall: an accurate response that is phrased differently from the reference can still score poorly.
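To make the overlap idea concrete, here is a minimal sketch of ROUGE-N style scoring. It assumes lowercased whitespace tokenization and clipped n-gram counts; the function names `ngrams` and `rouge_n` and the example sentences are illustrative, not taken from any particular library, and production implementations typically add stemming and other variants such as ROUGE-L.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) based on clipped n-gram overlap.

    Assumption: simple lowercase + whitespace tokenization.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


reference = "the cat sat on the mat"
verbatim = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"  # same meaning, different words

print(rouge_n(verbatim, reference))    # identical wording: perfect overlap
print(rouge_n(paraphrase, reference))  # only "the" is shared
```

Under these assumptions the paraphrase shares a single unigram ("the") with the reference, so its ROUGE-1 recall is 1/6 even though it conveys the same meaning, which is the lexical-versus-semantic limitation described above.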

Disambiguation

Refers to n-gram overlap metrics for text evaluation, not the color or cosmetic term.

Visual Metaphor

"A Translucent Stencil: Placing a reference template over a generated answer to see how much of the original 'shape' of the text shows through the gaps."
