Definition
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric that measures the lexical overlap of n-grams between an LLM-generated response and a reference ground-truth text, emphasizing recall: how much of the reference information was successfully retrieved and synthesized. It is efficient for benchmarking, but the trade-off is that it rewards exact word matching over semantic meaning, so it can penalize factually correct answers that merely use different phrasing.
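As a minimal sketch of the idea, ROUGE-N recall can be computed as the clipped n-gram overlap divided by the total number of reference n-grams. This assumes simple whitespace tokenization and a single reference; production implementations (e.g. the original ROUGE toolkit) add stemming, stopword options, and precision/F1 variants.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5/6 ≈ 0.833
```

Note that the substitution of "is" for "sat" costs one of six reference unigrams, even though a paraphrase could be equally correct — the failure mode described above.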
Disambiguation
ROUGE focuses on recall (how much of the ground truth was captured), whereas BLEU focuses on precision (how much of the generated text is valid).
Visual Analog
A highlighter overlay: placing a transparent sheet of the model's answer over the reference text to see how much of the original 'gold' text is covered.
- N-gram (Component)
- Ground Truth (Prerequisite)
- BLEU Score (Alternative Metric)
- METEOR (Semantic Alternative)