Definition
BLEU (Bilingual Evaluation Understudy) is a precision-based metric that quantifies the n-gram overlap between an LLM-generated response and a reference "ground truth" text. In RAG evaluation, it serves as a mathematical proxy for output quality: it penalizes word-choice deviations and, via a brevity penalty, outputs that are shorter than the reference. Its main limitation is that it gives no credit for semantic synonyms or paraphrases.
BLEU focuses on exact word-matching precision, whereas ROUGE emphasizes recall.
"A transparency overlay that counts how many words in the generated text physically align with the reference text's blueprint."
- ROUGE (recall-oriented sister metric)
- Brevity Penalty (core mathematical component)
- N-gram (prerequisite concept)
- METEOR (semantic-aware alternative)
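The definition above can be sketched in code. This is a simplified, unsmoothed sentence-level BLEU, assuming whitespace tokenization and a single reference; the function names are illustrative, not from any particular library (production code would typically use a package such as sacrebleu, which handles smoothing and corpus-level aggregation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the
        # reference, so repeating a matched word earns no extra credit.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    # Without smoothing, any zero precision zeroes the whole score.
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return bp * geo_mean
```

Note that a synonym-for-synonym swap (e.g. "sofa" for "couch") scores the same as an unrelated word here, which is exactly the semantic blind spot described above.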