Definition
BLEU (Bilingual Evaluation Understudy) is a precision-based metric that quantifies the n-gram overlap between an LLM-generated response and a reference "ground truth" text. In RAG evaluation, it serves as a mathematical proxy for output quality: it penalizes word-choice deviations and, via a brevity penalty, outputs that are shorter than the reference. Its main limitation is that it gives no credit for semantic synonyms or paraphrases.
BLEU focuses on exact word-matching precision, whereas ROUGE emphasizes recall.
"A transparency overlay that counts how many words in the generated text physically align with the reference text's blueprint."
- ROUGE (recall-oriented sister metric)
- Brevity Penalty (core mathematical component)
- N-gram (prerequisite concept)
- METEOR (semantic-aware alternative)
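The definition above can be sketched in code. This is a simplified, unsmoothed sentence-level BLEU, assuming whitespace tokenization and a single reference; the function names are illustrative, not from any particular library (production code would typically use a package such as sacrebleu, which handles smoothing and corpus-level aggregation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the
        # reference, so repeating a matched word earns no extra credit.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    # Without smoothing, any zero precision zeroes the whole score.
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return bp * geo_mean
```

Note that a synonym-for-synonym swap (e.g. "sofa" for "couch") scores the same as an unrelated word here, which is exactly the semantic blind spot described above.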