BLEU Score

Definition

A metric used in RAG evaluation to quantify the lexical overlap between an LLM-generated response and a human-provided ground truth. It computes modified n-gram precision (typically over 1- to 4-grams), combines the precisions into a geometric mean, and applies a brevity penalty that discourages overly short outputs. While cheap to compute, it measures surface-level similarity rather than semantic accuracy or reasoning quality.
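
To make the mechanics concrete, here is a minimal sketch of the core computation. It is an illustrative simplification rather than a drop-in replacement for a library implementation: it assumes pre-tokenized input, a single reference, and no smoothing.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    # Clip each candidate n-gram's count to its count in the reference,
    # so repeating a matching word cannot inflate the score.
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def bleu(candidate, reference, max_n=4):
    # Geometric mean of modified 1..max_n-gram precisions, scaled by a brevity penalty.
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the whole score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

reference = "the quick brown fox jumps over the lazy dog".split()
candidate = "the quick brown fox leaps over the lazy dog".split()
print(round(bleu(candidate, reference), 3))  # high overlap, but well below 1.0
```

In practice, prefer a maintained implementation such as SacreBLEU, which standardizes tokenization and smoothing so that scores are comparable across systems.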

Disambiguation

BLEU measures exact n-gram matches, not semantic meaning or factual correctness; a response can be factually right yet score low because it is phrased differently, or score high while getting a key fact wrong.
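
The practical consequence: an answer that states the right fact in different words can score near zero. A minimal sketch of that case, assuming the sacrebleu package (listed under Key Tools below) is installed; the sentences are made up for illustration.

```python
import sacrebleu  # assumes: pip install sacrebleu

reference = ["The capital of France is Paris."]

exact = "The capital of France is Paris."           # verbatim match
paraphrase = "Paris serves as the French capital."  # same fact, different wording

# sentence_bleu(hypothesis, references) returns a BLEUScore whose .score is on a 0-100 scale.
print(sacrebleu.sentence_bleu(exact, reference).score)       # at or near 100: verbatim match
print(sacrebleu.sentence_bleu(paraphrase, reference).score)  # low, despite being factually correct
```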

Visual Metaphor

"A Stencil Overlay: checking how many words in the generated text perfectly align with the cutouts of a reference template."

Key Tools
SacreBLEU, Hugging Face Evaluate, NLTK, RAGAS, DeepEval
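
As one example of the workflow with these tools, the snippet below scores a small batch of generated answers against references using the SacreBLEU metric loaded through Hugging Face Evaluate (assuming the evaluate and sacrebleu packages are installed; the example strings are placeholders).

```python
import evaluate  # assumes: pip install evaluate sacrebleu

metric = evaluate.load("sacrebleu")

# One generated answer per example, and a list of acceptable references for each.
predictions = [
    "BLEU measures n-gram overlap with a reference answer.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
]
references = [
    ["BLEU measures n-gram overlap against a reference answer."],
    ["Retrieval-augmented generation grounds responses in retrieved documents."],
]

result = metric.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU on a 0-100 scale
```

RAGAS and DeepEval operate at a higher level: they orchestrate RAG-specific evaluations such as faithfulness and answer relevance, where lexical-overlap metrics like BLEU are at most one signal among several.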