Definition
A statistical weighting mechanism used in RAG pipelines for lexical retrieval that quantifies a term's importance by multiplying its local frequency in a document by its global rarity across the corpus. In modern AI agents, it serves as the foundation for sparse vector representation to ensure exact keyword matching that dense embeddings might overlook.
Lexical keyword weighting, not semantic vector similarity.
"A highlighter that glows brightest on unique keywords found in a single page but fades to invisible for common words like 'the' or 'and' found everywhere."
Conceptual Overview
A statistical weighting mechanism used in RAG pipelines for lexical retrieval that quantifies a term's importance by multiplying its local frequency in a document by its global rarity across the corpus. In modern AI agents, it serves as the foundation for sparse vector representation to ensure exact keyword matching that dense embeddings might overlook.
Disambiguation
Lexical keyword weighting, not semantic vector similarity.
Visual Analog
A highlighter that glows brightest on unique keywords found in a single page but fades to invisible for common words like 'the' or 'and' found everywhere.