Definition
A statistical weighting mechanism used in RAG pipelines for lexical retrieval that quantifies a term's importance by multiplying its local frequency in a document by its global rarity across the corpus. In modern AI agents, it serves as the foundation for sparse vector representation to ensure exact keyword matching that dense embeddings might overlook.
Lexical keyword weighting, not semantic vector similarity.
"A highlighter that glows brightest on unique keywords found in a single page but fades to invisible for common words like 'the' or 'and' found everywhere."
- BM25(Successor/Optimized version)
- Sparse Vector(Data Representation)
- Hybrid Search(Integration Strategy)
- Cosine Similarity(Mathematical Comparison Metric)
Conceptual Overview
A statistical weighting mechanism used in RAG pipelines for lexical retrieval that quantifies a term's importance by multiplying its local frequency in a document by its global rarity across the corpus. In modern AI agents, it serves as the foundation for sparse vector representation to ensure exact keyword matching that dense embeddings might overlook.
Disambiguation
Lexical keyword weighting, not semantic vector similarity.
Visual Analog
A highlighter that glows brightest on unique keywords found in a single page but fades to invisible for common words like 'the' or 'and' found everywhere.