Definition
A ranking function used in information retrieval to estimate a document's relevance to a query from term frequency with saturation, term rarity, and document length; in RAG, it serves as the core algorithm for sparse retrieval. It excels at exact keyword matching and at handling rare terms but has no semantic understanding, so it is often paired with dense vector embeddings in a hybrid search that trades some keyword precision for contextual coverage.
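The scoring idea can be sketched in a few lines. This is a minimal illustration, assuming the function described is the Okapi BM25 formulation with the commonly used defaults `k1 = 1.5` and `b = 0.75`; the whitespace tokenizer, the function names, and the sample documents are hypothetical and not taken from any particular library.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` (assumed BM25 form)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a larger weight; the "+ 1" keeps the weight positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            # Saturation: each extra occurrence of the term adds less than the last.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "the dog chased the cat around the yard",
    "quantum computing uses qubits",
]
scores = bm25_scores("cat", docs)
```

Here the first two documents each contain "cat" once, but the shorter one scores higher because of length normalization, and the third scores zero because it has no matching keyword, which is the gap that dense embeddings are meant to cover.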
Keyword-based frequency ranking, not vector-based semantic similarity.
A highlighter-based indexer that ranks a document higher the more often a specific 'rare' word is highlighted in it, adjusted for the document's total length.
- TF-IDF (Prerequisite)
- Sparse Retrieval (Component)
- Hybrid Search (Implementation Context)
- Reciprocal Rank Fusion (RRF) (Optimization Strategy)
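As a rough illustration of how Reciprocal Rank Fusion can combine a sparse ranking with a dense one in hybrid search, the sketch below sums `1 / (k + rank)` for each document across the input rankings. The constant `k = 60` is a commonly used default and is an assumption here, as are the function name and the document IDs.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    `rankings` is a list of lists, each ordered best-first.
    `k` dampens the influence of top ranks (60 is a common default, assumed here).
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks in each list.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


sparse_ranking = ["d1", "d2", "d3"]  # hypothetical keyword-based results
dense_ranking = ["d3", "d1", "d4"]   # hypothetical embedding-based results
fused_ranking = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
# fused_ranking == ["d1", "d3", "d2", "d4"]
```

Because the fusion uses only rank positions, it does not need the keyword scores and the embedding similarities to be on a comparable scale.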