Definition
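In RAG pipelines, Inverse Document Frequency (IDF) is a statistical weight used in lexical retrieval to down-weight high-frequency terms such as 'stop words' and boost rare, highly informative terms, so that the retriever prioritizes documents containing a query's distinctive keywords. A term that appears in nearly every document says little about which document is relevant, whereas a term found in only a handful of documents narrows the search sharply. IDF is a core component of the BM25 ranking function, which is often paired with vector search in hybrid retrieval to compensate for the semantic 'fuzziness' of dense embeddings.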
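A minimal sketch of the weight described above, assuming the classic form idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. It also shows a common smoothed BM25-style variant, log(1 + (N - df(t) + 0.5) / (df(t) + 0.5)), which stays positive even for terms found in every document. The toy corpus, whitespace tokenization, and function names are illustrative assumptions, not any particular library's API.

```python
import math
from collections import Counter


def document_frequencies(corpus: list[list[str]]) -> Counter:
    """Count how many documents contain each term (each document counted once)."""
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df


def classic_idf(term: str, df: Counter, n_docs: int) -> float:
    """idf(t) = log(N / df(t)); assumes the term occurs in at least one document.

    A term that appears in every document gets a weight of exactly 0.
    """
    return math.log(n_docs / df[term])


def bm25_idf(term: str, df: Counter, n_docs: int) -> float:
    """Smoothed BM25-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)), always > 0."""
    n_t = df[term]
    return math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))


# Toy corpus (illustrative only), tokenized by lowercasing and splitting on spaces.
corpus = [
    doc.lower().split()
    for doc in [
        "the cat sat on the mat",
        "the dog chased the cat",
        "the error code E1042 means the disk is full",
        "the weather is nice today",
    ]
]
df = document_frequencies(corpus)
N = len(corpus)

for term in ["the", "cat", "e1042"]:
    print(f"{term:>6}  classic={classic_idf(term, df, N):.3f}  bm25={bm25_idf(term, df, N):.3f}")
```

Under both formulas the ordering is the same: "the" (in every document) gets the lowest weight, "cat" (in half the documents) sits in the middle, and the rare identifier "e1042" gets the highest.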
Disambiguation
IDF measures a term's global rarity across the whole corpus, not its local frequency within a single document; that per-document count is term frequency (TF).
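To make the global-versus-local distinction concrete, the sketch below computes TF (a per-document count) and IDF (one corpus-wide value per term) separately, then multiplies them into a plain TF-IDF weight. Using raw counts for TF and the unsmoothed log(N / df) for IDF are simplifying assumptions; production rankers usually normalize or saturate TF (BM25 saturates it).

```python
import math
from collections import Counter

# Illustrative toy corpus, already tokenized.
corpus = [
    ["rare", "rare", "rare", "common"],
    ["common", "common"],
    ["common", "word"],
]
N = len(corpus)

# Global statistic: computed once over the whole corpus, same value in every document.
df = Counter(term for doc in corpus for term in set(doc))
idf = {term: math.log(N / count) for term, count in df.items()}


def tf(doc: list[str]) -> Counter:
    """Local statistic: raw count of each term within a single document."""
    return Counter(doc)


def tf_idf(doc: list[str]) -> dict[str, float]:
    """Combine the document's local TF with the corpus-wide IDF for each term."""
    return {term: count * idf[term] for term, count in tf(doc).items()}


for i, doc in enumerate(corpus):
    print(i, dict(tf(doc)), tf_idf(doc))
```

"common" has a TF of 2 in the second document, yet its TF-IDF weight is 0 there, because it appears in every document and its IDF is therefore 0. High local frequency does not rescue a globally common term.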
Visual Analog
A volume knob that turns down the background noise of common words so that rare, distinctive signals can be heard.
Related Terms
- BM25 (Successor/Implementation)
- Hybrid Search (Orchestration Strategy)
- TF-IDF (Prerequisite)
- Sparse Embeddings (Vector Representation)
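TF-IDF and sparse embeddings are closely linked: an IDF-weighted bag of words is a sparse vector whose nonzero entries are the terms actually present, and a lexical relevance score can be computed as a dot product over shared terms. The sketch below stores such vectors as Python dicts; the corpus, the function names, and plain dot-product scoring (rather than full BM25) are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative toy corpus; whitespace tokenization is a simplifying assumption.
docs = [
    "reset the router to fix error E1042",
    "the router manual covers setup",
    "the best pasta recipes",
]
tokenized = [d.lower().split() for d in docs]
N = len(tokenized)
df = Counter(t for doc in tokenized for t in set(doc))
idf = {t: math.log(N / n) for t, n in df.items()}


def sparse_vector(tokens: list[str]) -> dict[str, float]:
    """Map each known term to TF * IDF; absent terms are implicit zeros.

    Out-of-vocabulary terms are dropped, and zero-weight terms (present in
    every document) are omitted to keep the vector sparse.
    """
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items() if idf[t] > 0}


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Sparse dot product: iterate over the smaller vector's nonzero entries."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


doc_vectors = [sparse_vector(doc) for doc in tokenized]
query = sparse_vector("the router error e1042".split())
ranking = sorted(range(N), key=lambda i: dot(query, doc_vectors[i]), reverse=True)
print(ranking)
```

The stop word "the" contributes nothing to the query vector, so the ranking is driven by "router", "error", and "e1042". In a hybrid setup, lexical scores like these would be fused with dense-vector similarities; the fusion method is outside this sketch.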