SmartFAQs.ai
Intermediate

BM25

Definition

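BM25 (Best Matching 25) is a ranking function used in information retrieval to estimate how relevant a document is to a query. It combines term frequency with saturation (each repeated occurrence of a term adds less than the one before), inverse document frequency (rare terms count for more), and document-length normalization. In RAG, it serves as the core algorithm for sparse retrieval. It excels at exact keyword matching and at handling rare terms, but it has no semantic understanding, so it is often paired with dense vector embeddings in a hybrid search setup that trades keyword precision against semantic context.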
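
To make the definition concrete, here is a minimal pure-Python sketch of BM25 scoring. It is an illustration, not the implementation used by any of the tools listed on this page. The function name `bm25_scores`, the toy corpus, the whitespace tokenization, and the defaults `k1=1.2` and `b=0.75` are all assumptions chosen for the example, and the IDF form is the always-non-negative variant popularized by Lucene.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Longer-than-average documents are penalized, shorter ones boosted.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # no exact token match, so no contribution
            # Rare terms (low df) receive a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "bm25 ranks documents by keyword overlap".split(),
    "dense embeddings capture semantic similarity".split(),
]
scores = bm25_scores("bm25 keyword".split(), docs)
```

Only the second document shares tokens with the query, so it is the only one with a non-zero score. A query for "automobile" against a document containing only "car" would score zero, which is the gap that dense embeddings are meant to cover.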

Disambiguation

Keyword-based frequency ranking, not vector-based semantic similarity.
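
Because the two approaches rank on different signals, hybrid search has to merge a keyword ranking with a vector ranking. The page does not say how this is done; one commonly used option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method of any particular tool. The function name, the document ids, and the constant `k=60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical sparse (BM25) results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical dense (embedding) results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale.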

Visual Metaphor

"A highlighter-based indexer that ranks documents higher based on how many times a specific 'rare' word is colored in, adjusted for the document's total length."

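The metaphor can be checked numerically with a small sketch of the term-frequency part of BM25 alone, leaving out the rarity (IDF) weight. The helper name `tf_component`, the document lengths, and the parameter defaults are assumptions made for illustration.

```python
def tf_component(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of a BM25 score, before IDF weighting."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))


# Same document length, more highlights: the value keeps rising
# but never exceeds k1 + 1 (saturation).
by_count = [tf_component(tf, doc_len=100, avg_len=100) for tf in (1, 2, 5, 20)]

# Same number of highlights, longer document: the contribution drops.
short_doc = tf_component(3, doc_len=50, avg_len=100)
long_doc = tf_component(3, doc_len=400, avg_len=100)
```

Each extra highlight helps less than the previous one, and three highlights in a short document count for more than three highlights in a long one.
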
Key Tools
Elasticsearch, Apache Lucene, Rank-BM25 (Python), Pinecone (Hybrid Search), Weaviate, LangChain