SmartFAQs.ai
Intermediate

Okapi BM25

A ranking function used in information retrieval to estimate how relevant each document is to a search query, based on term frequency (TF), inverse document frequency (IDF), and document length. In RAG pipelines, it is the standard algorithm for lexical, or 'sparse', retrieval. Its saturation parameter (conventionally k1) limits how much repeated occurrences of a term can add to the score, and its length-normalization parameter (conventionally b) keeps long documents from outranking short ones simply because they contain more words.
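
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular library: the names `bm25_scores`, `tokenize`, and `corpus` are hypothetical, tokenization is plain lowercase whitespace splitting, `k1=1.5` and `b=0.75` are only typical starting values, and the IDF uses one common smoothed variant that stays positive.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: documents longer than average get a larger
        # denominator, so the same term count contributes less.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # lexical retrieval: unmatched terms add nothing
            # Smoothed IDF: rare terms weigh more than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF: approaches (k1 + 1) as tf grows, never exceeds it.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat around the yard and the cat ran away",
    "quantum computing uses qubits",
]
docs = [tokenize(text) for text in corpus]
scores = bm25_scores(tokenize("cat"), docs)
for text, score in zip(corpus, scores):
    print(f"{score:.3f}  {text}")
```

The third document shares no token with the query, so its score is zero. The second document mentions "cat" twice but is also much longer than average, so its advantage over the first is small: saturation and length normalization both dampen the raw term count.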


Disambiguation

Lexical keyword matching based on exact tokens, distinct from semantic vector similarity.
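
To make the distinction concrete, here is a small sketch of what "exact tokens" means in practice. The helper `shared_tokens` is hypothetical and assumes lowercase whitespace tokenization; production analyzers usually add stemming, stopword handling, and similar steps, but the principle is the same: a lexical scorer can only reward tokens that literally appear in both the query and the document.

```python
def shared_tokens(query, document):
    """Return the lowercased whitespace tokens that appear in both strings."""
    return set(query.lower().split()) & set(document.lower().split())


# Only the literal overlap counts; "car" and "automobile" never match.
print(shared_tokens("car repair", "automobile repair manual"))  # {'repair'}
print(shared_tokens("car", "automobile maintenance guide"))     # set()
```

A semantic vector model could still rank the second document as relevant to "car", which is why lexical and vector retrieval are often combined.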

Visual Metaphor

"A specialized sieve that catches specific, rare 'keyword' nuggets while allowing common 'stopword' sand to pass through, adjusted to ensure that larger buckets of text don't unfairly outweigh smaller ones."

Key Tools
Elasticsearch, Lucene, BM25S, Rank-BM25, Weaviate, Pinecone, LangChain
