Okapi BM25

Definition

Okapi BM25 is a ranking function used in information retrieval to estimate the relevance of documents to a search query from term frequency (TF) and inverse document frequency (IDF). In RAG pipelines, it commonly serves as the primary algorithm for lexical, or 'sparse', retrieval. Its term-frequency saturation parameter (conventionally k1) keeps repeated occurrences of a term from dominating the relevance score, and its length-normalization parameter (conventionally b) keeps long documents from outranking short ones merely because they contain more text.
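The TF, IDF, and saturation behaviour described in the definition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bm25_scores`, the whitespace tokenization, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for the example, and production libraries differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (illustrative sketch)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs_tokens for term in set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # Terms absent from the document contribute nothing.
            # Smoothed IDF (assumed variant): rare terms weigh more, never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            # Saturating TF: each extra occurrence adds less than the one before.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat around the garden all afternoon".split(),
    "quantum computing uses qubits".split(),
]
scores = bm25_scores(["cat"], docs)
```

With this toy corpus, the first two documents each contain "cat" once, but the shorter one scores higher because of length normalization, and the third document scores zero because it shares no token with the query.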

Disambiguation

Lexical keyword matching based on exact tokens, distinct from semantic vector similarity.
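To make the distinction concrete: a lexical scorer such as BM25 can only accumulate score for query tokens that literally occur in a document, so synonyms contribute nothing. The sketch below illustrates that precondition only; `lexical_overlap` and its lowercase whitespace tokenizer are hypothetical helpers, not part of any library.

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def lexical_overlap(query, document):
    """Return the query tokens that literally appear in the document."""
    return set(tokenize(query)) & set(tokenize(document))


# Exact tokens match, so a lexical scorer has terms to work with.
exact = lexical_overlap("car repair", "Car repair manual")

# Synonyms share no tokens, so a purely lexical score would be zero.
synonym = lexical_overlap("car repair", "Automobile maintenance guide")
```

Here `exact` holds both query tokens while `synonym` is empty, which is the gap that semantic vector similarity is meant to cover.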

Visual Metaphor

"A specialized sieve that catches specific, rare 'keyword' nuggets while allowing common 'stopword' sand to pass through, adjusted to ensure that larger buckets of text don't unfairly outweigh smaller ones."

Key Tools

Elasticsearch
Lucene
BM25S
Rank-BM25
Weaviate
Pinecone
LangChain

Related Connections

TF-IDF (Mathematical Predecessor)
Hybrid Search (Implementation Framework)
Sparse Retrieval (Methodology Category)
Reciprocal Rank Fusion (RRF) (Scoring Component)
