Definition
A retrieval structure in RAG pipelines that represents each document as a high-dimensional vector with one dimension per vocabulary token, where most values are zero and each nonzero value is that token's frequency or importance score (for example, a TF-IDF or BM25 weight). It excels at exact lexical keyword matching and at finding rare terms such as identifiers, product codes, and names, providing a high-precision counterweight to dense embeddings, which match on meaning and can miss exact wording.
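As a rough illustration of the definition, the sketch below builds a tiny sparse index in plain Python and ranks documents with a common BM25 variant. The class name `SparseIndex`, the whitespace tokenizer, the sample documents, and the parameter values `k1=1.5` and `b=0.75` are all assumptions made for this example, not part of any particular library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    return text.lower().split()


class SparseIndex:
    """Hypothetical minimal sparse index: an inverted index scored with BM25."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        tokenized = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokenized]
        self.avg_len = sum(self.doc_len) / self.n_docs
        # token -> {doc_id: term frequency}. Only nonzero entries are stored;
        # every token missing from a document is an implicit zero.
        self.postings = defaultdict(dict)
        for doc_id, tokens in enumerate(tokenized):
            for tok, tf in Counter(tokens).items():
                self.postings[tok][doc_id] = tf

    def idf(self, token):
        # Rare tokens (low document frequency) receive a higher weight.
        df = len(self.postings.get(token, {}))
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def search(self, query, top_k=3):
        scores = defaultdict(float)
        for tok in set(tokenize(query)):
            idf = self.idf(tok)
            # Only documents that contain the token are ever visited.
            for doc_id, tf in self.postings.get(tok, {}).items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]


docs = [
    "the cat sat on the mat",
    "error code E4012 in the payment service",
    "dense embeddings capture meaning",
]
index = SparseIndex(docs)
# Only the document at position 1 contains the rare token, so it is the only hit.
print(index.search("E4012"))
```

Because scoring walks only the postings of the query tokens, a query for a rare term touches very few documents, which is where the precision on exact keywords comes from. A query with no token in the vocabulary returns an empty list instead of a loosely related match.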
- BM25 (Underlying Algorithm)
- Dense Index (Counterpart/Alternative)
- Hybrid Search (Implementation Pattern)
- Inverted Index (Prerequisite Architecture)
Disambiguation
In AI retrieval, the term refers to a lexical, token-based vector space, not to the database sense of an index that stores entries for only a subset of records to save space.
Visual Analog
An alphabetical index at the back of a massive textbook that points you to the exact page where a specific technical term appears.