Definition
A retrieval structure in RAG pipelines that represents documents as high-dimensional vectors where most values are zero, mapping specific tokens to their frequency or importance scores (e.g., BM25). It excels at lexical keyword matching and finding rare terms, providing a high-precision counterweight to the semantic fluidity of dense embeddings.
In AI, this refers to lexical/token-based vector spaces, not a database index that only tracks a subset of records to save space.
"An alphabetical index at the back of a massive textbook that points you to the exact page where a specific technical term appears."
Conceptual Overview
A retrieval structure in RAG pipelines that represents documents as high-dimensional vectors where most values are zero, mapping specific tokens to their frequency or importance scores (e.g., BM25). It excels at lexical keyword matching and finding rare terms, providing a high-precision counterweight to the semantic fluidity of dense embeddings.
Disambiguation
In AI, this refers to lexical/token-based vector spaces, not a database index that only tracks a subset of records to save space.
Visual Analog
An alphabetical index at the back of a massive textbook that points you to the exact page where a specific technical term appears.