Definition
A retrieval framework that uses a dual-encoder (bi-encoder) architecture to map queries and passages into the same dense vector space, then ranks passages by the inner product between the query vector and each passage vector. It captures semantic matches that keyword-based methods miss, but encoding and indexing a large corpus requires significant GPU compute, and it may underperform sparse retrieval on rare or domain-specific terms.
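The scoring step described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the 3-dimensional vectors are hand-assigned stand-ins for what trained query and passage encoders would produce, and the names `build_index`, `retrieve`, `TOY_PASSAGE_VECTORS`, and `TOY_QUERY_VECTORS` are hypothetical. The search is exhaustive; a production system would typically swap in an approximate nearest neighbor index.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[str], Vector]


def inner_product(a: Vector, b: Vector) -> float:
    """Similarity score used to rank passages against a query."""
    return sum(x * y for x, y in zip(a, b))


def build_index(passages: List[str], encode_passage: Encoder) -> List[Tuple[str, Vector]]:
    """Offline step: encode every passage once and store the vectors."""
    return [(p, encode_passage(p)) for p in passages]


def retrieve(query: str, index: List[Tuple[str, Vector]],
             encode_query: Encoder, k: int = 2) -> List[Tuple[str, float]]:
    """Online step: encode the query, score all passages, return the top k."""
    q = encode_query(query)
    scored = [(inner_product(q, vec), passage) for passage, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(passage, score) for score, passage in scored[:k]]


# ASSUMPTION: hand-assigned toy vectors stand in for learned embeddings.
TOY_PASSAGE_VECTORS = {
    "The automobile was repaired at the garage.": [0.9, 0.1, 0.0],
    "Bananas are rich in potassium.": [0.0, 0.2, 0.9],
    "The stock market fell sharply today.": [0.1, 0.9, 0.1],
}
TOY_QUERY_VECTORS = {"who fixed my car": [1.0, 0.0, 0.1]}

# The two lookups play the role of the passage encoder and the query encoder.
index = build_index(list(TOY_PASSAGE_VECTORS), TOY_PASSAGE_VECTORS.__getitem__)
results = retrieve("who fixed my car", index, TOY_QUERY_VECTORS.__getitem__, k=2)

for passage, score in results:
    print(f"{score:.2f}  {passage}")
```

Separating `build_index` from `retrieve` mirrors the usual split between offline corpus encoding, which is the expensive part, and online query encoding, which touches only one short text.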
- Bi-Encoder (Component)
- Vector Embeddings (Prerequisite)
- Approximate Nearest Neighbor (ANN) (Component)
- BM25 (Contrast)
Disambiguation
Semantic vector search that matches on meaning, in contrast to lexical keyword matching methods such as BM25.
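A small contrast may make the distinction concrete. The sketch below is illustrative only: `lexical_overlap` is a deliberately simplified stand-in for keyword matching (it is not BM25, which also weights term frequency, rarity, and document length), and the 2-dimensional vectors are hypothetical hand-assigned embeddings, not the output of a trained model.

```python
from typing import Dict, List


def lexical_overlap(query: str, passage: str) -> int:
    """Count distinct query tokens that appear verbatim in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def dot(a: List[float], b: List[float]) -> float:
    """Inner product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query = "cheap car"
query_vec = [0.7, 0.7]  # ASSUMPTION: hypothetical query embedding

# ASSUMPTION: hypothetical passage embeddings chosen for illustration.
passages: Dict[str, List[float]] = {
    "affordable automobile for sale": [0.8, 0.6],
    "cheap flights to rome": [0.1, 0.2],
}

# Lexical ranking rewards the shared surface token "cheap".
lexical_best = max(passages, key=lambda p: lexical_overlap(query, p))

# Dense ranking rewards closeness in the embedding space.
dense_best = max(passages, key=lambda p: dot(query_vec, passages[p]))

print("lexical:", lexical_best)
print("dense:  ", dense_best)
```

The paraphrased passage shares no token with the query, so the lexical score cannot surface it, while the vector score can, provided the encoder has learned to place the two near each other.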
Visual Analog
A multi-dimensional star map in which queries and documents are points; retrieval means finding the stars nearest to the query by distance rather than by matching their names.
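The analogy speaks of distance, while the definition scores by inner product. The two orderings coincide under one assumption that the source does not state: that query and passage vectors are normalized to unit length. For a query vector q and a passage vector p (symbols introduced here for illustration), the squared Euclidean distance expands as:

```latex
\begin{aligned}
\lVert q - p \rVert^{2}
  &= \lVert q \rVert^{2} + \lVert p \rVert^{2} - 2\, q^{\top} p \\
  &= 2 - 2\, q^{\top} p
  \qquad \text{when } \lVert q \rVert = \lVert p \rVert = 1 .
\end{aligned}
```

Under that assumption, the passage with the smallest distance is the one with the largest inner product. Without normalization the two rankings can differ, because a passage vector with a large norm can score a high inner product while still lying far from the query.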