Definition
A specialized data structure optimized for storing and querying high-dimensional vector embeddings using similarity metrics like Cosine or Euclidean distance. In RAG pipelines, it enables efficient Approximate Nearest Neighbor (ANN) search, allowing the system to trade off a small degree of accuracy for massive gains in retrieval speed and scalability.
Distinguished from relational B-tree indexes by its focus on semantic proximity rather than exact keyword matches.
"A 3D star map where similar topics are clustered into constellations, allowing a searcher to jump to a specific sector rather than checking every star individually."
- Vector Embedding(Prerequisite)
- Approximate Nearest Neighbor (ANN)(Component)
- HNSW (Hierarchical Navigable Small World)(Component)
- Semantic Search(Component)
Conceptual Overview
A specialized data structure optimized for storing and querying high-dimensional vector embeddings using similarity metrics like Cosine or Euclidean distance. In RAG pipelines, it enables efficient Approximate Nearest Neighbor (ANN) search, allowing the system to trade off a small degree of accuracy for massive gains in retrieval speed and scalability.
Disambiguation
Distinguished from relational B-tree indexes by its focus on semantic proximity rather than exact keyword matches.
Visual Analog
A 3D star map where similar topics are clustered into constellations, allowing a searcher to jump to a specific sector rather than checking every star individually.