Conceptual Overview
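A neural network architecture used for reranking in retrieval-augmented generation (RAG), in which a query and a document are fed into the model together, as a single input, to compute a high-precision relevance score. Unlike a Bi-Encoder, which encodes each input separately, this architecture applies full self-attention across both inputs. The result is typically higher accuracy, at the cost of higher latency (one forward pass per query-document pair) and the inability to pre-compute document embeddings, because every score depends on the query.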
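The difference in attention can be illustrated with a toy sketch. It is not a trained model: the 2-d token vectors are hypothetical values, and the names attention_weights and cross_mass are invented for this illustration. It computes single-head dot-product attention weights twice, once over the joint sequence and once with query and document masked from each other, which is what encoding them separately amounts to.

```python
import math

def softmax(row):
    # Numerically stable softmax; masked entries (-inf) become 0.0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors, segments, joint):
    """Single-head dot-product attention weights over toy token vectors.

    joint=True : every token may attend to every token (joint encoding).
    joint=False: tokens attend only within their own segment, as when the
                 query and the document are encoded separately.
    """
    scale = math.sqrt(len(vectors[0]))
    weights = []
    for i, q in enumerate(vectors):
        scores = []
        for j, k in enumerate(vectors):
            if joint or segments[i] == segments[j]:
                scores.append(sum(a * b for a, b in zip(q, k)) / scale)
            else:
                scores.append(float("-inf"))  # masked out
        weights.append(softmax(scores))
    return weights

def cross_mass(weights, segments):
    # Total attention weight that query tokens place on document tokens.
    n_q = segments.count(0)
    return sum(sum(row[n_q:]) for row in weights[:n_q])

# Hypothetical toy embeddings: 2 query tokens followed by 3 document tokens.
query_tokens = [[1.0, 0.0], [0.5, 0.5]]
doc_tokens = [[0.9, 0.1], [0.0, 1.0], [0.4, 0.6]]
vectors = query_tokens + doc_tokens
segments = [0] * len(query_tokens) + [1] * len(doc_tokens)

joint_weights = attention_weights(vectors, segments, joint=True)
separate_weights = attention_weights(vectors, segments, joint=False)

print("query-to-document attention (joint):   ", cross_mass(joint_weights, segments))
print("query-to-document attention (separate):", cross_mass(separate_weights, segments))
```

With separate encoding the query never attends to document tokens, so the document side can be computed ahead of time. With joint encoding the attention pattern changes with every query, which is why nothing can be cached.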
Disambiguation
Used to re-score a small candidate set, such as the top results returned by a first-stage retriever, not for initial retrieval from a vector database.
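A minimal sketch of that re-scoring step follows, assuming the candidates have already been fetched by a first-stage retriever. The function names rerank and toy_score_pair are hypothetical, and toy_score_pair is only a lexical-overlap stand-in so the example runs without a model; in practice a trained pairwise scoring model would take its place.

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Score every (query, document) pair and keep the top_n best.

    Cost grows linearly with len(candidates): one scoring call per pair,
    which is why this runs on a short candidate list, not a whole corpus.
    """
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

def toy_score_pair(query: str, doc: str) -> float:
    # Stand-in scorer (assumption): fraction of query terms found in the doc.
    # A real reranker would run a model over the combined query and document.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

# Hypothetical candidates, as if returned by a vector database lookup.
candidates = [
    "vector databases store embeddings",
    "bi encoders embed texts independently",
    "cross encoders score query document pairs jointly",
]
query = "how do cross encoders score pairs"

top = rerank(query, candidates, toy_score_pair, top_n=2)
for score, doc in top:
    print(round(score, 3), doc)
```

The scorer is passed in as a callable so the pipeline shape stays the same when the stand-in is swapped for a real model.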
Visual Analog
A magnifying glass examining a key and a lock side-by-side to ensure every groove matches perfectly, rather than just checking their general shape.