Definition
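A retrieval architecture, exemplified by ColBERT, that delays the interaction between query and document representations until search time. The query and the document are encoded independently into token-level embeddings, so document embeddings can be computed and indexed ahead of time. Relevance is then scored with a MaxSim operation: each query token takes its maximum similarity over all document tokens, and these maxima are summed. This balances the expressive power of Cross-Encoders with the computational efficiency and pre-computability of Bi-Encoders.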
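The MaxSim scoring named in the definition can be sketched in a few lines. This is a minimal illustration in plain Python, not ColBERT's implementation: the function names `dot` and `maxsim_score` and the toy 2-dimensional embeddings are assumptions made for the example, and it assumes unit-length token vectors (so the dot product acts as cosine similarity) and a non-empty document.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum, over query tokens, of the best similarity against any document token.

    Assumes doc_tokens is non-empty; max() raises ValueError otherwise.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token embeddings (hypothetical values, not produced by any model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match for each query token
doc_b = [[0.6, 0.8], [0.6, 0.8]]              # only partial matches

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` scores higher than `doc_b` because every query token finds a perfect match among its tokens. Only the final max-and-sum step depends on the query, which is why the document side can be encoded and stored before any query arrives.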
- MaxSim (Core Component)
- Bi-Encoder (Architectural Alternative)
- Cross-Encoder (Performance Benchmark)
- Multi-vector Indexing (Storage Requirement)
Disambiguation
Not to be confused with standard Bi-Encoders that compress entire documents into a single vector; Late Interaction maintains a vector for every token.
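To make the distinction concrete, the sketch below contrasts what each approach would store for the same document. It is illustrative only: mean pooling is an assumed stand-in for however a given Bi-Encoder produces its single vector (many use a special-token embedding instead), and `mean_pool` and the toy embeddings are hypothetical.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Collapse token vectors into one vector by averaging each dimension."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


# Hypothetical 2-dimensional token embeddings for a three-token document.
doc_tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

single_vector = mean_pool(doc_tokens)  # what a single-vector index would store
multi_vector = doc_tokens              # what a late-interaction index would store

# Number of floats each representation needs for this document.
floats_single = len(single_vector)
floats_multi = sum(len(v) for v in multi_vector)
```

The multi-vector form keeps each token's embedding available for matching, but its storage grows with document length, whereas the pooled form stays at a fixed size. This is the storage cost that multi-vector indexing has to absorb.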
Visual Analog
Comparing two grocery lists by checking every item on List A against the entire List B to find the closest matches, rather than just comparing the total price of each list.