SmartFAQs.ai
Deep Dive

Late Interaction

Definition

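Late Interaction is a retrieval architecture, exemplified by ColBERT, that encodes the query and the document independently into one embedding per token and defers their interaction to search time. Document token embeddings can be computed and indexed offline. At search time, a MaxSim operation matches each query token embedding to its most similar document token embedding and sums those maxima into a relevance score. This retains much of the expressive power of Cross-Encoders while keeping the computational efficiency and pre-computability of Bi-Encoders.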
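
To make the MaxSim operation concrete, here is a minimal NumPy sketch. It assumes cosine similarity between token embeddings and uses hand-picked 2-D toy vectors; the function names `l2_normalize` and `maxsim_score` are illustrative and are not taken from ColBERT or any library.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score for one query against one document.

    query_emb has shape (num_query_tokens, dim);
    doc_emb has shape (num_doc_tokens, dim).
    """
    # Similarity of every query token to every document token: (q_tokens, d_tokens)
    sim = l2_normalize(query_emb) @ l2_normalize(doc_emb).T
    # For each query token keep its best-matching document token, then sum.
    return float(sim.max(axis=1).sum())


# Toy token embeddings (assumed values, for illustration only).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # covers both query tokens
doc_b = np.array([[1.0, 0.0], [-1.0, 0.0]])             # covers only the first

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

In this sketch `doc_a` scores higher than `doc_b` because every query token finds a close match in it. The document matrices are the part that can be precomputed and stored ahead of time; only the matrix product and the max-then-sum step need to run at search time.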

Disambiguation

Not to be confused with standard Bi-Encoders that compress entire documents into a single vector; Late Interaction maintains a vector for every token.
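
The difference can be illustrated with a small NumPy sketch. It assumes a Bi-Encoder that mean-pools token embeddings into a single vector (real models may pool differently, for example with a [CLS] token), and it uses hand-picked toy vectors; all variable names are hypothetical.

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


# Toy unit-length token embeddings (assumed values, for illustration only).
doc_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])  # two document tokens
query_tokens = np.array([[1.0, 0.0]])            # one query token

# Single-vector Bi-Encoder (assumed mean pooling): one vector per text.
doc_vec = unit(doc_tokens.mean(axis=0))
query_vec = unit(query_tokens.mean(axis=0))
pooled_score = float(query_vec @ doc_vec)

# Late Interaction: keep every token vector and apply MaxSim.
late_score = float((query_tokens @ doc_tokens.T).max(axis=1).sum())

print(pooled_score, late_score)
```

Here the query token matches one document token exactly. The pooled vector blends that token with its neighbor, so the single-vector score drops below 1, while the per-token representation still registers the exact match.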

Visual Metaphor

"Comparing two grocery lists by checking every item on List A against the entire List B to find the closest matches, rather than just comparing the total price of each list."

Key Tools
ColBERT, RAGatouille, PLAID, Vespa, Pinecone (via multi-vector support)