Late Interaction

A retrieval architecture, exemplified by ColBERT, that delays the interaction between query and document representations until search time. The query and the document are encoded independently into token-level embeddings; at search time a MaxSim operation matches each query token to its most similar document token and sums those maxima into a relevance score. Because document embeddings can be pre-computed and indexed, this balances the expressive power of Cross-Encoders with the computational efficiency of Bi-Encoders.
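The MaxSim scoring mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not ColBERT's actual implementation: the function names (`maxsim_score`, `normalize`, `dot`) and the two-dimensional toy embeddings are assumptions made for the example, and cosine similarity is assumed as the token-level similarity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """For each query token, take its best-matching document token, then sum."""
    if not query_vecs or not doc_vecs:
        return 0.0
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]  # in practice these are pre-computed offline
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains an exact match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each query token

print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
```

Here `doc_a` scores higher than `doc_b` because every query token finds an exact counterpart among its token vectors. Only the final max-and-sum step depends on the query, which is why the document side can be encoded ahead of time.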

Definition

Disambiguation

Not to be confused with standard Bi-Encoders that compress entire documents into a single vector; Late Interaction maintains a vector for every token.
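To make the distinction concrete, the sketch below scores the same toy document two ways: once after collapsing its token vectors into a single vector, and once keeping one vector per token. Mean pooling is assumed here as the single-vector compression step (real Bi-Encoders may pool differently), and all names and embedding values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(token_vecs: Sequence[Vector]) -> List[float]:
    """Collapse token vectors into one vector by averaging each dimension (assumed pooling)."""
    n = len(token_vecs)
    return [sum(column) / n for column in zip(*token_vecs)]


def single_vector_score(q_tokens: Sequence[Vector], d_tokens: Sequence[Vector]) -> float:
    """Bi-Encoder style: one pooled vector per side, one dot product."""
    return dot(mean_pool(q_tokens), mean_pool(d_tokens))


def late_interaction_score(q_tokens: Sequence[Vector], d_tokens: Sequence[Vector]) -> float:
    """Late Interaction style: best document token per query token, summed."""
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)


# One query token; the document has one exactly matching token among three unrelated ones.
query_tokens = [[1.0, 0.0]]
doc_tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

pooled = single_vector_score(query_tokens, doc_tokens)
token_level = late_interaction_score(query_tokens, doc_tokens)
print(pooled, token_level)
```

In this toy case the pooled score is diluted by the unrelated tokens, while the token-level score still reflects the exact match. The cost is that the token-level approach has to store four vectors for this document instead of one.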

Visual Metaphor

"Comparing two grocery lists by checking every item on List A against the entire List B to find the closest matches, rather than just comparing the total price of each list."

Key Tools

ColBERT
RAGatouille
PLAID
Vespa
Pinecone (via multi-vector support)

Related Connections

MaxSim (Core Component)
Bi-Encoder (Architectural Alternative)
Cross-Encoder (Performance Benchmark)
Multi-vector Indexing (Storage Requirement)
