Self-Attention

A mechanism in Transformer architectures that builds, for each token in an input sequence, a weighted combination of all tokens in that sequence, with the weights given by how relevant each token is to the one being processed. In RAG and agentic workflows, it lets the model resolve coreferences and capture dependencies across long-range context, at a computational cost that grows quadratically, O(n²), with the sequence length n.
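The definition above can be sketched in a few lines of plain Python. This is a minimal, single-head illustration that assumes the common scaled dot-product formulation; the helper names (matmul, softmax, self_attention), the random projection matrices W_q, W_k, W_v, and the sizes are hypothetical choices for the example, not part of any particular library.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X is an n x d_model list of token vectors. Returns (outputs, weights),
    where weights is the n x n matrix of attention weights.
    """
    # Queries, keys, and values are all projections of the same sequence X.
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    # One dot product per (query token, key token) pair: n * n scores,
    # which is where the O(n^2) cost comes from.
    scores = [
        [sum(q * k for q, k in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in K]
        for q_i in Q
    ]
    # Each row becomes a probability distribution over all tokens.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted combination of the value vectors.
    return matmul(weights, V), weights

random.seed(0)
n, d_model, d_k = 5, 8, 4  # assumed sizes, for illustration only
X = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n)]
W_q, W_k, W_v = (
    [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(d_model)]
    for _ in range(3)
)
outputs, weights = self_attention(X, W_q, W_k, W_v)
print(len(weights), len(weights[0]))  # 5 5: one weight per token pair
```

The weight matrix has n rows and n columns, so doubling the sequence length quadruples the number of scores to compute. Real implementations use tensor libraries, multiple heads, and trained projections rather than random ones.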

Disambiguation

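Distinguish from Cross-Attention, in which the queries come from one sequence and the keys and values come from a different one. In Self-Attention, queries, keys, and values are all derived from the same sequence, so it only relates tokens within that sequence.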
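One way to see the distinction is that both variants can share the same attention routine and differ only in where the keys and values come from. The sketch below assumes scaled dot-product attention and, to stay short, omits the learned projections; the function names and the toy vectors are hypothetical.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned projections)."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
        all_weights.append(weights)
    return outputs, all_weights

def self_attention(x):
    # Queries, keys, and values all come from the same sequence.
    return attention(x, x, x)

def cross_attention(x, context):
    # Queries come from x; keys and values come from a different sequence.
    return attention(x, context, context)

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens
context = [[0.5, 0.5], [1.0, -1.0]]       # 2 tokens from another sequence

self_out, self_w = self_attention(x)
cross_out, cross_w = cross_attention(x, context)
print(len(self_w), len(self_w[0]))    # 3 3: every token attends within x
print(len(cross_w), len(cross_w[0]))  # 3 2: every token of x attends over context
```

In both cases there is one output vector per query token; only the set of tokens being attended over changes.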

Visual Metaphor

"A high-powered searchlight that, for every word in a sentence, simultaneously illuminates all other words with varying intensities based on their relevance to the current word's meaning."
