Self-Attention

A mechanism within Transformer architectures that builds, for each token, a weighted combination of the representations of all tokens in the same sequence, with the weights derived from pairwise relevance scores between tokens. In RAG and agentic workflows, it lets the model resolve coreferences and capture dependencies across long-range context, though its computational cost grows quadratically, O(n²), with the sequence length n.
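The definition above can be illustrated with a minimal sketch of scaled dot-product self-attention in plain Python. The function names, the toy 2-dimensional token vectors, and the identity projection matrices are assumptions made for illustration; a real Transformer learns the projections and uses an optimized tensor library.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x d) @ (d x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    # Queries, keys, and values are all projections of the SAME sequence x.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    weights = []
    for q_i in q:
        # One score per (query token, key token) pair: n * n scores in total,
        # which is where the O(n^2) cost comes from.
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        weights.append(softmax(scores))
    # Each output row is a relevance-weighted mix of the value vectors.
    return matmul(weights, v), weights

# Hypothetical toy input: 3 tokens with 2-dimensional embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed projections, for illustration only
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
```

Here `weights` is a 3 x 3 matrix: row i holds how strongly token i attends to each token in the sequence, and each row sums to 1.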

Disambiguation

Distinguish from Cross-Attention, which relates tokens between two different sequences; Self-Attention only relates tokens within the same sequence.
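As a rough sketch of this distinction, the same attention-weight routine can be called with one sequence (self-attention) or with two (cross-attention). The helper name `attention_weights` and the toy vectors are hypothetical, and learned projections are omitted to keep the contrast visible.

```python
import math

def attention_weights(query_seq, context_seq):
    # Returns one row of softmax weights per query token,
    # with one column per token in context_seq.
    d = len(query_seq[0])
    weights = []
    for q in query_seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context_seq]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical toy sequences with 2-dimensional token vectors.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens
other = [[0.5, 0.5], [1.0, -1.0]]                # 2 tokens from a different sequence

# Self-attention: queries and keys come from the same sequence (3 x 3 weights).
self_weights = attention_weights(sentence, sentence)

# Cross-attention: queries from one sequence, keys from another (3 x 2 weights).
cross_weights = attention_weights(sentence, other)
```

The only difference between the two calls is where the keys come from, which is also why the self-attention weight matrix is always square.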

Visual Metaphor

"A high-powered searchlight that, for every word in a sentence, simultaneously illuminates all other words with varying intensities based on their relevance to the current word's meaning."

Related Articles