Definition
Input tokens are the discrete units of text (single characters up to sub-words) that a tokenizer produces and an LLM's transformer layers consume. In RAG pipelines, managing input tokens involves a direct trade-off: supplying more retrieved context can improve accuracy, but it increases latency and API cost as the prompt approaches the model's context window limit.
- Context Window (Constraint)
- Tokenizer (Prerequisite)
- Prompt Engineering (Optimization Method)
- Output Tokens (Complementary)
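The context-budget trade-off described in the definition can be sketched as a greedy packing step. This is a minimal illustration, not a production implementation: `count_tokens` here approximates tokens by whitespace splitting, whereas a real pipeline would use the model's own tokenizer (for example, tiktoken for OpenAI models), and the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate; swap in the model's real tokenizer."""
    return len(text.split())

def pack_context(question: str, docs: list[str],
                 context_limit: int, reserved_output: int) -> list[str]:
    """Greedily add retrieved docs until the prompt would exceed the
    context window minus the tokens reserved for the model's output."""
    budget = context_limit - reserved_output - count_tokens(question)
    chosen, used = [], 0
    for doc in docs:  # docs assumed sorted by retrieval relevance
        cost = count_tokens(doc)
        if used + cost > budget:
            break  # stop before overflowing the context window
        chosen.append(doc)
        used += cost
    return chosen
```

Because retrieval usually returns documents in relevance order, stopping at the first document that would overflow the budget keeps the highest-ranked context while leaving room for the generated output tokens.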
Disambiguation
Refers to the tokenized representation of the prompt and retrieved documents, not the raw character count or the resulting generated text.
Visual Analog
Individual Scrabble tiles being fed into a conveyor belt for a machine to read and analyze.