Definition
The maximum number of tokens an LLM can process in a single request, encompassing the prompt, retrieved document chunks in RAG, and the agent's conversation history. It is the hard working-memory boundary that determines how much external knowledge the model can 'read' at once before information must be truncated or dropped.
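For a concrete sense of how this budget is enforced, the sketch below assembles one request and drops the oldest conversation turns once the limit is reached. The 8,192-token window, the 1,024-token output reserve, and the whitespace-based `count_tokens` helper are illustrative placeholders, not any particular model's API.

```python
# Minimal sketch of a context-window budget check for one request.
# The limits and the tokenizer below are stand-ins, not a specific model's values.

def count_tokens(text: str) -> int:
    # Crude whitespace approximation; a real system would use the model's tokenizer.
    return len(text.split())

CONTEXT_WINDOW = 8_192       # hard per-request limit, in tokens (example value)
RESERVED_FOR_OUTPUT = 1_024  # leave room for the model's reply

def fit_request(system_prompt: str, retrieved_chunks: list[str],
                history: list[str], user_message: str) -> list[str]:
    """Assemble a request, dropping the oldest history turns once the budget is exceeded."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
    fixed = [system_prompt, *retrieved_chunks, user_message]
    used = sum(count_tokens(part) for part in fixed)

    kept_history: list[str] = []
    # Walk history from newest to oldest; stop when the window is full.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # older turns are truncated; this is the information that gets 'lost'
        kept_history.append(turn)
        used += cost

    kept_history.reverse()
    return [system_prompt, *kept_history, *retrieved_chunks, user_message]
```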
- Tokens (Prerequisite)
- Lost-in-the-Middle Phenomenon (Constraint)
- Chunking Strategy (Component; see the sketch after this list)
- Needle in a Haystack (NIAH) (Evaluation Metric)
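A rough sketch of the kind of chunking strategy referenced above: documents are split into overlapping, fixed-size pieces so each retrieved chunk can share the context window with the prompt, other chunks, and the conversation history. The 512-token chunk size, 64-token overlap, and whitespace tokenization are arbitrary assumptions for illustration.

```python
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split a document into overlapping, fixed-size token chunks sized to fit
    comfortably inside a context-window budget alongside the rest of the request."""
    tokens = text.split()  # whitespace stand-in for real tokenization
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```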
Disambiguation
Not to be confused with the total size of a vector database: the context window is the model's 'RAM', while the vector store is its 'hard drive'.
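One way to see the distinction: the vector store can hold an arbitrarily large corpus, but each request only 'loads' the few top-ranked chunks that fit within the window budget. The sketch below assumes the chunks are already sorted by relevance and uses the same whitespace token approximation as above.

```python
# The vector database (the 'hard drive') can hold far more than one request can carry.
# Per request, only the best-ranked chunks that fit the token budget are 'loaded'
# into the context window (the 'RAM').

def pack_retrieved_chunks(ranked_chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep the best-ranked chunks until the context-window budget is spent."""
    packed, used = [], 0
    for chunk in ranked_chunks:       # assumed already sorted by relevance
        cost = len(chunk.split())     # whitespace token approximation
        if used + cost > budget_tokens:
            continue                  # this knowledge stays on the 'hard drive' for now
        packed.append(chunk)
        used += cost
    return packed
```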
Visual Analog
A workbench with a fixed surface area where every tool and document must fit simultaneously to be used.