Definition
In AI agents and RAG pipelines, memory usage refers to the management of conversation history, task state, and retrieved context so that the model stays consistent from one interaction to the next. It involves balancing how much past information is retained against the fixed token limit of the LLM's context window: context that is dropped or truncated is lost to the model, which raises the risk of hallucinations.
- Context Window (Prerequisite)
- Tokenization (Component)
- Sliding Window Buffer (Technique)
- Statefulness (Component)
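The sliding window buffer listed above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `SlidingWindowBuffer` and `count_tokens` are hypothetical, and token counting is approximated by splitting on whitespace, which real tokenizers do not do.

```python
from collections import deque
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers (BPE, SentencePiece, ...) give different counts.
    return len(text.split())


class SlidingWindowBuffer:
    """Keeps the most recent messages that fit within a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages = deque()  # (role, text) pairs, oldest first
        self.total_tokens = 0

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))
        self.total_tokens += count_tokens(text)
        # Evict the oldest messages until the budget is met,
        # but always keep at least the newest message.
        while self.total_tokens > self.max_tokens and len(self.messages) > 1:
            _, old_text = self.messages.popleft()
            self.total_tokens -= count_tokens(old_text)

    def context(self) -> List[Tuple[str, str]]:
        # The messages that would be sent to the model, oldest first.
        return list(self.messages)


buf = SlidingWindowBuffer(max_tokens=8)
buf.add("user", "What is the filing deadline?")
buf.add("assistant", "It is Friday at noon.")
print(buf.context())
```

With a budget of 8 approximate tokens, the second message pushes the total to 10, so the oldest message is evicted and only the assistant's reply remains in the window. This is the information loss the definition warns about: anything that slides out of the window is no longer visible to the model unless it is stored elsewhere.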
Disambiguation
In this context, memory usage refers to stateful conversation history and token window management, not the physical RAM of the host server.
Visual Analog
A lawyer's active case file: essential documents are kept on the desk for immediate reference (short-term memory), while older evidence is summarized or archived in a nearby filing cabinet and retrieved when needed (long-term memory).
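The desk-and-filing-cabinet analogy above can be sketched as a two-tier memory. This is an illustrative sketch under stated assumptions: `TieredMemory` and `summarize` are hypothetical names, truncation stands in for an LLM-generated summary, and retrieval uses naive keyword overlap where a real system would more likely use embedding similarity.

```python
from typing import List


def summarize(text: str, max_words: int = 8) -> str:
    # Placeholder: truncation stands in for an LLM-generated summary.
    return " ".join(text.split()[:max_words])


class TieredMemory:
    def __init__(self, desk_size: int) -> None:
        self.desk_size = desk_size        # max messages kept verbatim
        self.short_term: List[str] = []   # the "desk"
        self.long_term: List[str] = []    # the "filing cabinet"

    def add(self, message: str) -> None:
        self.short_term.append(message)
        # When the desk overflows, summarize and archive the oldest item.
        while len(self.short_term) > self.desk_size:
            oldest = self.short_term.pop(0)
            self.long_term.append(summarize(oldest))

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        # Score archived entries by the number of words shared with the query.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(entry.lower().split())), entry)
            for entry in self.long_term
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]

    def build_context(self, query: str) -> List[str]:
        # Retrieved archive entries first, then the verbatim recent messages.
        return self.retrieve(query) + self.short_term


mem = TieredMemory(desk_size=2)
mem.add("The contract was signed in March")
mem.add("The witness lives in Boston")
mem.add("The hearing is next Tuesday")
print(mem.build_context("when was the contract signed"))
```

Here the oldest message is moved off the desk once a third one arrives, and it is pulled back into the prompt only when a query overlaps with it. The trade-off is that archived entries are lossy (summarized) and depend on retrieval quality, while the short-term tier stays verbatim but small.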