Definition
The strategic allocation of token limits or hardware resources (VRAM/RAM) within an LLM or Agent architecture to manage context window utilization and retrieval density. It forces a trade-off: comprehensive context improves recall but increases latency and cost, while a leaner budget reduces cost but risks context loss.
Related Concepts
- Context Window (Prerequisite)
- Sliding Window Memory (Component)
- Tokenization (Component)
- Vector Quantization (Component)
Conceptual Overview
Memory budgeting decides, in advance, how a fixed context window (and the VRAM/RAM backing it) is divided among competing consumers: the system prompt, conversation history, retrieved documents, and the model's own output. Allocating more of the budget to context improves recall but increases latency and cost; allocating less keeps calls cheap and fast but risks discarding information the model still needs.
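The cost side of this trade-off is often handled with sliding-window memory (listed among the related concepts): drop the oldest conversation turns until what remains fits a token quota. A minimal sketch, assuming whitespace word counts as a crude stand-in for a real tokenizer; the function name and quota numbers are illustrative, not from any particular library.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude estimate: whitespace word count (stand-in for a real tokenizer)."""
    return len(text.split())


def sliding_window(turns: list[str], quota: int) -> list[str]:
    """Keep the most recent turns whose combined size fits the token quota.

    Dropping the oldest turns first is the 'performance efficiency' side of
    the trade-off: cheaper, faster calls, but earlier context is lost.
    """
    kept: deque[str] = deque()
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > quota:
            break  # everything older than this turn is discarded
        kept.appendleft(turn)
        used += cost
    return list(kept)


history = ["first turn " * 10, "second turn " * 10, "third turn " * 10]
print(len(sliding_window(history, quota=45)))  # → 2
```

With a quota of 45 "tokens" and three 20-token turns, only the two most recent turns survive; the first is silently lost, which is exactly the context-loss risk the overview describes.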
Disambiguation
In AI, this term refers to token quotas and context-window management, not just general-purpose system RAM.
Visual Analog
A Suitcase with Fixed Dividers: Deciding exactly how much space is reserved for 'essential clothes' (System Prompt) versus 'souvenirs' (Retrieved Documents) before the lid won't close.