Definition
Vertical scaling is the process of increasing the computational resources of a single node (specifically GPU VRAM, system RAM, and CPU cores) so that it can accommodate high-dimensional vector embeddings, larger local LLMs, or memory-intensive indexing operations in a RAG pipeline.
- Horizontal Scaling (alternative architectural strategy)
- Vector Indexing (component heavily reliant on memory capacity)
- VRAM Bottleneck (primary constraint addressed by vertical scaling)
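To make the memory pressure behind the VRAM bottleneck concrete, here is a rough sizing sketch. It assumes a flat, uncompressed store of float32 vectors and model weights held at a fixed number of bytes per parameter. The function names and the example workload (10 million vectors of 768 dimensions, a 7-billion-parameter model in 16-bit precision) are illustrative assumptions, not figures from any particular system.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def embedding_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a flat vector store (float32 by default), excluding index overhead."""
    return num_vectors * dim * bytes_per_value


def model_weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Raw size of model weights (2 bytes per parameter assumes fp16/bf16)."""
    return num_params * bytes_per_param


# Hypothetical workload, chosen only for illustration.
index_bytes = embedding_bytes(10_000_000, 768)
weights_bytes = model_weight_bytes(7_000_000_000)

print(f"embeddings: {index_bytes / GIB:.1f} GiB")
print(f"LLM weights: {weights_bytes / GIB:.1f} GiB")
```

These are lower bounds: a real index typically adds overhead for its own structures and metadata, and inference needs additional memory beyond the weights, so the node has to be sized with headroom.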
Disambiguation
Scaling up (vertical) means giving one machine more powerful hardware; scaling out (horizontal) means adding more machines.
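The distinction can be sketched as a simple capacity check: if the required memory fits on the largest single node available, scaling up is an option; otherwise the workload has to be spread across several nodes. This is a minimal sketch that looks at memory capacity only. The function name, the headroom factor, and the node sizes are hypothetical.

```python
import math


def scaling_plan(required_gib: float, max_node_gib: float,
                 headroom: float = 0.8) -> tuple[str, int]:
    """Return ("up", 1) if one node suffices, else ("out", n) for n nodes.

    headroom is the assumed fraction of a node's memory the workload may use.
    """
    usable_gib = max_node_gib * headroom
    if required_gib <= usable_gib:
        return ("up", 1)
    # Too large for any single node: count how many nodes are needed.
    return ("out", math.ceil(required_gib / usable_gib))


print(scaling_plan(40, 80))   # workload fits on one 80 GiB node
print(scaling_plan(200, 80))  # workload exceeds a single node
```

In practice the choice also depends on cost, hardware availability, and fault tolerance, none of which this check models.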
Visual Analog
Replacing a single library bookshelf with a taller, wider one so that it can hold larger, more complex encyclopedias, rather than adding more bookshelves to the room.