SmartFAQs.ai
Back to Learn
Intermediate

Vertical Scaling

The process of increasing the computational resources—specifically GPU VRAM, RAM, and CPU cores—of a single node to accommodate high-dimensional vector embeddings, larger local LLMs, or memory-intensive indexing operations in a RAG pipeline.

Definition

The process of increasing the computational resources—specifically GPU VRAM, RAM, and CPU cores—of a single node to accommodate high-dimensional vector embeddings, larger local LLMs, or memory-intensive indexing operations in a RAG pipeline.

Disambiguation

Scaling 'Up' (beefier hardware) vs Scaling 'Out' (more machines).

Visual Metaphor

"Upgrading a single library bookshelf to be taller and wider to hold more massive, complex encyclopedias, rather than adding more shelves to the room."

Key Tools
NVIDIA CUDAvLLMPinecone (s1/p1 pod types)Docker (Resource Limits)Kubernetes (Vertical Pod Autoscaler)
Related Connections

Conceptual Overview

The process of increasing the computational resources—specifically GPU VRAM, RAM, and CPU cores—of a single node to accommodate high-dimensional vector embeddings, larger local LLMs, or memory-intensive indexing operations in a RAG pipeline.

Disambiguation

Scaling 'Up' (beefier hardware) vs Scaling 'Out' (more machines).

Visual Analog

Upgrading a single library bookshelf to be taller and wider to hold more massive, complex encyclopedias, rather than adding more shelves to the room.

Related Articles