Definition
The architectural practice of distributing the workload of a RAG pipeline or AI agent system across multiple compute or storage nodes to handle growth in concurrency and data volume. In practice this means sharding the vector database so each node stores and searches only a portion of the index, and deploying multiple parallel instances of LLM orchestrators or agent workers to serve simultaneous user sessions.
Disambiguation
Adding more instances of workers or database nodes, rather than upgrading the CPU/GPU of a single server.
Visual Analog
A supermarket opening ten checkout lanes to handle a holiday rush, rather than training one cashier to work ten times faster.
Related Concepts
- Sharding (Component)
- Load Balancing (Prerequisite)
- Vertical Scaling (Contrast)
- Statelessness (Design Constraint)
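The sharding half of this definition can be sketched in a few lines. The sketch below is illustrative only: `VectorShard`, `ShardedVectorStore`, hash-based routing, and brute-force dot-product scoring are all assumptions standing in for a real vector database, which would handle routing and merging internally. Each document is hashed to exactly one shard on write; a query is fanned out to every shard on read, and the per-shard top-k results are merged into a global top-k:

```python
import hashlib
import heapq


class VectorShard:
    """One storage node: holds only a subset of the vectors."""

    def __init__(self):
        self.vectors = {}  # doc_id -> embedding

    def add(self, doc_id, embedding):
        self.vectors[doc_id] = embedding

    def search(self, query, k):
        # Brute-force dot-product similarity, scoped to this shard only.
        def score(emb):
            return sum(q * e for q, e in zip(query, emb))

        scored = ((score(emb), doc_id) for doc_id, emb in self.vectors.items())
        return heapq.nlargest(k, scored)


class ShardedVectorStore:
    """Routes writes by stable hash; fans reads out to all shards and merges."""

    def __init__(self, num_shards):
        self.shards = [VectorShard() for _ in range(num_shards)]

    def _route(self, doc_id):
        # Stable hash so the same document always lands on the same shard.
        digest = hashlib.md5(doc_id.encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def add(self, doc_id, embedding):
        self.shards[self._route(doc_id)].add(doc_id, embedding)

    def search(self, query, k):
        # Scatter: each shard computes its local top-k (in parallel in practice).
        partials = [shard.search(query, k) for shard in self.shards]
        # Gather: merge per-shard candidates into a single global top-k.
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
        return [doc_id for _, doc_id in merged]
```

Because each shard or worker in this pattern holds no session state (statelessness), any number of replicas can sit behind a load balancer, which is what makes adding instances, rather than upgrading one server, the scaling lever.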