SmartFAQs.ai
Intermediate

Horizontal Scaling

Definition

Horizontal scaling is the architectural practice of distributing the workload of a RAG pipeline or AI agent system across multiple compute or storage nodes so that it can handle increased concurrency and data volume. In this context, it typically involves sharding vector databases so that each node searches only a partition of the index, and deploying multiple parallel instances of LLM orchestrators or agent workers to manage simultaneous user sessions.
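To make the sharding idea concrete, here is a minimal sketch of scatter-gather retrieval: documents are assigned to shards by a stable hash, each shard is searched in parallel for its local top-k, and the partial results are merged into a global top-k. It is an illustration only, assuming in-memory lists stand in for database nodes and brute-force cosine similarity stands in for a real vector index; the function names (`shard_for`, `build_shards`, `search_shard`, `search`) are hypothetical and do not correspond to the API of any of the tools named on this page.

```python
import hashlib
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shard_for(doc_id, num_shards):
    """Pick a shard with a stable hash so placement is repeatable across runs."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(docs, num_shards):
    """Partition {doc_id: vector} into num_shards in-memory lists (stand-ins for nodes)."""
    shards = [[] for _ in range(num_shards)]
    for doc_id, vec in docs.items():
        shards[shard_for(doc_id, num_shards)].append((doc_id, vec))
    return shards


def search_shard(shard, query, k):
    """Local top-k on one shard: brute force here, an ANN index in a real system."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in shard)
    return heapq.nlargest(k, scored)


def search(shards, query, k):
    """Scatter the query to every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, query, k), shards)
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [(doc_id, score) for score, doc_id in merged]
```

Because every shard returns its own top-k, the merged result matches what a single unsharded search would return, while each node scans only its own partition. In a real deployment the per-shard search would be a network call to a separate database node rather than a thread.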

Disambiguation

Horizontal scaling ("scaling out") means adding more instances of workers or database nodes, rather than upgrading the CPU/GPU of a single server, which is vertical scaling ("scaling up").
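As a rough illustration of what "adding more instances" means in practice, the sketch below estimates how many identical worker instances are needed for a given peak load. The inputs (`peak_sessions`, `sessions_per_worker`, `headroom_pct`) and the function name are hypothetical; real capacity planning would rely on measured per-instance throughput, and an autoscaler such as the one in Kubernetes would adjust the count continuously rather than compute it once.

```python
def replicas_needed(peak_sessions, sessions_per_worker, headroom_pct=20):
    """Estimate the worker instance count for a peak load (illustrative only).

    peak_sessions:       expected simultaneous sessions at peak (assumed input)
    sessions_per_worker: sessions one instance can serve (assumed, measure it)
    headroom_pct:        extra capacity, in percent, kept in reserve
    """
    if sessions_per_worker < 1:
        raise ValueError("sessions_per_worker must be at least 1")
    if peak_sessions < 0 or headroom_pct < 0:
        raise ValueError("peak_sessions and headroom_pct must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = peak_sessions * (100 + headroom_pct)
    denominator = 100 * sessions_per_worker
    return max(1, -(-numerator // denominator))
```

For example, `replicas_needed(1000, 50)` gives 24 instances: 1,000 sessions plus 20% headroom is 1,200, divided by 50 sessions per worker. Scaling vertically would instead mean raising `sessions_per_worker` by moving to a larger machine.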

Visual Metaphor

"A supermarket opening ten checkout lanes to handle a holiday rush, rather than training one cashier to work ten times faster."

Key Tools
Kubernetes (K8s), Ray, Pinecone, Weaviate, Qdrant, Redis, Celery