Definition
The deployment of redundant copies of vector database indices or model inference nodes across a distributed cluster to increase read throughput and provide high availability (HA). Replicas improve performance for concurrent RAG queries, but they introduce trade-offs: every update must propagate to each copy, which adds synchronization latency, and each additional copy increases infrastructure costs.
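The read-throughput and availability benefits described above can be illustrated with a minimal sketch. It assumes each replica is a plain callable that returns document ids for a query; `ReplicaPool`, `ReplicaUnavailable`, and the round-robin policy are hypothetical names and choices made for illustration, not the API of any particular vector database.

```python
from typing import Callable, List, Sequence


class ReplicaUnavailable(Exception):
    """Hypothetical error raised by a replica that cannot serve a request."""


# Assumption: a replica is modeled as a callable from query text to document ids.
Replica = Callable[[str], List[str]]


class ReplicaPool:
    """Spreads reads over identical replicas and fails over when one is down.

    Not thread-safe; a real deployment would usually delegate this to a
    load balancer or to the database client.
    """

    def __init__(self, replicas: Sequence[Replica]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = list(replicas)
        self._next = 0  # index of the replica that gets the next query first

    def search(self, query: str) -> List[str]:
        count = len(self._replicas)
        start = self._next
        # Round-robin: advance the starting point so load is spread evenly.
        self._next = (self._next + 1) % count
        # Failover: try every replica once, beginning with the chosen one.
        for offset in range(count):
            replica = self._replicas[(start + offset) % count]
            try:
                return replica(query)
            except ReplicaUnavailable:
                continue
        raise ReplicaUnavailable("no replica could serve the query")


def healthy_replica(query: str) -> List[str]:
    # Stand-in for a real index lookup; every replica returns the same data.
    return ["doc-1", "doc-7"]


def offline_replica(query: str) -> List[str]:
    raise ReplicaUnavailable("node offline")


pool = ReplicaPool([offline_replica, healthy_replica, healthy_replica])
results = pool.search("what is replication?")
```

Because every replica holds the same data, any of them can answer any query, so a failed node costs only a retry rather than a failed request.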
Disambiguation
In RAG, this refers to scaling read capacity and fault tolerance by serving identical copies. It does not refer to data partitioning (sharding), which splits one index across nodes, or to duplicate documents within a corpus.
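The distinction from sharding can be made concrete with a small sketch. It assumes the index is a plain dictionary from document id to text; `build_replicas`, `build_shards`, and the hash-based placement are hypothetical choices made for illustration only.

```python
import hashlib
from typing import Dict, List


def build_replicas(index: Dict[str, str], copies: int) -> List[Dict[str, str]]:
    """Replication: every node holds a full, identical copy of the index."""
    return [dict(index) for _ in range(copies)]


def build_shards(index: Dict[str, str], shards: int) -> List[Dict[str, str]]:
    """Sharding: each node holds a disjoint slice of the index."""
    nodes: List[Dict[str, str]] = [{} for _ in range(shards)]
    for doc_id, text in index.items():
        # Assumption: a stable hash of the document id selects the shard.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        shard_number = int.from_bytes(digest[:8], "big") % shards
        nodes[shard_number][doc_id] = text
    return nodes
```

With replicas, one node is enough to answer a query over the whole index. With shards, a query over the whole index must consult every node and merge the results.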
Visual Analog
A popular library printing ten identical copies of the same reference book so ten researchers can look up facts simultaneously instead of waiting in line for one copy.