Definition
In RAG and agentic workflows, bandwidth refers to a system's processing throughput capacity: the rate of token generation (tokens per second) and the number of vector search queries or embedding tasks it can serve concurrently. High bandwidth is critical for scaling multi-agent systems, where many sub-tasks and retrieval steps must run in parallel without bottlenecking the inference engine.
- Throughput (Performance Metric)
- Latency (Trade-off: increasing bandwidth often involves batching, which can increase individual request latency)
- Rate Limiting (Operational Constraint)
- Parallelism (Implementation Strategy)
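The latency trade-off in the list above can be illustrated with a toy cost model. This is a minimal sketch, not a measurement of any real inference engine: the function `batch_stats` and the values `fixed_overhead_s` and `per_item_s` are hypothetical, chosen only to show how larger batches can raise throughput while each request waits longer.

```python
def batch_stats(batch_size, fixed_overhead_s=0.05, per_item_s=0.01):
    """Toy model (assumed numbers): a batch costs a fixed overhead plus a
    per-item cost, and every request waits for the whole batch to finish."""
    batch_time_s = fixed_overhead_s + per_item_s * batch_size
    throughput_rps = batch_size / batch_time_s  # requests completed per second
    return throughput_rps, batch_time_s


if __name__ == "__main__":
    for size in (1, 8, 32):
        throughput, latency = batch_stats(size)
        print(f"batch={size:>2}  throughput={throughput:6.1f} req/s  "
              f"latency={latency * 1000:5.0f} ms")
```

Under these assumed costs, both columns grow with batch size: the fixed overhead is amortized across more requests (higher bandwidth), but each individual request takes longer to come back (higher latency).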
Disambiguation
In this context, bandwidth refers to LLM token throughput or vector DB query capacity, not to network or ISP connection speed.
Visual Analog
A multi-lane highway toll plaza where the number of open booths determines how many cars (data requests) can pass through at once, regardless of how fast each individual car is driving.
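The toll plaza analogy can be sketched in code with a bounded concurrency limit. In this minimal simulation, `asyncio.Semaphore` plays the role of the open booths and `asyncio.sleep` stands in for a retrieval or inference call. The names `run_plaza`, `handle_request`, and `compare`, along with the 50 ms service time, are illustrative assumptions and do not come from any particular framework.

```python
import asyncio
import time


async def handle_request(booths, service_time_s):
    # Each "car" must hold a booth for the full service time.
    async with booths:
        await asyncio.sleep(service_time_s)  # stand-in for a real request


async def run_plaza(num_requests, num_booths, service_time_s=0.05):
    # The semaphore caps how many requests are served at the same time.
    booths = asyncio.Semaphore(num_booths)
    start = time.perf_counter()
    await asyncio.gather(
        *(handle_request(booths, service_time_s) for _ in range(num_requests))
    )
    return time.perf_counter() - start


def compare(num_requests=8):
    # Same per-request speed in both runs; only the number of booths changes.
    one_booth_s = asyncio.run(run_plaza(num_requests, num_booths=1))
    four_booths_s = asyncio.run(run_plaza(num_requests, num_booths=4))
    return one_booth_s, four_booths_s


if __name__ == "__main__":
    one, four = compare()
    print(f"1 booth: {one:.2f} s   4 booths: {four:.2f} s")
```

Each request takes the same time in both runs, so the difference in total wall-clock time comes only from how many requests are allowed through at once, which is the sense of bandwidth used in this entry.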