Definition
In RAG and agentic workflows, bandwidth is the system's processing throughput: the rate of token generation (tokens per second) and the number of vector search queries or embedding tasks it can serve concurrently. High bandwidth is critical for scaling multi-agent systems, where many sub-tasks and retrieval steps must run in parallel without bottlenecking the inference engine.
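As a rough illustration of the definition, the sketch below computes aggregate token throughput from a set of finished requests. The `Completion` record and the `aggregate_tokens_per_second` helper are hypothetical names introduced here, not part of any particular serving stack, and the timings are made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    """One finished generation request (hypothetical record for illustration)."""
    start_s: float  # wall-clock time the request started, in seconds
    end_s: float    # wall-clock time the request finished, in seconds
    tokens: int     # number of tokens generated


def aggregate_tokens_per_second(completions):
    """Total tokens generated divided by the wall-clock window spanning all requests."""
    if not completions:
        return 0.0
    window = max(c.end_s for c in completions) - min(c.start_s for c in completions)
    if window <= 0:
        return 0.0
    return sum(c.tokens for c in completions) / window


# Four requests served in parallel, each producing 100 tokens in 2 seconds.
parallel = [Completion(0.0, 2.0, 100) for _ in range(4)]
# The same four requests served one after another.
serial = [Completion(2.0 * i, 2.0 * (i + 1), 100) for i in range(4)]

parallel_tps = aggregate_tokens_per_second(parallel)
serial_tps = aggregate_tokens_per_second(serial)
```

In both cases each individual request generates 50 tokens per second. Only the aggregate figure changes, and the aggregate is the quantity this entry calls bandwidth.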
Disambiguation
Here, bandwidth refers to LLM token throughput or vector database query capacity, not network or ISP connection speed.
Visual Analog
A multi-lane highway toll plaza where the number of open booths determines how many cars (data requests) can pass through at once, regardless of how fast each individual car is driving.
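The toll plaza analogy can be sketched as a small deterministic simulation. It is a minimal model that assumes every request takes the same service time and is sent to whichever booth frees up first. The `time_to_clear` function and its parameters are hypothetical names chosen for this example.

```python
import heapq


def time_to_clear(num_requests, service_time_s, open_booths):
    """Return the time needed for all requests to pass through the open booths.

    Each request occupies one booth for service_time_s seconds and is
    assigned to the booth that becomes free earliest.
    """
    if open_booths < 1:
        raise ValueError("at least one booth must be open")
    # Min-heap holding the time at which each booth next becomes free.
    booths = [0.0] * open_booths
    heapq.heapify(booths)
    finish = 0.0
    for _ in range(num_requests):
        free_at = heapq.heappop(booths)
        done = free_at + service_time_s
        finish = max(finish, done)
        heapq.heappush(booths, done)
    return finish


one_booth = time_to_clear(num_requests=8, service_time_s=1.0, open_booths=1)
four_booths = time_to_clear(num_requests=8, service_time_s=1.0, open_booths=4)
```

The per-request service time is identical in both runs. Opening more booths shortens the total time only because more requests are in flight at once.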