TLDR
Agent Design Patterns represent the architectural shift from Large Language Models (LLMs) as static text generators to autonomous agents capable of reasoning, tool use, and self-correction. At the core of this evolution is the ReAct (Reason + Act) framework, which empowers a "Brain" (LLM) to interact with external environments via APIs, databases, and specialized retrieval pipelines. Whether through Single-Agent simplicity or Multi-Agent coordination, these patterns overcome the limitations of "Naive RAG" by introducing iterative loops, expert routing, and self-reflective evaluation.
Conceptual Overview
The transition from a "Chatbot" to an "Agent" is defined by the introduction of an orchestration layer that manages the model's interaction with the world. In this systems view, an agent is not just a model; it is a composite entity consisting of four pillars: the Reasoning Engine, Planning Modules, Memory Systems, and Tool Interfaces.
The Atomic Unit: Single-Agent Patterns
The foundational pattern is the Single-Agent, often implemented as a Plain Tool Agent. It follows a Single-Router Architecture in which the LLM acts as a central dispatcher. When a query is received, the agent does not immediately answer; it enters a reasoning loop: it analyzes the "Toolbox" (defined via JSON schemas), selects the appropriate tool (e.g., a web search or a SQL executor), and observes the output before synthesizing a final response.
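A minimal sketch of this loop, assuming hypothetical `call_llm` and `run_tool` helpers that you would wire to a real model provider and real tool implementations:

```python
import json

# Illustrative toolbox: the JSON-schema descriptions the LLM sees on every step.
TOOLS = {
    "web_search": {"description": "Search the web", "params": {"query": "string"}},
    "sql_execute": {"description": "Run a read-only SQL query", "params": {"sql": "string"}},
}

def call_llm(prompt: str) -> dict:
    """Stand-in for a real model call; assumed to return either
    {"tool": ..., "args": {...}} or {"final_answer": ...}."""
    raise NotImplementedError("wire this to your model provider")

def run_tool(name: str, args: dict) -> str:
    """Stand-in dispatcher to real tool implementations."""
    raise NotImplementedError

def agent_loop(user_query: str, max_steps: int = 5) -> str:
    history = [f"Tools: {json.dumps(TOOLS)}", f"User: {user_query}"]
    for _ in range(max_steps):
        action = call_llm("\n".join(history))                   # Reason: pick a tool or answer
        if "final_answer" in action:
            return action["final_answer"]                       # Synthesize once evidence suffices
        observation = run_tool(action["tool"], action["args"])  # Act
        history.append(f"Observation: {observation}")           # Observe, then loop
    return "Step budget exhausted without a final answer."
```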
The Evolution of Retrieval: From Naive to Agentic RAG
Traditional Retrieval-Augmented Generation (RAG) is a linear, "one-shot" process: Query → Retrieve → Generate. Agent Design Patterns transform this into Agentic RAG, where retrieval is treated as a tool-use problem.
- Naive RAG: A single retrieve-then-generate pass; suffers from the "lost in the middle" effect and low retrieval precision.
- Advanced RAG: Introduces pre-retrieval (query expansion) and post-retrieval (re-ranking) to improve context quality.
- Agentic RAG: Employs a reasoning loop to decide whether retrieval is necessary, which source to use, and how to validate the results (sketched below).
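A compressed sketch of that agentic loop; every helper here (`needs_retrieval`, `pick_source`, `retrieve`, `validate`, `rewrite_query`, `generate`) is a hypothetical LLM-backed component, stubbed for brevity:

```python
def needs_retrieval(query: str) -> bool: ...
def pick_source(query: str) -> str: ...                      # "vector" | "sql" | "graph"
def retrieve(source: str, query: str) -> list[str]: ...
def validate(docs: list[str], query: str) -> bool: ...
def rewrite_query(query: str, docs: list[str]) -> str: ...
def generate(query: str, context: list[str] | None) -> str: ...

def agentic_rag(query: str, max_attempts: int = 3) -> str:
    if not needs_retrieval(query):             # Decide IF retrieval is needed at all
        return generate(query, context=None)
    source = pick_source(query)                # Decide WHICH source to consult
    docs: list[str] = []
    for _ in range(max_attempts):              # Iterate instead of one-shot retrieval
        docs = retrieve(source, query)
        if validate(docs, query):              # Validate results before generating
            break
        query = rewrite_query(query, docs)     # Self-correct the query and retry
    return generate(query, context=docs)
```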
The Systems View: Coordination and Specialization
As tasks grow in complexity, the monolithic agent becomes a bottleneck. Multi-Agent RAG and Expert-Routed RAG introduce a "Divide and Conquer" strategy. In these patterns, a Master Orchestrator (or Router) decomposes a complex request into sub-tasks, delegating them to specialized agents (e.g., a "Legal Expert" vs. a "Financial Analyst"). This reduces the cognitive load on any single model and allows for parallel execution.
The diagram illustrates the flow from User Input to a Gating Router, which branches into specialized Retrieval (Graph, Vector, SQL), passes through a Reasoning Loop (Self-RAG), and is finally validated by an LLM-as-a-Judge before reaching the user.
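A sketch of the orchestrator half of that flow, under stated assumptions: the specialist agents, planner, and router below are all hypothetical stubs.

```python
from concurrent.futures import ThreadPoolExecutor

def legal_agent(task: str) -> str: ...                 # Specialized worker agents
def finance_agent(task: str) -> str: ...
def decompose(request: str) -> list[str]: ...          # Planner: request -> sub-tasks
def route(task: str) -> str: ...                       # Router: sub-task -> expert name
def synthesize(request: str, results: list[str]) -> str: ...

WORKERS = {"legal": legal_agent, "finance": finance_agent}

def orchestrate(request: str) -> str:
    subtasks = decompose(request)                      # Divide...
    with ThreadPoolExecutor() as pool:                 # ...and conquer in parallel
        results = list(pool.map(lambda t: WORKERS[route(t)](t), subtasks))
    return synthesize(request, results)                # Merge specialist outputs
```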
Practical Implementations
1. Multi-Stage Retrieval Pipelines
To solve the "needle in a haystack" problem, developers implement Multi-Stage Retrieval. This pattern balances speed and precision:
- Stage 1 (Coarse): Uses a Bi-encoder for high-recall, low-latency retrieval across millions of documents.
- Stage 2 (Fine): Uses a Cross-encoder (Reranker) to perform deep semantic analysis on the top-k results, ensuring only the most relevant context enters the LLM's window (see the sketch after this list).
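Using the sentence-transformers library, the two stages can be sketched as follows; the model names are illustrative defaults, not a recommendation:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")             # Stage 1: fast, high recall
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # Stage 2: slow, precise

def retrieve(query: str, corpus: list[str], coarse_k: int = 50, final_k: int = 5) -> list[str]:
    # Stage 1 (coarse): embed query and documents independently, score by similarity.
    doc_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
    q_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_emb, top_k=coarse_k)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]
    # Stage 2 (fine): cross-encode each (query, doc) pair and rerank.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]
```

In production, Stage 1 would hit a prebuilt vector index rather than encoding the corpus per query; the in-memory version keeps the sketch self-contained.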
2. Structured Data Interfaces (NL2SQL)
For enterprise data, agents use NL2SQL (Natural Language to SQL) patterns. Unlike simple scripts, an NL2SQL agent performs Schema Linking to map ambiguous human terms to specific database columns. It operates in an iterative loop: if a generated SQL query fails, the agent captures the error, reasons about the fix, and retries—a process known as self-correction.
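A minimal sketch of that loop against SQLite, assuming a hypothetical `generate_sql` function that wraps the LLM and receives the previous error on each retry:

```python
import sqlite3

def get_schema(conn: sqlite3.Connection) -> str:
    """Schema linking starts from the real DDL, not from guesses."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type='table'").fetchall()
    return "\n".join(r[0] for r in rows)

def generate_sql(question: str, schema: str, error: str | None = None) -> str:
    """Stand-in for the LLM call; on retry it sees the captured error."""
    raise NotImplementedError

def nl2sql(conn: sqlite3.Connection, question: str, max_retries: int = 3):
    schema, error = get_schema(conn), None
    for _ in range(max_retries):
        sql = generate_sql(question, schema, error)  # Reason about the fix on retry
        try:
            return conn.execute(sql).fetchall()      # Success: return the rows
        except sqlite3.Error as exc:
            error = str(exc)                         # Capture the error, loop again
    raise RuntimeError(f"No valid SQL after {max_retries} attempts: {error}")
```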
3. Graph-RAG and Traversal
When relationships between entities are as important as the entities themselves, Graph-RAG is the preferred pattern. By combining Symbolic AI (Knowledge Graphs) with Neural AI (LLMs), agents can perform Graph Traversal (DFS/BFS) to answer multi-hop questions that flat vector databases cannot resolve (e.g., "How does the CEO of the company that acquired X feel about Y?").
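A toy traversal over an adjacency-list knowledge graph shows the mechanics; the graph data is invented for illustration, and a real system would feed the collected relation paths to the LLM as context:

```python
from collections import deque

# Toy knowledge graph: entity -> list of (relation, entity) edges.
GRAPH = {
    "CompanyA": [("acquired", "CompanyX"), ("has_ceo", "Alice")],
    "Alice":    [("stated_opinion_on", "TopicY")],
}

def multi_hop_paths(start: str, max_hops: int = 3) -> list[list[str]]:
    """BFS traversal collecting relation paths up to max_hops deep."""
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_hops:
            continue
        for relation, neighbor in GRAPH.get(node, []):
            new_path = path + [f"{node} -{relation}-> {neighbor}"]
            paths.append(new_path)                 # Each path is evidence for the LLM
            queue.append((neighbor, new_path))
    return paths
```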
4. Physical-World Integration: Smart Rehydration
Agent patterns extend beyond digital text. In Smart Rehydration systems, the "agent" is a closed-loop feedback mechanism. It ingests biometric telemetry (sweat microfluidics, bioelectrical impedance analysis), reasons about the user's physiological state, and triggers "actions" (IoT alerts or fluid intake recommendations). This represents the frontier of Agentic IoT, where the reasoning engine manages physical health outcomes.
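A schematic of the sense-reason-act loop, with invented thresholds (real cutoffs would come from clinical guidance) and injected stubs for the sensor and alert sides:

```python
from dataclasses import dataclass

@dataclass
class Telemetry:
    sweat_sodium_mmol: float   # from sweat microfluidics
    hydration_index: float     # from bioimpedance, normalized 0..1

def assess(t: Telemetry) -> str:
    """Toy reasoning step; the thresholds here are purely illustrative."""
    if t.hydration_index < 0.4 or t.sweat_sodium_mmol > 60:
        return "drink_now"
    if t.hydration_index < 0.6:
        return "drink_soon"
    return "ok"

def control_loop(read_sensors, send_alert) -> None:
    """Closed loop: sense -> reason -> act. Callables are injected stubs."""
    t = read_sensors()
    action = assess(t)
    if action != "ok":
        send_alert(action)     # e.g., an IoT push notification
```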
Advanced Techniques
Self-Reflection and Adaptive Retrieval
Self-RAG introduces specialized reflection tokens (e.g., [Retrieve], [IsREL]) that allow the model to critique its own performance in real-time. Similarly, Adaptive Retrieval analyzes query complexity to bypass retrieval for simple facts (saving tokens) while deploying multi-hop decomposition for complex analytical queries.
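The adaptive half of this idea reduces to a complexity router; the sketch below uses hypothetical helpers throughout, while the reflection tokens themselves would be emitted by a specially trained model:

```python
def classify_complexity(query: str) -> str: ...   # "simple" | "single_hop" | "multi_hop"
def decompose(query: str) -> list[str]: ...
def retrieve(query: str) -> list[str]: ...
def generate(query: str, context: list[str] | None = None) -> str: ...
def synthesize(query: str, partials: list[str]) -> str: ...

def adaptive_answer(query: str) -> str:
    level = classify_complexity(query)
    if level == "simple":
        return generate(query)                    # Bypass retrieval: saves tokens
    if level == "single_hop":
        return generate(query, retrieve(query))   # One retrieval pass suffices
    sub_questions = decompose(query)              # Multi-hop: decompose first
    partials = [generate(q, retrieve(q)) for q in sub_questions]
    return synthesize(query, partials)
```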
Context Construction and Semantic Density
Effective agents do not just dump retrieved data into a prompt. Context Construction is the strategic engineering of the input to maximize Semantic Density. This involves filtering boilerplate, ranking information by salience, and ensuring critical facts are not "lost in the middle" of the context window.
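A self-contained sketch of such a construction step; the salience and boilerplate heuristics are deliberately naive stand-ins for a reranker and a real filter:

```python
def salience(query: str, chunk: str) -> float:
    """Toy salience via word overlap; a reranker score would go here."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def is_boilerplate(chunk: str) -> bool:
    """Toy filter; real systems match nav bars, legal footers, etc."""
    return len(chunk.split()) < 8

def build_context(query: str, chunks: list[str], token_budget: int = 3000) -> str:
    seen, kept = set(), []
    for chunk in sorted(chunks, key=lambda c: salience(query, c), reverse=True):
        key = chunk.strip().lower()
        if key in seen or is_boilerplate(chunk):   # Deduplicate, drop filler
            continue
        seen.add(key)
        kept.append(chunk)
    # Interleave so the highest-salience chunks sit at the edges of the window,
    # countering the "lost in the middle" effect.
    ordered = kept[0::2] + kept[1::2][::-1]
    out, used = [], 0
    for chunk in ordered:
        cost = len(chunk) // 4                     # Crude ~4-chars-per-token estimate
        if used + cost > token_budget:             # Respect the token budget
            break
        out.append(chunk)
        used += cost
    return "\n\n".join(out)
```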
Expert Routing and Gating
In Expert-Routed RAG, a Gating Network performs a high-speed triage of the incoming query. It determines whether the answer lies in the model's parametric knowledge (internal weights) or requires non-parametric retrieval (external data). This prevents "knowledge contamination" where irrelevant retrieved documents might distract the model from a simple reasoning task.
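One lightweight way to implement the gate is a single-token classification prompt; everything below (`call_llm`, `generate`, `retrieve`) is a hypothetical stub:

```python
GATE_PROMPT = """Answer with exactly one word.
PARAMETRIC - the question is general knowledge or pure reasoning
RETRIEVE   - the question needs fresh or domain-specific documents

Question: {question}"""

def call_llm(prompt: str) -> str: ...              # A small, fast gating model
def generate(q: str, context: list[str] | None = None) -> str: ...
def retrieve(q: str) -> list[str]: ...

def gated_answer(question: str) -> str:
    decision = call_llm(GATE_PROMPT.format(question=question)).strip().upper()
    if decision == "PARAMETRIC":
        return generate(question)                  # Internal weights only: no retrieved noise
    return generate(question, retrieve(question))  # External evidence required
```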
Research and Future Directions
Telemetry-driven Improvement (TDI)
The future of agent design lies in Observability. Telemetry-driven Improvement (TDI) applies the scientific method to agent performance. By collecting MELT (Metrics, Events, Logs, Traces), developers can identify where an agent's reasoning loop breaks. A critical part of this optimization is A/B testing of prompt variants, which allows architects to empirically determine which instructions yield the highest accuracy across thousands of runs.
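A skeleton of such an experiment loop, assuming a hypothetical `run_agent` that executes one traced run and returns a pass/fail verdict from an automated eval:

```python
import random
from collections import defaultdict

# Two illustrative system-prompt variants under test.
VARIANTS = {
    "A": "You are a careful analyst. Verify tool outputs before answering.",
    "B": "Think step by step and cite a tool observation for every claim.",
}

def run_agent(system_prompt: str, task: str) -> bool:
    """Stand-in for one traced agent run scored by an automated eval."""
    raise NotImplementedError

def ab_test(tasks: list[str], runs_per_task: int = 10) -> dict[str, float]:
    wins, totals = defaultdict(int), defaultdict(int)
    for task in tasks:
        for _ in range(runs_per_task):             # Repeat: agents are non-deterministic
            name = random.choice(list(VARIANTS))   # Random assignment per run
            totals[name] += 1
            wins[name] += run_agent(VARIANTS[name], task)
    return {n: wins[n] / (totals[n] or 1) for n in VARIANTS}  # Accuracy per variant
```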
Human-in-the-Loop: Grader-based Systems
As agents take on high-stakes roles (e.g., legal grading or medical triage), Grader-in-the-loop (GITL) frameworks become essential. These systems use an LLM-as-a-judge to provide initial assessments, which are then calibrated by human experts. This hybrid approach ensures transparency and pedagogical rigor while maintaining the scalability of AI.
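One way to structure the hand-off is a confidence-gated escalation, sketched here with a hypothetical `llm_judge` and `human_review`:

```python
from dataclasses import dataclass

@dataclass
class Grade:
    score: float        # rubric score, 0..1
    confidence: float   # judge's self-reported confidence, 0..1
    rationale: str

def llm_judge(submission: str, rubric: str) -> Grade:
    """Stand-in for an LLM-as-a-judge call returning a structured grade."""
    raise NotImplementedError

def human_review(submission: str, draft: Grade) -> Grade:
    """Stand-in for the expert calibration queue."""
    raise NotImplementedError

def grade(submission: str, rubric: str, confidence_floor: float = 0.8) -> Grade:
    draft = llm_judge(submission, rubric)        # Initial AI assessment
    if draft.confidence < confidence_floor:      # Low confidence -> human calibration
        return human_review(submission, draft)
    return draft                                 # Auto-accept only confident grades
```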
The Convergence of Symbolic and Neural Reasoning
We are moving toward a "System 2" thinking for AI, where Decomposition RAG and Reasoning Graphs allow agents to "think before they speak." By breaking complex goals into atomic sub-questions and traversing decision trees, agents are evolving from simple responders into sophisticated problem-solvers.
Frequently Asked Questions
Q: How does Multi-Agent RAG differ from standard Agentic RAG?
Standard Agentic RAG typically involves a single LLM managing its own retrieval tools. Multi-Agent RAG introduces a hierarchy where a "Planner" agent delegates sub-tasks to specialized "Worker" agents (e.g., a SQL agent and a Vector Search agent). This distribution of cognitive load is superior for multi-faceted queries that require synthesizing data from heterogeneous sources.
Q: When should I use Graph-RAG instead of Vector-based RAG?
Use Graph-RAG when your data is highly interconnected and your queries require multi-hop reasoning (e.g., "Find all vendors associated with projects managed by employees in the London office"). Vector-based RAG is better for "vibe-based" semantic similarity or finding specific text chunks in unstructured documents.
Q: What is the "Semantic Gap" in NL2SQL agents?
The Semantic Gap is the distance between ambiguous human language ("top customers") and rigid SQL syntax (SELECT...ORDER BY revenue DESC). Agents bridge this gap through Schema Linking and iterative error correction, where the agent "explores" the database schema to find the correct mappings.
Q: Why is "Context Construction" considered a separate step from "Retrieval"?
Retrieval finds the raw data, but Context Construction curates it. Raw chunks are often redundant or noisy. Construction involves re-ranking, deduplication, and formatting (e.g., into Markdown tables) to ensure the LLM can reason effectively within its token budget and avoid the "lost in the middle" effect.
Q: How does "A" (Comparing prompt variants) impact agent production cycles?
A/B testing is the core of the TDI (Telemetry-driven Improvement) loop. Because agents are non-deterministic, small changes in the "System Prompt" can lead to vastly different tool-use behaviors. By systematically comparing prompt variants, developers can move from "vibe-based" engineering to data-driven optimization, ensuring the agent remains robust across edge cases.