Definition
The central reasoning engine of a RAG pipeline or AI Agent, responsible for synthesizing retrieved context into natural language or for orchestrating tool-calling logic. It functions as a non-deterministic processor, mapping input prompts and external data to coherent, task-oriented outputs.
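As a minimal sketch, the synthesis step can be pictured as prompt assembly followed by a model call: retrieved documents are concatenated into an evidence block that the LLM reasons over. The `build_prompt` function and the `[Doc N]` format here are illustrative assumptions, not a specific framework's API.

```python
# Sketch: the LLM as the synthesis step of a RAG pipeline.
# The retrieval step is assumed to have already returned `retrieved_docs`.

def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble retrieved context and the user question into one prompt."""
    context = "\n\n".join(
        f"[Doc {i + 1}] {doc}" for i, doc in enumerate(retrieved_docs)
    )
    return (
        "Answer the question using only the evidence below.\n\n"
        f"Evidence:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example: the prompt that would be sent to the LLM for synthesis.
docs = ["Paris is the capital of France.", "France is in Western Europe."]
prompt = build_prompt("What is the capital of France?", docs)
```

In a real pipeline, `prompt` would be passed to the model's completion endpoint; the LLM's role is precisely this mapping from evidence plus question to a grounded answer.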
Disambiguation
Distinguish between the LLM as the 'reasoning brain' and the Vector Database as the 'long-term memory'.
Visual Analog
A polyglot judge who reviews a stack of evidence (retrieved documents) to deliver a final verdict (the response).
- Context Window (Constraint: limits the volume of RAG-retrieved data the LLM can ingest per request.)
- Temperature (Parameter: controls the randomness of the LLM's synthesis of retrieved facts.)
- Inference Latency (Trade-off: larger models provide better reasoning but increase the wait time for the end-user.)
- Tool Calling (Component: the mechanism by which an LLM acts as an Agent to interact with external APIs.)
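The context-window constraint above can be sketched as a token budget applied to retrieved documents before prompt assembly. This uses whitespace word count as a crude token estimate, which is an assumption for illustration; real tokenizers count differently.

```python
# Sketch: enforcing the context-window constraint by keeping only as many
# retrieved documents as fit within a token budget.

def fit_to_budget(docs: list[str], max_tokens: int) -> list[str]:
    """Keep documents in retrieval order until the token budget is spent."""
    kept, used = [], 0
    for doc in docs:
        cost = len(doc.split())  # crude token estimate (assumption)
        if used + cost > max_tokens:
            break  # stop before overflowing the context window
        kept.append(doc)
        used += cost
    return kept

docs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
kept = fit_to_budget(docs, max_tokens=5)  # 3 + 2 tokens fit; the third doc does not
```

Because retrieval results are usually ranked by relevance, truncating from the tail like this discards the least relevant evidence first.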
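The Tool Calling mechanism listed above can be sketched as a dispatch loop: the LLM emits a structured tool request, and the runtime parses it and invokes the real function. The model's output is stubbed here as a JSON string, and `get_weather` is a hypothetical tool; neither reflects a specific vendor's API.

```python
# Sketch: the dispatch step that lets an LLM act as an Agent.
import json

def get_weather(city: str) -> str:
    """A toy external tool the agent can invoke (illustrative only)."""
    return f"Sunny in {city}"

# Registry mapping tool names the model may request to real functions.
TOOLS = {"get_weather": get_weather}

def run_tool_call(model_output: str) -> str:
    """Parse the model's tool request and dispatch it to the matching function."""
    request = json.loads(model_output)  # e.g. {"tool": ..., "args": {...}}
    fn = TOOLS[request["tool"]]
    return fn(**request["args"])

# A stubbed model output requesting a tool call:
result = run_tool_call('{"tool": "get_weather", "args": {"city": "Paris"}}')
```

In a full agent loop, `result` would be fed back into the LLM's context so it can reason over the tool's output before producing a final answer.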