Definition
Retrieval-Augmented Generation (RAG) is an architectural pattern that grounds Large Language Model (LLM) output in external data: relevant passages are retrieved from an authoritative knowledge base and added to the prompt before the model generates a response. It sits between two alternatives, fine-tuning (which is costly) and ungrounded zero-shot inference (which carries a higher risk of hallucination), at the price of additional system latency and orchestration complexity.
- Vector Database (Component)
- Embeddings (Prerequisite)
- Semantic Search (Component)
- Context Window (Constraint)
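The retrieve-then-generate flow in the definition, together with the concepts listed above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory `INDEX` list stands in for a vector database, the character budget in `build_prompt` stands in for a token-based context window, and `generate` is a hypothetical placeholder for a real LLM call. The sample documents are invented for the example.

```python
import math
from collections import Counter

# Toy knowledge base (invented sample data). A real system would store
# document chunks in a vector database.
DOCUMENTS = [
    "The warranty period for the X100 router is 24 months.",
    "Firmware updates for the X100 are released every quarter.",
    "The office cafeteria opens at 8 am on weekdays.",
]


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stands in for a learned model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing step: embed every document once, ahead of query time.
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]


def retrieve(query, k=2):
    """Semantic search step: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, passages, max_context_chars=200):
    """Add passages to the prompt in rank order until the context budget is used up."""
    context, used = [], 0
    for passage in passages:
        if used + len(passage) > max_context_chars:
            break
        context.append(passage)
        used += len(passage)
    joined = "\n".join(f"- {p}" for p in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


def generate(prompt):
    """Hypothetical placeholder for an LLM call; it simply returns the prompt."""
    return prompt


query = "How long is the warranty on the X100 router?"
passages = retrieve(query)
answer = generate(build_prompt(query, passages))
```

Note that the model weights never change in this flow; only the prompt does. Swapping the toy pieces for a real embedding model, vector store, and LLM client keeps the same three steps: index, retrieve, generate.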
Disambiguation
RAG retrieves context dynamically at query time and leaves the model weights unchanged; fine-tuning statically modifies the model weights ahead of time.
Visual Analog
An open-book exam where the student (LLM) uses a searchable library (Vector Database) to find specific facts before writing an answer.