Definition
An architectural framework that improves Large Language Model (LLM) outputs by retrieving relevant document snippets from an external knowledge base and supplying them to the model as context before the generation phase. It trades higher inference latency and added retrieval-logic complexity for fewer hallucinations and the ability to use real-time or private data without retraining.
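The retrieve-then-generate flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `DOCUMENTS` list, the bag-of-words `embed` function, and the `generate` stub are all hypothetical stand-ins for a real vector database, a learned embedding model, and an actual LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would typically
# store embeddings in a vector database instead.
DOCUMENTS = [
    "The refund window for online orders is 30 days from delivery.",
    "Support is available by email on weekdays from 9am to 5pm.",
    "Gift cards cannot be exchanged for cash.",
]


def embed(text):
    """Toy embedding: word counts (assumed stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Retrieval phase: return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, snippets):
    """Place the retrieved snippets in the prompt ahead of the question."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def generate(prompt):
    """Placeholder for the generation phase; a real system would call an LLM here."""
    return f"[LLM output for a {len(prompt)}-character prompt]"


def answer(query, documents=DOCUMENTS, k=2):
    """Full pipeline: retrieve first, then generate from the augmented prompt."""
    snippets = retrieve(query, documents, k)
    return generate(build_prompt(query, snippets))
```

Calling `answer("What is the refund window for orders?")` runs both phases in order. The extra retrieval step before generation is where the added latency and complexity come from, and updating `DOCUMENTS` changes what the model can draw on without any retraining.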
Disambiguation
RAG is an 'open-book' process: it supplies external knowledge at inference time and leaves the model's weights unchanged. Fine-tuning is 'studying': it updates the model's internal weights through additional training.
Visual Analog
An open-book exam where a student (LLM) researches specific facts in a textbook (Vector Database) to answer a question instead of relying on memory.