Concept

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation is an architectural framework that improves Large Language Model outputs by retrieving relevant document snippets from an external knowledge base and supplying them to the model before the generation phase. It trades added inference latency and retrieval-logic complexity for fewer hallucinations and the ability to use real-time or private data without retraining the model.
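The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the `DOCS` list, the bag-of-words `embed` function, and the `generate` callable are all hypothetical stand-ins for a real knowledge base, a learned embedding model, and an LLM client.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would typically
# store embeddings in a vector database.
DOCS = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available by email on weekdays from 9 to 17 CET.",
    "Fine-tuning updates model weights using additional training data.",
]

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (assumed stand-in
    for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, snippets):
    """Place the retrieved snippets in the prompt ahead of the question."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def rag_answer(query, docs, generate, k=2):
    """Retrieve first, then generate. `generate` is any callable that
    maps a prompt string to a completion string (hypothetical LLM client)."""
    snippets = retrieve(query, docs, k)
    return generate(build_prompt(query, snippets))
```

Calling `rag_answer("What is the refund window?", DOCS, generate=my_llm)` with any prompt-to-text callable `my_llm` runs both phases. The extra retrieval step on every request is where the added latency and complexity come from, and updating `DOCS` changes what the model can draw on without touching its weights.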

Disambiguation

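RAG is distinct from fine-tuning. RAG is an 'open-book' process: it supplies external text at inference time and leaves the model's weights unchanged. Fine-tuning is 'studying': it updates the model's internal weights through additional training.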

Visual Metaphor

An open-book exam in which a student (the LLM) looks up specific facts in a textbook (the vector database) to answer a question instead of relying on memory.
