Conceptual Overview
RAG with Memory is an architectural pattern that integrates conversational history into the retrieval-augmented generation (RAG) loop, typically through a 'Condense Question' step in which an LLM rewrites each follow-up query into a standalone search query before retrieval. This enables coherent multi-turn reasoning, but it comes with trade-offs: the additional LLM call on each follow-up turn increases latency, and the growing conversation history raises token costs as it fills the context window.
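The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `MemoryRAG`, the prompt wording, and the `max_turns` cap are all hypothetical, and `llm` and `retrieve` are placeholder callables standing in for whatever model client and vector-store lookup a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical prompt templates; real systems tune these wordings.
CONDENSE_PROMPT = (
    "Given the conversation so far and a follow-up question, rewrite the "
    "follow-up as a standalone question.\n\n"
    "Conversation:\n{history}\n\nFollow-up: {question}\nStandalone question:"
)
ANSWER_PROMPT = (
    "Answer using only this context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


@dataclass
class MemoryRAG:
    # Assumed interfaces: llm maps a prompt to text, retrieve maps a query to passages.
    llm: Callable[[str], str]
    retrieve: Callable[[str], List[str]]
    max_turns: int = 5  # cap on remembered turns, to bound token growth
    history: List[Tuple[str, str]] = field(default_factory=list)

    def condense(self, question: str) -> str:
        """Rewrite a follow-up into a standalone query using recent history."""
        if not self.history:
            # First turn: nothing to resolve, so skip the extra LLM call.
            return question
        recent = self.history[-self.max_turns:]
        transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
        prompt = CONDENSE_PROMPT.format(history=transcript, question=question)
        return self.llm(prompt).strip()

    def ask(self, question: str) -> str:
        standalone = self.condense(question)   # extra LLM call on follow-ups
        passages = self.retrieve(standalone)   # search the static knowledge base
        prompt = ANSWER_PROMPT.format(
            context="\n".join(passages), question=standalone
        )
        answer = self.llm(prompt).strip()
        # Session memory stores the user's original wording, not the rewrite.
        self.history.append((question, answer))
        return answer
```

Both trade-offs are visible in the sketch: every follow-up turn makes two LLM calls instead of one, and the transcript passed to the condense step grows with the conversation until `max_turns` truncates it.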
Disambiguation
The "memory" in this pattern is session-based conversational state, as distinct from the static long-term knowledge stored in a vector database.
Visual Analog
A researcher's sticky-note log attached to a library search terminal, tracking previous searches to ensure the next book retrieved is relevant to the ongoing investigation.