Definition
RAG with Memory is an architectural pattern that integrates conversational history into the retrieval-augmented generation (RAG) loop, typically through a 'Condense Question' step that rewrites each follow-up question into a standalone search query before retrieval. This enables coherent multi-turn reasoning, but it adds latency, because every follow-up turn requires an extra LLM call before retrieval, and it raises token costs, because the prompt grows with the conversation history.
The memory in this pattern is session-based conversational state, which is distinct from the static long-term knowledge stored in a vector database.
Think of a researcher's sticky-note log attached to a library search terminal: it tracks previous searches so that the next book retrieved is relevant to the ongoing investigation.
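The condense-then-retrieve loop described in the definition can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `llm` and `retrieve` are hypothetical callables standing in for whatever model client and vector-store lookup a real system uses, and the prompt wording is an assumption.

```python
from typing import Callable, List, Tuple

# One conversation turn: (user_message, assistant_reply).
History = List[Tuple[str, str]]


def condense_question(
    llm: Callable[[str], str], history: History, question: str
) -> str:
    """Rewrite a follow-up into a standalone query.

    On the first turn there is no history to resolve, so the extra
    LLM call is skipped and the question is returned unchanged.
    """
    if not history:
        return question
    transcript = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in history)
    prompt = (
        "Rewrite the follow-up question as a standalone search query.\n"
        f"Conversation:\n{transcript}\n"
        f"Follow-up: {question}\n"
        "Standalone query:"
    )
    return llm(prompt).strip()


def answer_turn(
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    history: History,
    question: str,
) -> str:
    """Run one turn: condense, retrieve, generate, then update the history."""
    standalone = condense_question(llm, history, question)  # extra call on follow-ups
    documents = retrieve(standalone)  # hypothetical vector-store lookup
    context = "\n".join(documents)
    prompt = f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:"
    reply = llm(prompt).strip()
    history.append((question, reply))  # session state, separate from the store
    return reply
```

In this sketch a follow-up turn costs two LLM calls instead of one, which is the latency trade-off noted above, and the condense prompt grows with every turn because the full history is replayed into it.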
- Condense Question (Prerequisite)
- Conversation Buffer (Component)
- Sliding Window Memory (Component)
- Context Overflow (Risk Factor)
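To illustrate how the components listed above relate, here is a hedged sketch of a sliding window memory: a conversation buffer that evicts the oldest turns so the history stays within a budget and context overflow is avoided. The class name, the `max_turns` and `max_tokens` parameters, and the whitespace-based token estimate are all assumptions made for illustration; a real system would count tokens with the model's own tokenizer.

```python
from collections import deque
from typing import Deque, List, Tuple


class SlidingWindowMemory:
    """Keeps only the most recent turns that fit a turn and token budget."""

    def __init__(self, max_turns: int = 4, max_tokens: int = 200) -> None:
        self.max_tokens = max_tokens
        # deque(maxlen=...) silently drops the oldest turn when full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    @staticmethod
    def _estimate_tokens(text: str) -> int:
        # Assumption: whitespace word count as a crude proxy for tokens.
        return len(text.split())

    def total_tokens(self) -> int:
        return sum(
            self._estimate_tokens(user) + self._estimate_tokens(assistant)
            for user, assistant in self.turns
        )

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))
        # Evict oldest turns until under budget, but always keep the newest.
        while len(self.turns) > 1 and self.total_tokens() > self.max_tokens:
            self.turns.popleft()

    def as_history(self) -> List[Tuple[str, str]]:
        return list(self.turns)
```

An unbounded conversation buffer would be the same structure without the two limits. The window trades recall of early turns for a bounded prompt size, so a follow-up that refers back to an evicted turn can no longer be condensed correctly.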