SmartFAQs.ai
Intermediate

Caching

Definition

The storage and reuse of pre-computed LLM responses, vector embeddings, or prompt prefixes to reduce inference latency and API token costs. In RAG specifically, it involves Semantic Caching—matching new queries to cached results by vector similarity rather than exact string matching—and Prompt Caching—reusing an already-processed prompt prefix across long agentic turns.
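The semantic-caching idea above can be sketched in a few lines: embed each query, and on lookup return the stored response whose embedding is most similar, provided it clears a similarity threshold. This is a minimal illustration only — the toy bigram `embed` function stands in for a real sentence-embedding model, and the 0.8 threshold is an arbitrary choice:

```python
import hashlib
import math

def embed(text: str) -> list[float]:
    # Toy embedding for illustration: hashes character bigrams into a
    # 64-dim vector. A real system would use an embedding model instead.
    vec = [0.0] * 64
    lowered = text.lower()
    for a, b in zip(lowered, lowered[1:]):
        idx = int(hashlib.md5((a + b).encode()).hexdigest(), 16) % 64
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(u: list[float], v: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))

class SemanticCache:
    """Return a cached response when a new query is similar enough."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str):
        # Linear scan; a production cache would use a vector index.
        q = embed(query)
        best_score, best_resp = 0.0, None
        for vec, resp in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_resp = score, resp
        return best_resp if best_score >= self.threshold else None

cache = SemanticCache(threshold=0.8)
cache.put("How do I reset my password?", "Use the 'Forgot password' link.")
hit = cache.get("how do i reset my password")   # near-duplicate: cache hit
miss = cache.get("What is your refund policy?")  # unrelated: cache miss
```

The threshold is the key tuning knob: too low and unrelated queries get stale answers; too high and near-duplicates needlessly trigger a fresh LLM call.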

Disambiguation

Refers to semantic caching and prompt caching, not hardware L1/L2 caches or browser-side storage.

Visual Metaphor

"A prepared 'frequently asked questions' sheet at a help desk that prevents the clerk from calling the back office for every repeat visitor."

Key Tools
GPTCache, Redis, LangChain, Anthropic Prompt Caching, Momento, Milvus
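Prompt caching is typically requested declaratively in the API payload rather than implemented by the caller. As a sketch of the Anthropic-style approach, the long, stable prefix (here a FAQ document) is marked with a `cache_control` breakpoint so the provider can reuse its processed form across turns. This builds the request payload only — no API call is made, and the model name is a placeholder:

```python
# Stable content that repeats across many requests -- the part worth caching.
faq_document = (
    "Q: How do I reset my password? "
    "A: Use the 'Forgot password' link on the sign-in page."
)

request = {
    "model": "claude-model-name",  # placeholder model identifier
    "max_tokens": 512,
    "system": [
        {
            "type": "text",
            "text": "You answer questions using the FAQ below.",
        },
        {
            "type": "text",
            "text": faq_document,
            # Marks a cache breakpoint: everything up to and including this
            # block can be cached by the provider and reused on later calls,
            # so only the varying user turn is processed from scratch.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "How do I reset my password?"}
    ],
}
```

The per-turn user message stays outside the cached prefix, so each request pays full price only for the part that actually changes.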