Prompt Caching

Prompt caching is a technique that persists and reuses the computed state (typically the attention key-value cache) of frequently used prompt prefixes, such as system instructions, few-shot examples, or static RAG context, to significantly reduce latency and token-processing costs during inference.
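
The sketch below is a minimal, provider-agnostic illustration of the idea, assuming a hypothetical encode_prefix function that stands in for the expensive prefill pass; real serving stacks cache the transformer's attention key/value tensors rather than a string.

    import hashlib

    # Toy prompt cache: the work done on a shared prefix (system prompt,
    # few-shot examples, static RAG context) is computed once, stored, and
    # reused by every request that starts with the same prefix.
    _prefix_cache: dict[str, str] = {}

    def encode_prefix(prefix: str) -> str:
        """Stand-in for the expensive prefill pass over the prefix tokens."""
        return f"<state for {len(prefix)}-char prefix>"

    def run_request(shared_prefix: str, user_turn: str) -> str:
        key = hashlib.sha256(shared_prefix.encode()).hexdigest()
        if key not in _prefix_cache:          # cache miss: pay full prefill cost
            _prefix_cache[key] = encode_prefix(shared_prefix)
        prefix_state = _prefix_cache[key]     # cache hit: prefix work is skipped
        # Only the new user turn still has to be processed from scratch.
        return f"decode({prefix_state} + {user_turn!r})"

    SYSTEM = "You are a support assistant. Policy: ..."        # long, static prefix
    print(run_request(SYSTEM, "How do I reset my password?"))  # miss, then cached
    print(run_request(SYSTEM, "What is your refund policy?"))  # hit: prefix reused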

Disambiguation

Prompt caching is distinct from semantic caching, which stores and reuses model outputs for semantically similar queries; prompt caching instead stores the pre-processed state of the prompt inputs themselves, and the model still generates a fresh response on every call.
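
A rough way to see the difference in code (the names here are illustrative, not any library's API): a semantic cache maps a query to a finished answer, while a prompt cache maps an exact prefix to its precomputed state, and generation still runs for whatever follows that prefix.

    # Illustrative only: what each cache keys on and what it stores.
    semantic_cache: dict[str, str] = {}   # query (or its embedding) -> final model output
    prompt_cache: dict[str, object] = {}  # exact prompt prefix -> precomputed prefix state

    def semantic_lookup(query: str):
        # A hit returns a previously generated answer; no model call happens.
        return semantic_cache.get(query)

    def prompt_lookup(prefix: str):
        # A hit only skips re-processing the prefix; the model still generates
        # a new answer for the suffix that follows it.
        return prompt_cache.get(prefix)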

Visual Metaphor

"A reusable rubber stamp for a long legal header, allowing you to bypass handwriting the same intro on every page."

Key Tools

Anthropic API (Prompt Caching)
OpenAI Prompt Caching
vLLM
SGLang
DeepSpeed-MII
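
As one concrete example of how the managed APIs expose this, the sketch below marks a long system prompt as cacheable using the cache_control field in the Anthropic Python SDK. Treat it as a hedged sketch: the model id is a placeholder, and the exact parameters should be checked against Anthropic's current documentation. OpenAI's prompt caching is applied automatically to long, repeated prefixes rather than via an explicit flag, and vLLM offers a similar self-hosted mechanism through its enable_prefix_caching engine option.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    LONG_SYSTEM_PROMPT = "You are a contract-review assistant. Policy: ..."  # static prefix

    # Marking the system block with cache_control asks the API to cache the
    # processed prefix, so later calls that reuse the same block skip most of
    # the prefill work for those tokens.
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder; substitute a current model id
        max_tokens=512,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Summarize clause 4.2 in plain English."}],
    )
    print(response.content[0].text)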