Definition
Prompt caching persists and reuses the computed state of frequently used prompt prefixes (such as system instructions, few-shot examples, or static RAG context) to reduce latency and token-processing costs during inference.
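In practice, some providers expose this through explicit cache markers on the request, while others apply prefix caching automatically. The sketch below uses the cache_control field from Anthropic's Messages API as one concrete example; the model name and placeholder context are illustrative, and exact fields and minimum cacheable prefix lengths vary by provider and version, so check current documentation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A long, stable prefix worth caching (illustrative placeholder text).
# Providers typically enforce a minimum cacheable length (on the order of
# ~1024 tokens), so short prefixes may not be cached at all.
STATIC_CONTEXT = "You are a contracts assistant.\n" + ("<terms of service text> " * 2000)

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": STATIC_CONTEXT,
            # Marks everything up to and including this block as a cacheable
            # prefix; later requests with an identical prefix reuse its
            # precomputed state instead of re-running prefill over it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the termination clause."}],
)

# Usage metadata reports how much of the prompt was written to or read from
# the cache, which is how you verify that caching actually took effect.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```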
Related Concepts
- KV Cache (Underlying Mechanism)
- System Prompt (Prerequisite)
- Context Window (Constraint)
Conceptual Overview
Transformers process a prompt in a prefill pass that computes per-layer key/value (KV) attention states for every token, and because attention is causal, the state for a prefix depends only on that prefix. Prompt caching exploits this: when many requests begin with the same system instructions, few-shot examples, or reference documents, the serving stack computes the prefix's KV state once, stores it, and reattaches it to each later request, so only the new suffix tokens pay the prefill cost. Hits generally require an exact prefix match, and providers typically evict cached entries after a short time-to-live.
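A minimal sketch of the idea, with all names hypothetical: the cache keys on a hash of the exact prefix tokens, so identical prefixes reuse the stored prefill result while any changed token forces a recompute.

```python
import hashlib
from typing import Callable

class PrefixKVCache:
    """Toy prefix cache: hash of exact prefix token IDs -> precomputed KV state.

    In a real serving stack the stored value would be per-layer key/value
    tensors living on the accelerator; here it is an opaque Python object.
    """

    def __init__(self) -> None:
        self._store: dict[str, object] = {}

    @staticmethod
    def _key(prefix_tokens: list[int]) -> str:
        # Exact-match key: changing any prefix token changes the hash,
        # which is why cache hits require a byte-identical prefix.
        raw = ",".join(map(str, prefix_tokens)).encode()
        return hashlib.sha256(raw).hexdigest()

    def get_or_compute(
        self,
        prefix_tokens: list[int],
        prefill: Callable[[list[int]], object],
    ) -> object:
        key = self._key(prefix_tokens)
        if key not in self._store:
            # Expensive path: full prefill over the prefix, paid once.
            self._store[key] = prefill(prefix_tokens)
        return self._store[key]  # cheap on every later identical request

# Usage: the first call pays for prefill, the second is a cache hit.
cache = PrefixKVCache()
system_prompt_tokens = [101, 2023, 2003, 1037, 2146, 6123, 102]  # illustrative IDs
kv = cache.get_or_compute(system_prompt_tokens, lambda toks: {"kv": len(toks)})
kv_again = cache.get_or_compute(system_prompt_tokens, lambda toks: {"kv": len(toks)})
assert kv is kv_again  # same object back: prefill ran only once
```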
Disambiguation
Distinct from Semantic Caching, which stores and reuses model outputs for queries with similar meaning; prompt caching instead stores the pre-processed state of the prompt inputs themselves, so a hit requires an exact match on the prompt prefix.
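The distinction is easiest to see in what each cache stores and how it is keyed. The sketch below is purely illustrative, with both caches reduced to plain dictionaries.

```python
# Semantic cache: keyed on the (approximate) meaning of the WHOLE prompt,
# stores the model's OUTPUT. A hit skips running the model entirely.
# (Real semantic caches key on embeddings and match by similarity, not text.)
semantic_cache: dict[str, str] = {}
semantic_cache["what is prompt caching"] = "Prompt caching reuses precomputed prompt state."

# Prompt cache: keyed on the EXACT token prefix, stores the model's
# precomputed INPUT state. A hit still runs the model, but skips prefill
# for the cached prefix.
prompt_cache: dict[int, object] = {}
prompt_cache[hash("You are a contracts assistant. <long static context>")] = {
    "kv_state": "per-layer key/value tensors for the prefix",
}
```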
Visual Analog
A reusable rubber stamp for a long legal header, sparing you from handwriting the same introduction on every page.