Prompt Caching
Prompt caching reduces the cost and latency of repeated context when the same prompt segments are reused across requests. Most providers match on an exact prefix of the prompt, so in practice the main work is prompt design: a stable prefix can be reused, while a prompt that changes from the first token cannot.
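As a minimal sketch, assume a generic chat-style message format; the policy text and roles here are illustrative, not tied to any specific SDK:

```python
# A minimal sketch, assuming a generic chat-style message format.
# The policy text and roles are illustrative placeholders.

STATIC_SYSTEM = (
    "You are a support assistant.\n"
    "Follow the policy below exactly.\n"
    "<long, rarely changing policy text>"
)

def build_messages(user_question: str) -> list[dict]:
    # The long, stable content comes first and never changes, so the
    # prefix is identical across requests; only the last message varies.
    return [
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": user_question},
    ]
```

Because the system message never changes, every request shares the same leading tokens, which is exactly what a prefix-based cache can reuse.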
Good use cases
- stable system prompts
- repeated long documents
- reusable instruction blocks
- shared prompt templates across a team
Practical guidance
- keep reusable context stable
- separate static instructions from user-specific content
- place large shared context before volatile user input (see the sketch after this list)
- monitor actual usage data instead of assuming a fixed discount across all models
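A sketch of assembling a prompt deterministically under these rules; the constants and names are placeholders, not a particular provider's format:

```python
import json

# Illustrative placeholders for the shared, rarely changing pieces.
STATIC_POLICY = "Answer using only the reference document."
REFERENCE_DOC = "<long shared document>"
TOOL_SCHEMAS = [{"name": "search", "parameters": {"query": "string"}}]

def static_prefix() -> str:
    # sort_keys gives a stable serialization, so dict ordering
    # can never perturb the cacheable prefix between requests.
    tools = json.dumps(TOOL_SCHEMAS, sort_keys=True)
    return "\n\n".join([STATIC_POLICY, REFERENCE_DOC, tools])

def build_prompt(user_input: str) -> str:
    # Volatile, request-specific content always goes last.
    return static_prefix() + "\n\n" + user_input
```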
A good prompt layout
- static policy and instructions
- reusable long-form reference material
- tool definitions or schema instructions
- request-specific user content, appended last (see the template sketch below)
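One way to enforce this ordering is a small template, sketched here with illustrative names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptTemplate:
    policy: str             # static policy and instructions
    reference: str          # reusable long-form reference material
    tool_instructions: str  # tool definitions or schema instructions

    def render(self, user_content: str) -> str:
        # Shared sections first, request-specific content last.
        return "\n\n".join(
            [self.policy, self.reference, self.tool_instructions, user_content]
        )

template = PromptTemplate(
    policy="Follow the refund policy below.",
    reference="<long reference material>",
    tool_instructions="<tool schemas / output format>",
)
print(template.render("My order arrived damaged."))
```

Because the template instance is frozen, the shared prefix cannot drift between requests; only the final segment varies.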
What breaks reuse
- injecting timestamps or random values into the static prefix (demonstrated below)
- rebuilding prompts in a slightly different order on every request
- mixing cached and non-cached content into one unstable blob
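A quick local diagnostic for the first failure mode: hash the prefix and compare across builds. The hash check is only a debugging aid, not how providers implement caching:

```python
import hashlib
from datetime import datetime, timezone

def prefix_hash(prefix: str) -> str:
    return hashlib.sha256(prefix.encode()).hexdigest()[:12]

# Stable prefix: same hash on every call, so it can be reused.
stable = "You are a support assistant. Follow the policy below."
assert prefix_hash(stable) == prefix_hash(stable)

# Unstable prefix: the embedded timestamp changes the leading bytes
# on every request, so no two requests share a cacheable prefix.
def unstable() -> str:
    now = datetime.now(timezone.utc).isoformat()
    return f"Current time: {now}\nYou are a support assistant."

print(prefix_hash(unstable()))  # differs run to run
```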
Caching behavior and savings can vary by model family, account setup, and plan. Check your actual usage metrics when you optimize for cost.
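As a hedged sketch of that monitoring step: field names such as `cached_tokens` vary by provider, so treat them as placeholders and map them to whatever your API actually returns.

```python
# Hypothetical usage payload; substitute your provider's real fields.
def log_cache_ratio(usage: dict) -> None:
    prompt_tokens = usage.get("prompt_tokens", 0)
    cached = usage.get("cached_tokens", 0)  # placeholder field name
    if prompt_tokens:
        print(f"cache hit ratio: {cached / prompt_tokens:.0%}")

log_cache_ratio({"prompt_tokens": 4000, "cached_tokens": 3600})  # 90%
```

Tracking this ratio over time tells you whether your prompt layout is actually producing reusable prefixes, rather than assuming a fixed discount.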