Prompt Caching
Prompt caching reduces the cost and latency of repeated context when the same prompt segments are reused across requests. Most providers match on an exact prefix of the prompt, so in practice the main work is prompt design: a stable prefix can be reused, while a prompt that changes from the first token cannot.
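As a minimal sketch, assume a generic chat-style message format; the policy text and roles here are illustrative, not tied to any specific SDK:

```python
# A minimal sketch, assuming a generic chat-style message format.
# The policy text and roles are illustrative placeholders.

STATIC_SYSTEM = (
    "You are a support assistant.\n"
    "Follow the policy below exactly.\n"
    "<long, rarely changing policy text>"
)

def build_messages(user_question: str) -> list[dict]:
    # The long, stable content comes first and never changes, so the
    # prefix is identical across requests; only the last message varies.
    return [
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": user_question},
    ]
```

Because the system message never changes, every request shares the same leading tokens, which is exactly what a prefix-based cache can reuse.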
Good use cases
- stable system prompts
- repeated long documents
- reusable instruction blocks
- shared prompt templates across a team
Practical guidance
- keep reusable context stable
- separate static instructions from user-specific content
- place large shared context before volatile user input (see the sketch after this list)
- monitor actual usage data instead of assuming a fixed discount across all models
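A sketch of assembling a prompt deterministically under these rules; the constants and names are placeholders, not a particular provider's format:

```python
import json

# Illustrative placeholders for the shared, rarely changing pieces.
STATIC_POLICY = "Answer using only the reference document."
REFERENCE_DOC = "<long shared document>"
TOOL_SCHEMAS = [{"name": "search", "parameters": {"query": "string"}}]

def static_prefix() -> str:
    # sort_keys gives a stable serialization, so dict ordering
    # can never perturb the cacheable prefix between requests.
    tools = json.dumps(TOOL_SCHEMAS, sort_keys=True)
    return "\n\n".join([STATIC_POLICY, REFERENCE_DOC, tools])

def build_prompt(user_input: str) -> str:
    # Volatile, request-specific content always goes last.
    return static_prefix() + "\n\n" + user_input
```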
A good prompt layout
- static policy and instructions
- reusable long-form reference material
- tool definitions or schema instructions
- request-specific user content, appended last (see the template sketch below)
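One way to enforce this ordering is a small template, sketched here with illustrative names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptTemplate:
    policy: str             # static policy and instructions
    reference: str          # reusable long-form reference material
    tool_instructions: str  # tool definitions or schema instructions

    def render(self, user_content: str) -> str:
        # Shared sections first, request-specific content last.
        return "\n\n".join(
            [self.policy, self.reference, self.tool_instructions, user_content]
        )

template = PromptTemplate(
    policy="Follow the refund policy below.",
    reference="<long reference material>",
    tool_instructions="<tool schemas / output format>",
)
print(template.render("My order arrived damaged."))
```

Because the template instance is frozen, the shared prefix cannot drift between requests; only the final segment varies.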
What breaks reuse
- injecting timestamps or random values into the static prefix (demonstrated below)
- rebuilding prompts in a slightly different order on every request
- mixing cached and non-cached content into one unstable blob
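A quick local diagnostic for the first failure mode: hash the prefix and compare across builds. The hash check is only a debugging aid, not how providers implement caching:

```python
import hashlib
from datetime import datetime, timezone

def prefix_hash(prefix: str) -> str:
    return hashlib.sha256(prefix.encode()).hexdigest()[:12]

# Stable prefix: same hash on every call, so it can be reused.
stable = "You are a support assistant. Follow the policy below."
assert prefix_hash(stable) == prefix_hash(stable)

# Unstable prefix: the embedded timestamp changes the leading bytes
# on every request, so no two requests share a cacheable prefix.
def unstable() -> str:
    now = datetime.now(timezone.utc).isoformat()
    return f"Current time: {now}\nYou are a support assistant."

print(prefix_hash(unstable()))  # differs run to run
```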
Caching behavior and savings can vary by model family, account setup, and plan. Check your actual usage metrics when you optimize for cost.
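As a hedged sketch of that monitoring step: field names such as `cached_tokens` vary by provider, so treat them as placeholders and map them to whatever your API actually returns.

```python
# Hypothetical usage payload; substitute your provider's real fields.
def log_cache_ratio(usage: dict) -> None:
    prompt_tokens = usage.get("prompt_tokens", 0)
    cached = usage.get("cached_tokens", 0)  # placeholder field name
    if prompt_tokens:
        print(f"cache hit ratio: {cached / prompt_tokens:.0%}")

log_cache_ratio({"prompt_tokens": 4000, "cached_tokens": 3600})  # 90%
```

Tracking this ratio over time tells you whether your prompt layout is actually producing reusable prefixes, rather than assuming a fixed discount.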