Prompt Caching

Prompt caching reduces cost and latency when the same prompt segments are sent repeatedly. In practice, most of the work is prompt design: a stable prefix can be reused across requests, while a prompt that changes on every request cannot.

Good use cases

  • stable system prompts
  • repeated long documents
  • reusable instruction blocks
  • shared prompt templates across a team

Practical guidance

  • keep reusable context stable
  • separate static instructions from user-specific content
  • place large shared context before volatile user input
  • monitor actual usage data instead of assuming a fixed discount across all models (see the sketch after this list)

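One way to act on the last point: read cache usage out of each response instead of assuming a fixed discount. A minimal sketch in Python follows; the response shape (a usage object with prompt_tokens and cached_tokens fields) is an assumption for illustration, so check your provider's actual schema.

    # Sketch: log the per-request cache hit rate instead of assuming a
    # fixed discount. The usage field names are hypothetical.
    def log_cache_usage(response: dict) -> float:
        usage = response.get("usage", {})
        prompt_tokens = usage.get("prompt_tokens", 0)
        cached_tokens = usage.get("cached_tokens", 0)  # hypothetical field
        ratio = cached_tokens / prompt_tokens if prompt_tokens else 0.0
        print(f"prompt={prompt_tokens} cached={cached_tokens} hit_rate={ratio:.0%}")
        return ratio

    # Example with a mocked response:
    log_cache_usage({"usage": {"prompt_tokens": 4000, "cached_tokens": 3500}})
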
A good prompt layout

  1. static policy and instructions
  2. reusable long-form reference material
  3. tool definitions or schema instructions
  4. request-specific user content (see the assembly sketch below)

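A minimal sketch of that layout, assuming a chat-style messages API. The constants, the build_messages helper, and the message roles are illustrative, not a specific AnyInt interface.

    # Sketch: assemble segments from most stable to most volatile so the
    # reusable prefix stays byte-identical across requests. All names are
    # illustrative.
    POLICY = "You are a support assistant. Follow the refund policy strictly."
    REFERENCE = "<long, rarely-changing product documentation>"
    TOOL_SCHEMA = '{"name": "lookup_order", "parameters": {"order_id": "string"}}'

    def build_messages(user_input: str) -> list[dict]:
        return [
            # 1. static policy and instructions
            {"role": "system", "content": POLICY},
            # 2. reusable long-form reference material
            {"role": "system", "content": REFERENCE},
            # 3. tool definitions or schema instructions
            {"role": "system", "content": TOOL_SCHEMA},
            # 4. request-specific content goes last, so items 1-3 form a
            # stable, cacheable prefix
            {"role": "user", "content": user_input},
        ]

    print(build_messages("Where is order 1234?"))

Because the user message comes last, everything before it is identical on every request and eligible for reuse.
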
What breaks reuse

  • injecting timestamps or random values into the static prefix (see the sketch after this list)
  • rebuilding prompts in a slightly different order on every request
  • mixing cached and non-cached content into one unstable blob

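For example, the first two failure modes look like this in practice. The sketch below is generic Python string assembly, not a specific API; the fix is to keep volatile values out of the shared prefix and to join segments in one deterministic order.

    import datetime

    STATIC_PREFIX = "You are a support assistant."

    # Breaks reuse: the timestamp makes the "static" prefix unique on
    # every request, so nothing can be reused.
    def bad_prompt(user_input: str) -> str:
        now = datetime.datetime.now().isoformat()
        return f"Current time: {now}\n{STATIC_PREFIX}\n{user_input}"

    # Preserves reuse: the stable prefix comes first and volatile values
    # follow it, always joined in the same order.
    def good_prompt(user_input: str) -> str:
        now = datetime.datetime.now().isoformat()
        return "\n".join([STATIC_PREFIX, user_input, f"Current time: {now}"])
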
Caching behavior and savings can vary by model family, account setup, and plan. Check your actual usage metrics when you optimize for cost.