Model Fallbacks
Fallbacks are how you keep an application usable when one model, route, or provider path is unavailable.
Use a fallback policy to improve reliability, not to hide uncertainty. A fallback should be explicit enough that engineering, product, and support teams can tell when the application used a secondary path and what changed for the user.
Good fallback patterns
- primary model plus one lower-cost alternative
- primary premium model plus one stable baseline
- one route family per workload, with a documented escape hatch
Recommended design
| Decision | Recommendation |
|---|---|
| Scope | Define fallbacks per workload, not globally for the whole application |
| Compatibility | Prefer a fallback in the same API family and request shape |
| Trigger | Fall back only on temporary failures; never treat invalid requests or authentication failures as fallback triggers |
| Visibility | Log every fallback event with workload, primary model, fallback model, status, and reason |
| User experience | Decide whether users should see a degraded-mode message, especially for paid tiers |
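The Visibility row can be made concrete with one structured log line per fallback event. The following is a minimal Python sketch using only the standard library; the `FallbackEvent` class, the `log_fallback_event` helper, the `llm.fallback` logger name, and the model IDs are hypothetical, chosen to mirror the fields listed in the table rather than any API described in this documentation.

```python
import json
import logging
from dataclasses import asdict, dataclass

# Hypothetical logger name; route it to whatever sink feeds your dashboards.
logger = logging.getLogger("llm.fallback")


@dataclass(frozen=True)
class FallbackEvent:
    """One fallback occurrence, carrying the fields recommended in the table."""
    workload: str        # e.g. "support-chat"
    primary_model: str   # model that failed
    fallback_model: str  # model that served the request instead
    status: str          # e.g. "503" or "timeout"
    reason: str          # short human-readable cause


def log_fallback_event(event: FallbackEvent) -> str:
    """Emit the event as a single JSON line and return that line."""
    line = json.dumps(asdict(event), sort_keys=True)
    logger.warning("fallback %s", line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_fallback_event(FallbackEvent(
        workload="support-chat",
        primary_model="premium-chat-model",   # hypothetical model IDs
        fallback_model="baseline-chat-model",
        status="503",
        reason="upstream unavailable",
    ))
```

Emitting one JSON line per event keeps the records easy to count and filter by workload in whatever log pipeline you already use.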
Example policy
| Workload | Primary | Fallback | Allowed trigger |
|---|---|---|---|
| Support chat | premium chat model | stable baseline chat model | transient 5xx, timeout, or upstream unavailable |
| Batch extraction | low-cost extraction model | same-family baseline model | temporary provider failure |
| Code assistant | stronger code-capable model | none without user notice | only with explicit user notice (quality-sensitive paid workflow) |
| Image understanding | vision-capable model | same-family vision-capable model | transient provider failure only |
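One way to keep the policy per workload is to store it as data and look it up at call time. This sketch assumes hypothetical model IDs and normalized trigger labels (`5xx`, `timeout`, `upstream_unavailable`) standing in for the roles and triggers in the table; the mapping of "temporary provider failure" onto those labels is also an assumption. Substitute the real model IDs from your model list.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Hypothetical normalized labels for transient failures.
TRANSIENT: FrozenSet[str] = frozenset({"5xx", "timeout", "upstream_unavailable"})


@dataclass(frozen=True)
class WorkloadPolicy:
    primary: str
    fallback: Optional[str]  # None means no automatic fallback
    allowed_triggers: FrozenSet[str] = frozenset()


# Hypothetical model IDs for the roles in the example policy table.
POLICIES: Dict[str, WorkloadPolicy] = {
    "support-chat": WorkloadPolicy("premium-chat-model", "baseline-chat-model", TRANSIENT),
    "batch-extraction": WorkloadPolicy("low-cost-extraction-model", "baseline-extraction-model", TRANSIENT),
    # Quality-sensitive paid workflow: never switch models without telling the user.
    "code-assistant": WorkloadPolicy("strong-code-model", None),
    "image-understanding": WorkloadPolicy("vision-model", "baseline-vision-model", TRANSIENT),
}


def pick_fallback(workload: str, trigger: str) -> Optional[str]:
    """Return the configured fallback if this trigger is allowed, else None."""
    policy = POLICIES.get(workload)
    if policy is None or policy.fallback is None:
        return None
    return policy.fallback if trigger in policy.allowed_triggers else None
```

Returning `None` for unknown workloads keeps the default conservative: a workload without an explicit policy never falls back.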
What to avoid
- silent fallback that changes output quality without observability
- fallback chains that cross incompatible request shapes
- using fallback as a substitute for real entitlement checks
A practical fallback policy
- Choose one primary model per workload.
- Pick one fallback in the same compatibility family.
- Verify that the fallback accepts the same request shape by testing it with the same class of payload the primary receives.
- Log every fallback event.
- Expose degraded behavior clearly in internal dashboards.
- Review fallback output quality before enabling it for user-visible workflows.
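The compatibility and request-shape steps of this checklist can be automated as a config check that runs before a fallback is enabled. The sketch below assumes a hypothetical `CATALOG` that records each model's API family and the top-level request fields it accepts; in practice that information would come from your own model discovery or testing, and a real check should also send a representative payload to the fallback.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelInfo:
    api_family: str                   # e.g. "chat" or "completions"
    accepted_fields: FrozenSet[str]   # top-level request fields the model accepts


# Hypothetical catalog entries for illustration only.
CATALOG: Dict[str, ModelInfo] = {
    "premium-chat-model": ModelInfo("chat", frozenset({"messages", "temperature", "tools"})),
    "baseline-chat-model": ModelInfo("chat", frozenset({"messages", "temperature"})),
    "legacy-completion-model": ModelInfo("completions", frozenset({"prompt", "temperature"})),
}


def validate_fallback(primary: str, fallback: Optional[str], sample_payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the pairing looks usable."""
    if fallback is None:
        return []  # no fallback configured, nothing to check
    missing = [m for m in (primary, fallback) if m not in CATALOG]
    if missing:
        return [f"unknown model: {m}" for m in missing]
    p, f = CATALOG[primary], CATALOG[fallback]
    problems: List[str] = []
    if p.api_family != f.api_family:
        problems.append(f"family mismatch: {p.api_family} vs {f.api_family}")
    # Compare the payload class actually sent to the primary with what the fallback accepts.
    unsupported = sorted(set(sample_payload) - f.accepted_fields)
    if unsupported:
        problems.append(f"fallback does not accept fields: {unsupported}")
    return problems
```

Running this in CI against a sample payload per workload catches a fallback that silently drops fields such as tool definitions.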
Error handling boundary
| Error class | Fallback? | Reason |
|---|---|---|
| 400 invalid request | No | The payload must be fixed before retrying |
| 401 missing or invalid key | No | This is an authentication problem |
| 403 unavailable model or policy limit | Usually no | Check entitlement before changing behavior |
| 429 rate or quota limit | Maybe | Back off first; fall back only if policy allows it |
| 5xx or timeout | Often yes | Temporary failures are the most common fallback trigger |
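The table above translates into a small decision function plus a wrapper around the model call. This is a minimal sketch, not a client library: `ModelError`, `classify`, and `call_with_fallback` are hypothetical names, the `call` argument stands in for whatever function actually sends the request, and a status of `None` is used here to represent a timeout. It treats 403 as no-fallback (check entitlement first) and falls back on 429 only after backoff and only when a policy flag allows it.

```python
import time
from typing import Callable, Optional


class ModelError(Exception):
    """Hypothetical error type; status None is used here to mean 'timed out'."""
    def __init__(self, status: Optional[int]):
        super().__init__(f"status={status}")
        self.status = status


def classify(status: Optional[int]) -> str:
    """Map an error to 'no_fallback', 'backoff', or 'fallback' per the table."""
    if status is None or status >= 500:
        return "fallback"      # 5xx or timeout: temporary failure
    if status == 429:
        return "backoff"       # rate or quota limit: back off first
    return "no_fallback"       # 400, 401, 403 and other 4xx


def call_with_fallback(
    call: Callable[[str, dict], dict],
    primary: str,
    fallback: Optional[str],
    payload: dict,
    allow_fallback_on_429: bool = False,
    retries_429: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call the primary model; fall back only for the error classes that allow it."""
    attempt = 0
    while True:
        try:
            return call(primary, payload)
        except ModelError as err:
            kind = classify(err.status)
            if kind == "backoff" and attempt < retries_429:
                sleep(base_delay * (2 ** attempt))  # exponential backoff before retrying primary
                attempt += 1
                continue
            can_fall_back = kind == "fallback" or (kind == "backoff" and allow_fallback_on_429)
            if fallback is None or not can_fall_back:
                raise  # surface the original error unchanged
            return call(fallback, payload)
```

Errors from the fallback call itself propagate, so a failing fallback does not chain into a third model. Injecting `sleep` keeps the backoff testable without real delays; in production you would also log each fallback event, as recommended above.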
When not to fall back
Do not fall back silently when:
- the user paid for a specific service tier
- the request contains provider-specific parameters or features that another route cannot honor
- billing or compliance boundaries would change
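These conditions can be checked explicitly before any silent fallback. The sketch assumes a hypothetical `RequestContext` that already knows whether the user paid for a specific tier, which provider-specific parameters the request uses, which of those the fallback supports, and whether the fallback stays within the same billing and compliance boundary; how you populate those fields depends on your application.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request facts needed to decide on a silent fallback."""
    paid_tier_model: bool  # user paid for a specific service tier
    provider_specific_params: FrozenSet[str] = frozenset()
    fallback_supported_params: FrozenSet[str] = frozenset()
    same_billing_and_compliance_boundary: bool = True


def silent_fallback_blockers(ctx: RequestContext) -> List[str]:
    """Return reasons a silent fallback is not allowed; empty means it is allowed."""
    reasons: List[str] = []
    if ctx.paid_tier_model:
        reasons.append("user paid for a specific service tier")
    unsupported = ctx.provider_specific_params - ctx.fallback_supported_params
    if unsupported:
        reasons.append(f"fallback cannot honor: {sorted(unsupported)}")
    if not ctx.same_billing_and_compliance_boundary:
        reasons.append("billing or compliance boundary would change")
    return reasons
```

A non-empty result means the application should either fail visibly or ask the user before switching routes.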
Related pages
Prompt Caching
Prompt caching reduces repeated-context cost and latency when the same prompt segments are reused. In practice, the main work is prompt design: stable prefixes are easier to reuse than constantly changing prompts.
Verify Your Integration
Run customer-facing checks before depending on AnyInt in production: model discovery, authentication, first requests, streaming, async tasks, callbacks, and error handling.