Model Fallbacks

Fallbacks are how you keep an application usable when one model, route, or provider path is unavailable.

Use a fallback policy for reliability, not to hide uncertainty. A fallback should be explicit enough that engineering, product, and support teams can tell when the application used a secondary path and what changed for the user.

Good fallback patterns

primary model plus one lower-cost alternative
primary premium model plus one stable baseline
one route family per workload, with a documented escape hatch

Recommended design

Decision	Recommendation
Scope	Define fallbacks per workload, not globally for the whole application
Compatibility	Prefer a fallback in the same API family and request shape
Trigger	Fall back on temporary failures only; never route invalid requests or authentication failures to a fallback
Visibility	Log every fallback event with workload, primary model, fallback model, status, and reason
User experience	Decide whether users should see a degraded-mode message, especially for paid tiers
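
The Visibility row can be sketched as one structured log record per fallback event. This is a minimal illustration, not a prescribed format: the logger name, the `event` label, and the function names are assumptions, and only the five fields come from the table above.

```python
import json
import logging

# Hypothetical logger name; use whatever your application already configures.
logger = logging.getLogger("fallback")


def build_fallback_event(workload, primary_model, fallback_model, status, reason):
    """Collect the five fields from the Visibility row into one flat record."""
    return {
        "event": "model_fallback",  # assumed label so dashboards can filter on it
        "workload": workload,
        "primary_model": primary_model,
        "fallback_model": fallback_model,
        "status": status,
        "reason": reason,
    }


def log_fallback_event(**fields):
    """Emit the record as a single JSON line and return it to the caller."""
    event = build_fallback_event(**fields)
    logger.warning(json.dumps(event, sort_keys=True))
    return event
```

Emitting one JSON line per event keeps the record easy to count and group by workload in whatever log tooling is already in place.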

Example policy

Workload	Primary	Fallback	Allowed trigger
Support chat	premium chat model	stable baseline chat model	transient `5xx`, timeout, or upstream unavailable
Batch extraction	low-cost extraction model	same-family baseline model	temporary provider failure
Code assistant	stronger code-capable model	none without user notice	none by default; this is a quality-sensitive paid workflow
Image understanding	vision-capable model	same-family vision-capable model	transient provider failure only
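
One way to hold a table like this in code is a per-workload mapping, so the fallback is looked up by workload and never applied globally. The sketch below is illustrative only: the model identifiers and trigger labels are placeholders, not real model names, and the code assistant is modeled as having no automatic fallback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FallbackPolicy:
    primary: str
    fallback: Optional[str]      # None means no automatic fallback
    allowed_triggers: frozenset  # placeholder trigger labels


# Placeholder identifiers that mirror the rows of the example policy table.
POLICIES = {
    "support_chat": FallbackPolicy(
        "premium-chat", "baseline-chat",
        frozenset({"5xx", "timeout", "upstream_unavailable"}),
    ),
    "batch_extraction": FallbackPolicy(
        "low-cost-extraction", "baseline-extraction",
        frozenset({"provider_failure"}),
    ),
    # No automatic fallback: any switch here needs user notice first.
    "code_assistant": FallbackPolicy("code-model", None, frozenset()),
    "image_understanding": FallbackPolicy(
        "vision-model", "baseline-vision",
        frozenset({"provider_failure"}),
    ),
}


def fallback_for(workload: str, trigger: str) -> Optional[str]:
    """Return the fallback model only if this workload allows this trigger."""
    policy = POLICIES[workload]
    if policy.fallback is None or trigger not in policy.allowed_triggers:
        return None
    return policy.fallback
```

An unknown workload raises `KeyError` in this sketch, which forces every workload to have an explicit policy entry.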

What to avoid

silent fallback that changes output quality without observability
fallback chains that cross incompatible request shapes
using fallback as a substitute for real entitlement checks

A practical fallback policy

Choose one primary model per workload.
Pick one fallback in the same compatibility family.
Verify the fallback request shape with the same payload class.
Log every fallback event.
Expose degraded behavior clearly in internal dashboards.
Review fallback output quality before enabling it for user-visible workflows.
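
The steps above can be sketched as a small wrapper that tries the primary model once and switches to a single fallback on a transient failure. Everything named here is hypothetical: `TransientError`, the `primary` and `fallback` callables, and the `on_fallback` hook stand in for whatever client and logging code the application already has.

```python
from typing import Callable, Tuple


class TransientError(Exception):
    """Hypothetical marker for 5xx, timeout, or upstream-unavailable failures."""


def call_with_fallback(
    primary: Callable[[dict], str],
    fallback: Callable[[dict], str],
    payload: dict,
    on_fallback: Callable[[str], None],
) -> Tuple[str, str]:
    """Return (result, path), where path is "primary" or "fallback".

    Only TransientError triggers the fallback. Any other exception, such as
    an invalid request or an auth failure, propagates to the caller unchanged.
    """
    try:
        return primary(payload), "primary"
    except TransientError as exc:
        # Report the event before switching so the fallback is never silent.
        on_fallback(str(exc))
        # Same payload class and request shape, different model.
        return fallback(payload), "fallback"
```

Returning the path alongside the result lets the caller decide whether to show a degraded-mode message.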

Error handling boundary

Error class	Fallback?	Reason
`400` invalid request	No	The payload must be fixed before retrying
`401` missing or invalid key	No	This is an authentication problem
`403` unavailable model or policy limit	Usually no	Check entitlement before changing behavior
`429` rate or quota limit	Maybe	Back off first; fall back only if policy allows it
`5xx` or timeout	Often yes	Temporary failures are the most common fallback trigger
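
The boundary in this table can be expressed as a single decision function. This is a simplified sketch: the function name, the return labels, and the `policy_allows_429_fallback` flag are assumptions. It treats `403` as a firm no, while the table says "usually no", so adjust it if your entitlement checks allow exceptions.

```python
from typing import Optional


def fallback_decision(
    status: Optional[int],
    timed_out: bool = False,
    policy_allows_429_fallback: bool = False,
) -> str:
    """Map an error class to a decision label (labels are illustrative)."""
    if timed_out:
        return "fallback_candidate"
    if status is None:
        # No status and no timeout: nothing here justifies a fallback.
        return "no_fallback"
    if 500 <= status <= 599:
        return "fallback_candidate"
    if status == 429:
        # Back off first; fall back afterwards only if policy allows it.
        if policy_allows_429_fallback:
            return "backoff_then_fallback"
        return "backoff_only"
    # 400, 401, 403, and anything else: fix the request, key, or entitlement.
    return "no_fallback"
```

The label `fallback_candidate` is deliberate: a temporary failure makes a request eligible, and the per-workload policy still has the final say.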

When not to fall back

Do not fall back silently when:

the user paid for a specific service tier
the request contains provider-specific instructions that another route cannot honor
billing or compliance boundaries would change
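
These three conditions can be checked as a guard before any silent fallback. The sketch below assumes the application can already answer each question for a given request; the `RequestContext` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    # Each flag mirrors one condition in the list above (hypothetical names).
    paid_for_specific_tier: bool
    has_provider_specific_content: bool
    changes_billing_or_compliance: bool


def silent_fallback_allowed(ctx: RequestContext) -> bool:
    """Allow a silent fallback only when none of the conditions applies."""
    return not (
        ctx.paid_for_specific_tier
        or ctx.has_provider_specific_content
        or ctx.changes_billing_or_compliance
    )
```

When the guard returns `False`, the application can still fall back, but only with notice to the user or not at all, depending on the workload's policy.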
