Streaming
Streaming is the simplest way to improve perceived latency in chat, copilots, and interactive generation flows.
Use streaming when the user benefits from seeing partial output. Do not use it as a replacement for async task APIs: video, music, and other asset workflows may still require task creation, polling, or callbacks.
Published streaming surfaces
| Surface | How to enable it |
|---|---|
| OpenAI-compatible chat | set stream: true on /openai/v1/chat/completions |
| Gemini streaming | call streamGenerateContent?alt=sse |
When to use streaming
- chat interfaces
- progressive generation in IDEs or copilots
- long-running answer generation where first-token latency matters
When not to use streaming
| Situation | Prefer |
|---|---|
| The route creates video, music, or another async asset | Task creation plus polling or callbacks |
| The client must validate one final JSON object before showing anything | Non-streaming response or buffered server-side validation |
| The user does not see output interactively | Batch request and stored final response |
| Your network layer cannot handle server-sent events | Start with non-streaming and add streaming after client support is verified |
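For the "validate one final JSON object" row, buffered server-side validation can be sketched as follows. This is a minimal illustration, not an AnyInt API: `buffer_and_validate` and `required_keys` are hypothetical names, and the sketch assumes the text chunks have already been extracted from the stream.

```python
import json
from typing import Iterable, Optional


def buffer_and_validate(chunks: Iterable[str],
                        required_keys: Iterable[str]) -> Optional[dict]:
    """Collect every text chunk, then parse and check the result once.

    Returns the parsed object only if the full buffer is valid JSON,
    is an object, and contains all required keys. Otherwise returns None
    so the caller can show a failure state instead of partial output.
    """
    raw = "".join(chunks)  # nothing is forwarded to the user while buffering
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None  # truncated or malformed output
    if not isinstance(obj, dict):
        return None
    if not set(required_keys).issubset(obj):
        return None
    return obj
```

The upstream call can still stream to your server. Only the client-facing response waits for the validated object.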
What the stream looks like
- OpenAI-compatible chat returns SSE chunks in chat.completion.chunk format
- Gemini streaming uses the published streamGenerateContent?alt=sse route
- neither stream should be treated as a final answer until the stream closes
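A minimal sketch of reading both streams, assuming AnyInt mirrors the upstream payload shapes: OpenAI-style chunks carry text in `choices[0].delta.content` and end with a `data: [DONE]` line, and Gemini-style chunks carry text in `candidates[0].content.parts[].text`. The function names are hypothetical, and the sketch assumes each event fits on one `data:` line.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line from an SSE text stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, ":" comments, and other SSE fields
        yield line[len("data:"):].strip()


def openai_chunk_text(payload: str) -> str:
    """Text delta from one chat.completion.chunk payload (may be empty)."""
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("delta") or {}).get("content") or ""


def gemini_chunk_text(payload: str) -> str:
    """Concatenated part text from one Gemini-style streamed response."""
    chunk = json.loads(payload)
    candidates = chunk.get("candidates") or []
    if not candidates:
        return ""
    parts = (candidates[0].get("content") or {}).get("parts") or []
    return "".join(part.get("text", "") for part in parts)


def assemble_openai(lines: Iterable[str]) -> str:
    """Join text deltas until the assumed [DONE] sentinel or end of input."""
    pieces = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        pieces.append(openai_chunk_text(payload))
    return "".join(pieces)
```

In a real client, `lines` would come from the HTTP library's line iterator over the response body. The assembled string is only final once the stream has closed.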
OpenAI-compatible example
curl https://api.anyint.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Stream three short bullets about API gateways."}
    ]
  }'
Gemini-compatible example
curl "https://api.anyint.ai/gemini/v1beta/models/gemini-3-flash-preview:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Stream a short answer about model routing."}]
      }
    ]
  }'
Implementation tips
- treat each chunk as partial state, not a final answer
- keep your server tolerant of dropped connections
- store the final assembled output if you need auditing or replay
- separate UI rendering from business-side completion logic
- handle stream interruption separately from model refusal or invalid request errors
- test through the same proxy, CDN, or server framework used in production
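One way to apply the tips above is to keep UI rendering and completion logic apart, and to record an interrupted stream as its own outcome. The sketch below is illustrative only: `consume`, `StreamResult`, and `Outcome` are hypothetical names, and it assumes the transport raises `ConnectionError` or `TimeoutError` when the connection drops and that a normal end of iteration means the stream closed cleanly. Model refusals and invalid request errors are expected to surface elsewhere, for example as an HTTP error before any chunk arrives.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class Outcome(Enum):
    COMPLETED = "completed"      # stream closed normally
    INTERRUPTED = "interrupted"  # connection dropped mid-stream


@dataclass
class StreamResult:
    outcome: Outcome
    text: str          # everything assembled so far
    chunk_count: int


def consume(chunks: Iterable[str],
            on_chunk: Optional[Callable[[str], None]] = None) -> StreamResult:
    """Drain a stream of text chunks and report how it ended.

    `on_chunk` is the UI hook: it sees partial state only. Business logic
    should act on the returned StreamResult, not on individual chunks.
    """
    pieces = []
    try:
        for piece in chunks:
            pieces.append(piece)
            if on_chunk is not None:
                on_chunk(piece)
    except (ConnectionError, TimeoutError):
        # Keep the partial text so it can be stored for auditing or replay.
        return StreamResult(Outcome.INTERRUPTED, "".join(pieces), len(pieces))
    return StreamResult(Outcome.COMPLETED, "".join(pieces), len(pieces))
```

Storing `result.text` together with `result.outcome` keeps audit records honest about whether the answer was complete.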
Production checklist
| Check | Why it matters |
|---|---|
| Client receives more than one chunk | Confirms the client is not buffering until completion |
| UI can cancel a stream | Avoids wasting tokens after a user navigates away |
| Server handles dropped connections | Mobile and browser clients disconnect often |
| Final assembled output is stored when needed | Auditing and replay need the complete text, not only UI chunks |
| Error path is visible | Users need a clear failure state if the stream stops early |
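The first check in the table can be approximated by recording when each chunk arrives and looking at the spread. This is a rough heuristic rather than an AnyInt feature: `looks_buffered` is a hypothetical helper and the 50 ms default threshold is an arbitrary assumption to tune for your environment.

```python
from typing import Sequence


def looks_buffered(arrival_times: Sequence[float],
                   min_spread_s: float = 0.05) -> bool:
    """Return True if the chunks appear to have arrived all at once.

    `arrival_times` holds one timestamp per received chunk, in order,
    for example values taken from time.monotonic() in the client.
    """
    if len(arrival_times) < 2:
        return True  # zero or one chunk: nothing was streamed incrementally
    spread = arrival_times[-1] - arrival_times[0]
    return spread < min_spread_s
```

Run it against timestamps collected through the same proxy, CDN, or server framework used in production, since any of those layers can buffer the response.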
Related pages
Overview
AnyInt is not only a transport layer. The platform also changes how applications behave once they are in production: how output is streamed, how keys are limited, how routing is applied, and how model output is shaped for downstream systems.
Structured Outputs
Structured output is a product pattern, not one dedicated AnyInt endpoint. The exact mechanism depends on the compatibility family and model you choose.