Streaming
Streaming is the simplest way to improve perceived latency in chat, copilots, and interactive generation flows.
Published streaming surfaces
| Surface | How to enable it |
|---|---|
| OpenAI-compatible chat | Set `stream: true` on `/openai/v1/chat/completions` |
| Gemini streaming | Call `streamGenerateContent?alt=sse` |
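A minimal TypeScript sketch of enabling both surfaces follows. The base URL, API key handling, auth header, and model IDs are placeholders for whatever your deployment uses; only the `stream: true` flag and the `?alt=sse` route come from the table above.

```ts
// Minimal sketch: the base URL, API key, auth header, and model IDs below are
// placeholders, not documented values. Only `stream: true` and the
// `?alt=sse` route come from the table above.
const BASE_URL = process.env.ANYINT_BASE_URL ?? ""; // assumption
const API_KEY = process.env.ANYINT_API_KEY ?? "";   // assumption

// OpenAI-compatible chat: set stream: true on /openai/v1/chat/completions.
export function openAiStyleStream(prompt: string): Promise<Response> {
  return fetch(`${BASE_URL}/openai/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`, // auth scheme assumed
    },
    body: JSON.stringify({
      model: "your-model-id", // placeholder
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });
}

// Gemini streaming: call the streamGenerateContent?alt=sse route.
// The /v1beta/models/... prefix mirrors the Gemini API and is an assumption.
export function geminiStyleStream(prompt: string): Promise<Response> {
  return fetch(
    `${BASE_URL}/v1beta/models/your-model-id:streamGenerateContent?alt=sse`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${API_KEY}`, // auth scheme assumed
      },
      body: JSON.stringify({
        contents: [{ role: "user", parts: [{ text: prompt }] }],
      }),
    },
  );
}
```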
When to use streaming
- chat interfaces
- progressive generation in IDEs or copilots
- long-running answer generation where first-token latency matters
What the stream looks like
- OpenAI-compatible chat returns SSE chunks in `chat.completion.chunk` format (parsed in the sketch after this list)
- Gemini streaming uses the published `streamGenerateContent?alt=sse` route
- neither stream should be treated as a final answer until the stream closes
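One way to consume the OpenAI-compatible stream, assuming standard SSE framing: each `data:` line carries a `chat.completion.chunk` object, and the incremental text lives in `choices[0].delta.content`. The `onToken` callback and the helper name are illustrative, not part of the API.

```ts
// Rough sketch of reading the SSE body: split events on blank lines, skip the
// [DONE] terminator, accumulate the delta text, and only return the assembled
// answer once the stream has closed.
export async function readChatStream(
  res: Response,
  onToken: (text: string) => void,
): Promise<string> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let assembled = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line; keep any trailing partial event.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";

    for (const event of events) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") continue; // stream terminator, not a chunk
        const chunk = JSON.parse(payload);  // a chat.completion.chunk object
        const delta = chunk.choices?.[0]?.delta?.content ?? "";
        if (delta) {
          assembled += delta;
          onToken(delta); // partial state only -- not the final answer
        }
      }
    }
  }
  return assembled; // the final answer, available once the stream closes
}
```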
Implementation tips
- treat each chunk as partial state, not a final answer
- keep your server tolerant to dropped connections
- store the final assembled output if you need auditing or replay
- separate UI rendering from business-side completion logic (see the sketch after this list)
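Putting those tips together, here is a rough sketch that reuses the request and parsing helpers from the earlier sketches plus two hypothetical application hooks, `renderPartial` and `saveTranscript`:

```ts
// Hypothetical application hooks plus the helpers from the sketches above.
declare function openAiStyleStream(prompt: string): Promise<Response>;
declare function readChatStream(
  res: Response,
  onToken: (text: string) => void,
): Promise<string>;
declare function renderPartial(text: string): void; // UI-only hook (assumption)
declare function saveTranscript(                     // audit/replay store (assumption)
  record: { prompt: string; output: string },
): Promise<void>;

export async function runStreamingTurn(prompt: string): Promise<void> {
  let running = "";
  let finalOutput = "";
  try {
    const res = await openAiStyleStream(prompt);
    finalOutput = await readChatStream(res, (delta) => {
      running += delta;
      renderPartial(running); // render partial state; no business logic here
    });
  } catch (err) {
    // Dropped connections land here: mark the turn incomplete (or retry)
    // rather than treating the partial text as a final answer.
    console.warn("stream interrupted", err);
    return;
  }
  // Business-side completion runs once, on the fully assembled output,
  // and the result is stored for auditing or replay.
  await saveTranscript({ prompt, output: finalOutput });
}
```

The only business-side step, `saveTranscript`, runs after the stream has closed cleanly, so partial output never reaches auditing or replay.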
Related pages
Overview
AnyInt is not only a transport layer. The platform also changes how applications behave once they are in production: how output is streamed, how keys are limited, how routing is applied, and how model output is shaped for downstream systems.
Structured Outputs
Structured output is a product pattern, not a single dedicated AnyInt endpoint. The exact mechanism depends on the compatibility family and model you choose.