Streaming
Streaming is the simplest way to improve perceived latency in chat, copilots, and interactive generation flows.
Published streaming surfaces
| Surface | How to enable it |
|---|---|
| OpenAI-compatible chat | Set `stream: true` on `/openai/v1/chat/completions` |
| Gemini streaming | Call `streamGenerateContent?alt=sse` |
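A minimal TypeScript sketch of enabling both surfaces follows. The base URL, API key handling, auth header, and model IDs are placeholders for whatever your deployment uses; only the `stream: true` flag and the `?alt=sse` route come from the table above.

```ts
// Minimal sketch: the base URL, API key, auth header, and model IDs below are
// placeholders, not documented values. Only `stream: true` and the
// `?alt=sse` route come from the table above.
const BASE_URL = process.env.ANYINT_BASE_URL ?? ""; // assumption
const API_KEY = process.env.ANYINT_API_KEY ?? "";   // assumption

// OpenAI-compatible chat: set stream: true on /openai/v1/chat/completions.
export function openAiStyleStream(prompt: string): Promise<Response> {
  return fetch(`${BASE_URL}/openai/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`, // auth scheme assumed
    },
    body: JSON.stringify({
      model: "your-model-id", // placeholder
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });
}

// Gemini streaming: call the streamGenerateContent?alt=sse route.
// The /v1beta/models/... prefix mirrors the Gemini API and is an assumption.
export function geminiStyleStream(prompt: string): Promise<Response> {
  return fetch(
    `${BASE_URL}/v1beta/models/your-model-id:streamGenerateContent?alt=sse`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${API_KEY}`, // auth scheme assumed
      },
      body: JSON.stringify({
        contents: [{ role: "user", parts: [{ text: prompt }] }],
      }),
    },
  );
}
```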
When to use streaming
- chat interfaces
- progressive generation in IDEs or copilots
- long-running answer generation where first-token latency matters
What the stream looks like
- OpenAI-compatible chat returns SSE chunks in `chat.completion.chunk` format (parsed in the sketch after this list)
- Gemini streaming uses the published `streamGenerateContent?alt=sse` route
- neither stream should be treated as a final answer until the stream closes
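One way to consume the OpenAI-compatible stream, assuming standard SSE framing: each `data:` line carries a `chat.completion.chunk` object, and the incremental text lives in `choices[0].delta.content`. The `onToken` callback and the helper name are illustrative, not part of the API.

```ts
// Rough sketch of reading the SSE body: split events on blank lines, skip the
// [DONE] terminator, accumulate the delta text, and only return the assembled
// answer once the stream has closed.
export async function readChatStream(
  res: Response,
  onToken: (text: string) => void,
): Promise<string> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let assembled = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line; keep any trailing partial event.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";

    for (const event of events) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") continue; // stream terminator, not a chunk
        const chunk = JSON.parse(payload);  // a chat.completion.chunk object
        const delta = chunk.choices?.[0]?.delta?.content ?? "";
        if (delta) {
          assembled += delta;
          onToken(delta); // partial state only -- not the final answer
        }
      }
    }
  }
  return assembled; // the final answer, available once the stream closes
}
```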
Implementation tips
- treat each chunk as partial state, not a final answer
- keep your server tolerant to dropped connections
- store the final assembled output if you need auditing or replay
- separate UI rendering from business-side completion logic (see the sketch after this list)
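Putting those tips together, here is a rough sketch that reuses the request and parsing helpers from the earlier sketches plus two hypothetical application hooks, `renderPartial` and `saveTranscript`:

```ts
// Hypothetical application hooks plus the helpers from the sketches above.
declare function openAiStyleStream(prompt: string): Promise<Response>;
declare function readChatStream(
  res: Response,
  onToken: (text: string) => void,
): Promise<string>;
declare function renderPartial(text: string): void; // UI-only hook (assumption)
declare function saveTranscript(                     // audit/replay store (assumption)
  record: { prompt: string; output: string },
): Promise<void>;

export async function runStreamingTurn(prompt: string): Promise<void> {
  let running = "";
  let finalOutput = "";
  try {
    const res = await openAiStyleStream(prompt);
    finalOutput = await readChatStream(res, (delta) => {
      running += delta;
      renderPartial(running); // render partial state; no business logic here
    });
  } catch (err) {
    // Dropped connections land here: mark the turn incomplete (or retry)
    // rather than treating the partial text as a final answer.
    console.warn("stream interrupted", err);
    return;
  }
  // Business-side completion runs once, on the fully assembled output,
  // and the result is stored for auditing or replay.
  await saveTranscript({ prompt, output: finalOutput });
}
```

The only business-side step, `saveTranscript`, runs after the stream has closed cleanly, so partial output never reaches auditing or replay.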
Related pages
Overview
AnyInt is not only a transport layer. The platform also changes how applications behave once they are in production: how output is streamed, how keys are limited, how routing is applied, and how model output is shaped for downstream systems.
Structured Outputs
Structured output is a product pattern, not a single dedicated AnyInt endpoint. The exact mechanism depends on the compatibility family and model you choose.