Streaming

Streaming is the simplest way to improve perceived latency in chat, copilots, and interactive generation flows.

Use streaming when the user benefits from seeing partial output. Do not use it as a replacement for async task APIs: video, music, and other asset workflows may still require task creation, polling, or callbacks.

Published streaming surfaces

Surface	How to enable it
OpenAI-compatible chat	set `stream: true` on `/openai/v1/chat/completions`
Gemini streaming	call `streamGenerateContent?alt=sse`

When to use streaming

chat interfaces
progressive generation in IDEs or copilots
long-running answer generation where first-token latency matters

When not to use streaming

Situation	Prefer
The route creates video, music, or another async asset	Task creation plus polling or callbacks
The client must validate one final JSON object before showing anything	Non-streaming response or buffered server-side validation
The user does not see output interactively	Batch request and stored final response
Your network layer cannot handle server-sent events	Start with non-streaming and add streaming after client support is verified
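
For the case where one final JSON object must be validated before anything is shown, one option is to consume the stream on the server, buffer the pieces, and parse only after the stream has ended. The sketch below illustrates that idea; `buffer_and_validate` and its `required_keys` parameter are hypothetical names, not part of any published API.

```python
import json
from typing import Iterable, Set


def buffer_and_validate(pieces: Iterable[str], required_keys: Set[str]) -> dict:
    """Join streamed text pieces and validate them as one JSON object.

    Nothing is returned until the whole stream has been consumed, so the
    caller never sees partial output.
    """
    text = "".join(pieces)  # consumes the iterable to the end
    obj = json.loads(text)  # raises json.JSONDecodeError on truncated output
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(obj)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return obj
```

A truncated stream fails at `json.loads`, so an interrupted response is rejected instead of being shown half-formed.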

What the stream looks like

OpenAI-compatible chat returns SSE chunks in `chat.completion.chunk` format
Gemini streaming uses the published `streamGenerateContent?alt=sse` route
do not treat the output of either stream as a final answer until the stream closes
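
As a rough illustration of the OpenAI-compatible case, the sketch below pulls text deltas out of SSE lines and joins them. It assumes the upstream OpenAI conventions apply here: each event is a `data:` line carrying a JSON chunk with `choices[].delta.content`, and the stream ends with a `data: [DONE]` sentinel. The function name `iter_chat_deltas` and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_chat_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines in chat.completion.chunk format."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_chat_deltas(sample)))  # prints: Hello
```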

OpenAI-compatible example

curl https://api.anyint.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Stream three short bullets about API gateways."}
    ]
  }'
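
The same request can be sketched in Python using only the standard library. The URL, headers, model name, and body mirror the curl example above; the helper names `build_chat_stream_request` and `stream_lines` and the 60-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

CHAT_URL = "https://api.anyint.ai/openai/v1/chat/completions"  # from the curl example


def build_chat_stream_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request with stream enabled (no network call yet)."""
    body = {
        "model": "claude-sonnet-4-6",
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request, timeout: float = 60.0):
    """Yield decoded lines as they arrive. Needs network access and a valid key."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:  # the response object is iterable line by line
            yield raw.decode("utf-8").rstrip("\r\n")
```

Calling `stream_lines(build_chat_stream_request(key, prompt))` yields raw SSE lines one at a time, which keeps request construction separate from parsing.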

Gemini-compatible example

curl "https://api.anyint.ai/gemini/v1beta/models/gemini-3-flash-preview:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Stream a short answer about model routing."}]
      }
    ]
  }'
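
On the client side, text from the Gemini route might be collected as sketched below. This assumes the gateway passes through the upstream Gemini response shape, where each `data:` line is a JSON object with `candidates[].content.parts[].text`, and that the stream ends when the connection closes, with no sentinel event. `iter_gemini_text` and the sample lines are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_gemini_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text parts from SSE lines of a streamGenerateContent response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators between events
        chunk = json.loads(line[len("data:"):].strip())
        for candidate in chunk.get("candidates", []):
            content = candidate.get("content") or {}
            for part in content.get("parts", []):
                if "text" in part:
                    yield part["text"]


# Hand-written sample lines, not captured from a real response.
gemini_sample = [
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "Model "}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "routing"}]}, "finishReason": "STOP"}]}',
]
print("".join(iter_gemini_text(gemini_sample)))  # prints: Model routing
```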

Implementation tips

treat each chunk as partial state, not a final answer
keep your server tolerant of dropped connections
store the final assembled output if you need auditing or replay
separate UI rendering from business-side completion logic
handle stream interruption separately from model refusal or invalid request errors
test through the same proxy, CDN, or server framework used in production
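
Several of these tips can be combined in one small consumer: render each piece as it arrives, keep the assembled text, and report an interruption as its own outcome. The sketch below is one possible shape, not a prescribed design. `StreamOutcome` and `consume_stream` are hypothetical names, and `ConnectionError` and `TimeoutError` stand in for whatever exceptions your HTTP client raises on a dropped connection. Invalid request errors normally surface as an HTTP error before any chunk arrives, so they are out of scope here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamOutcome:
    text: str                    # everything received so far, possibly partial
    status: str                  # "complete" or "interrupted"
    error: Optional[str] = None  # transport error message, if any


def consume_stream(
    pieces: Iterable[str],
    on_piece: Callable[[str], None] = lambda _piece: None,
) -> StreamOutcome:
    """Render pieces as they arrive and report how the stream ended."""
    received = []
    try:
        for piece in pieces:
            received.append(piece)  # partial state, kept for storage or replay
            on_piece(piece)         # UI rendering hook, separate from completion logic
    except (ConnectionError, TimeoutError) as exc:
        # Transport failure: keep the partial text but do not mark it complete.
        return StreamOutcome("".join(received), "interrupted", str(exc))
    return StreamOutcome("".join(received), "complete")
```

Business logic can then act only on outcomes with status `"complete"`, while the UI decides separately how to present partial text from an interrupted stream.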

Production checklist

Check	Why it matters
Client receives more than one chunk	Confirms the client is not buffering until completion
UI can cancel a stream	Avoids wasting tokens after a user navigates away
Server handles dropped connections	Mobile and browser clients disconnect often
Final assembled output is stored when needed	Auditing and replay need the complete text, not only UI chunks
Error path is visible	Users need a clear failure state if the stream stops early
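
The first check can be automated by recording when each chunk arrives. The sketch below is one way to do that; `summarize_arrivals`, `looks_buffered`, and the 0.05-second spread threshold are assumptions chosen for illustration and should be tuned to your own environment.

```python
import time
from typing import Callable, Iterable, Optional


def summarize_arrivals(
    pieces: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Consume a stream and record when each chunk arrived, relative to the start."""
    start = clock()
    arrivals = [clock() - start for _ in pieces]
    first: Optional[float] = arrivals[0] if arrivals else None
    last: Optional[float] = arrivals[-1] if arrivals else None
    return {"chunks": len(arrivals), "first_chunk_s": first, "total_s": last}


def looks_buffered(summary: dict, min_spread_s: float = 0.05) -> bool:
    """True when output arrived as at most one chunk, or all at nearly the same time."""
    if summary["chunks"] <= 1:
        return True
    return (summary["total_s"] - summary["first_chunk_s"]) < min_spread_s
```

Running this through the production proxy or CDN, not only against the API directly, shows whether an intermediate layer is holding chunks back until completion.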
