Streaming

Streaming is the simplest way to improve perceived latency in chat, copilots, and interactive generation flows.

Use streaming when the user benefits from seeing partial output. Do not use it as a replacement for async task APIs: video, music, and other asset workflows may still require task creation, polling, or callbacks.

Published streaming surfaces

| Surface | How to enable it |
| --- | --- |
| OpenAI-compatible chat | Set stream: true on /openai/v1/chat/completions |
| Gemini streaming | Call streamGenerateContent?alt=sse |

When to use streaming

  • chat interfaces
  • progressive generation in IDEs or copilots
  • long-running answer generation where first-token latency matters

When not to use streaming

| Situation | Prefer |
| --- | --- |
| The route creates video, music, or another async asset | Task creation plus polling or callbacks |
| The client must validate one final JSON object before showing anything | Non-streaming response or buffered server-side validation |
| The user does not see output interactively | Batch request and stored final response |
| Your network layer cannot handle server-sent events | Start with non-streaming and add streaming after client support is verified |
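
The "buffered server-side validation" option can be sketched roughly as follows: collect every streamed fragment, parse the joined text once the stream has ended, and release it only if it is a valid JSON object. This is a minimal sketch, not a published AnyInt helper; the function name `buffer_and_validate` and the `required_keys` parameter are hypothetical.

```python
import json
from typing import Iterable, Optional, Set


def buffer_and_validate(fragments: Iterable[str], required_keys: Set[str]) -> Optional[dict]:
    """Join streamed text fragments and return the parsed object, or None if invalid.

    Hypothetical helper: nothing is shown to the user until this returns a dict.
    """
    buffered = "".join(fragments)
    try:
        obj = json.loads(buffered)
    except json.JSONDecodeError:
        return None  # truncated or malformed output is never released
    if not isinstance(obj, dict) or not required_keys.issubset(obj):
        return None  # valid JSON, but not the shape the client expects
    return obj


# Hypothetical fragments, as they might arrive across several chunks.
complete = ['{"title": "API ', 'gateways", ', '"bullets": 3}']
truncated = ['{"title": "API ', 'gatew']

print(buffer_and_validate(complete, {"title", "bullets"}))
print(buffer_and_validate(truncated, {"title", "bullets"}))
```

The trade-off is that the user sees nothing until the stream ends, which is why a plain non-streaming request is often the simpler choice in this situation.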

What the stream looks like

  • OpenAI-compatible chat returns SSE chunks in chat.completion.chunk format
  • Gemini streaming uses the published streamGenerateContent?alt=sse route
  • do not treat the output of either stream as final until the stream closes
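
To make the chunk handling concrete, the sketch below parses SSE lines from an OpenAI-compatible stream and assembles the text. It assumes the gateway follows the usual OpenAI conventions: `data:`-prefixed JSON lines, text in `choices[0].delta.content`, and a closing `data: [DONE]` sentinel. Verify these against real responses before relying on them. The function name `assemble_openai_stream` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Tuple


def assemble_openai_stream(lines: Iterable[str]) -> Tuple[str, bool]:
    """Return (assembled_text, finished) from raw SSE lines.

    Assumes OpenAI-style chunks: "data: {json}" lines, ended by "data: [DONE]".
    """
    parts = []
    finished = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            finished = True  # only now is the assembled text a final answer
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts), finished


# Hypothetical sample lines, shaped like chat.completion.chunk events.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
text, finished = assemble_openai_stream(sample)
print(text, finished)
```

If `finished` comes back `False`, the stream ended without its sentinel, and the text should be treated as partial.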

OpenAI-compatible example

curl https://api.anyint.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Stream three short bullets about API gateways."}
    ]
  }'

Gemini-compatible example

curl "https://api.anyint.ai/gemini/v1beta/models/gemini-3-flash-preview:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Stream a short answer about model routing."}]
      }
    ]
  }'
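
On the client side, the events returned by this route can be assembled along the lines of the sketch below. It assumes each SSE `data:` line carries a JSON object with text under `candidates[0].content.parts[].text`, as in the Gemini API's streamed responses; whether the gateway passes that shape through unchanged is an assumption to confirm. The function name `assemble_gemini_stream` and the sample lines are hypothetical.

```python
import json
from typing import Iterable


def assemble_gemini_stream(lines: Iterable[str]) -> str:
    """Concatenate the text parts of the first candidate across SSE events."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        event = json.loads(line[len("data:"):].strip())
        candidates = event.get("candidates") or []
        if not candidates:
            continue  # assumed: some events may carry only metadata
        content = candidates[0].get("content") or {}
        for part in content.get("parts", []):
            if "text" in part:
                parts.append(part["text"])
    return "".join(parts)


# Hypothetical sample lines, shaped like streamed Gemini responses.
sample = [
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "Model routing "}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "picks a backend."}]}, "finishReason": "STOP"}]}',
]
print(assemble_gemini_stream(sample))
```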

Implementation tips

  • treat each chunk as partial state, not a final answer
  • keep your server tolerant to dropped connections
  • store the final assembled output if you need auditing or replay
  • separate UI rendering from business-side completion logic
  • handle stream interruption separately from model refusal or invalid request errors
  • test through the same proxy, CDN, or server framework used in production
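
Several of these tips can be combined in one small consumer. The sketch below keeps business-side state (the recorded chunks and an outcome) separate from UI rendering, and distinguishes a dropped connection from a rejected request. It is illustrative only: `RequestRejected`, `Outcome`, `StreamRecord`, and `consume` are hypothetical names, and it assumes your transport layer raises `ConnectionError` when the stream drops.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Iterable, List


class RequestRejected(Exception):
    """Hypothetical error for a refusal or invalid request (not a dropped stream)."""


class Outcome(Enum):
    COMPLETED = "completed"
    INTERRUPTED = "interrupted"
    REJECTED = "rejected"


@dataclass
class StreamRecord:
    chunks: List[str] = field(default_factory=list)
    outcome: Outcome = Outcome.INTERRUPTED

    @property
    def text(self) -> str:
        return "".join(self.chunks)


def consume(chunks: Iterable[str], render: Callable[[str], None]) -> StreamRecord:
    """Feed chunks to the UI while recording them for auditing or replay."""
    record = StreamRecord()
    try:
        for chunk in chunks:
            record.chunks.append(chunk)  # business-side state
            render(chunk)                # UI-side rendering
        record.outcome = Outcome.COMPLETED
    except ConnectionError:
        record.outcome = Outcome.INTERRUPTED  # partial text is kept, not final
    except RequestRejected:
        record.outcome = Outcome.REJECTED
    return record


def flaky_stream():
    """Simulated stream that drops after two chunks."""
    yield "Hello"
    yield ", wor"
    raise ConnectionError("connection dropped mid-stream")


rendered: List[str] = []
record = consume(flaky_stream(), rendered.append)
print(record.outcome.value, repr(record.text))
```

Completion logic, such as storing the final output, would then run only when `record.outcome` is `Outcome.COMPLETED`, regardless of what the UI has already rendered.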

Production checklist

| Check | Why it matters |
| --- | --- |
| Client receives more than one chunk | Confirms the client is not buffering until completion |
| UI can cancel a stream | Avoids wasting tokens after a user navigates away |
| Server handles dropped connections | Mobile and browser clients disconnect often |
| Final assembled output is stored when needed | Auditing and replay need the complete text, not only UI chunks |
| Error path is visible | Users need a clear failure state if the stream stops early |
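
The first check can be automated with a rough heuristic: record when each chunk arrives and confirm that arrivals are spread over time rather than delivered in a single burst. The sketch below is one possible approach, not an official test; the function name `looks_streamed` and both thresholds are hypothetical and should be tuned to your environment.

```python
from typing import Sequence


def looks_streamed(arrival_times: Sequence[float],
                   min_chunks: int = 2,
                   min_spread_s: float = 0.05) -> bool:
    """Heuristic: True if chunks arrived incrementally, not all at once.

    arrival_times are seconds (for example from time.monotonic()), in order.
    Both thresholds are illustrative assumptions.
    """
    if len(arrival_times) < min_chunks:
        return False  # a single chunk suggests buffering until completion
    spread = arrival_times[-1] - arrival_times[0]
    return spread >= min_spread_s


# Hypothetical timings: incremental delivery versus a single burst.
print(looks_streamed([0.0, 0.2, 0.4]))
print(looks_streamed([1.0, 1.001, 1.002]))
```

Running this through the same proxy or CDN used in production helps catch an intermediary that buffers the whole response.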
