
Streaming

Streaming is the simplest way to improve perceived latency in chat, copilots, and interactive generation flows.


Published streaming surfaces

Surface                   How to enable it
OpenAI-compatible chat    set stream: true on /openai/v1/chat/completions
Gemini streaming          call streamGenerateContent?alt=sse
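As a sketch of the OpenAI-compatible route, the only change from a normal chat request is setting stream: true in the request body. The base URL and model name below are placeholders, not real AnyInt values; the request itself is shown as a comment rather than executed.

```python
# Hypothetical base URL; substitute your deployment's endpoint.
BASE_URL = "https://api.example.com"

def build_stream_request(messages, model):
    """Build the JSON body for /openai/v1/chat/completions with streaming enabled."""
    return {
        "model": model,
        "messages": messages,
        "stream": True,  # opt in to SSE chunks instead of a single final object
    }

body = build_stream_request(
    [{"role": "user", "content": "Hello"}],
    model="example-model",  # placeholder model name
)
# The request itself (not executed here) would be:
# POST {BASE_URL}/openai/v1/chat/completions with `body` serialized as JSON
```

With streaming on, the response arrives as a series of SSE events instead of one JSON object, so the client must be prepared to read the body incrementally.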

When to use streaming

  • chat interfaces
  • progressive generation in IDEs or copilots
  • long-running answer generation where first-token latency matters

What the stream looks like

  • OpenAI-compatible chat returns SSE chunks in chat.completion.chunk format
  • Gemini streaming uses the published streamGenerateContent?alt=sse route
  • on either route, treat the output as partial until the stream closes
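A minimal sketch of consuming the OpenAI-compatible stream: each SSE data line carries a chat.completion.chunk object whose choices[0].delta may hold a partial content string, and the literal data: [DONE] marks a clean close. The sample lines below are illustrative, not captured from a real response.

```python
import json

def parse_sse_chunks(lines):
    """Assemble assistant text from OpenAI-style SSE data lines.

    Returns (text, closed): the concatenated deltas seen so far, and
    whether the [DONE] sentinel was observed (a clean close).
    """
    parts = []
    closed = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            closed = True
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts), closed

# Illustrative SSE lines, shaped like chat.completion.chunk events:
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text, closed = parse_sse_chunks(sample)
```

Note that the first chunk carries only the role, with no content; a robust parser treats every delta field as optional.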

Implementation tips

  • treat each chunk as partial state, not a final answer
  • keep your server tolerant to dropped connections
  • store the final assembled output if you need auditing or replay
  • separate UI rendering from business-side completion logic
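The tips above can be sketched as a small accumulator that renders deltas immediately for perceived latency, but refuses to hand a truncated answer to auditing or replay unless the stream closed cleanly. The class and callback names here are illustrative, not part of any AnyInt SDK.

```python
class StreamAccumulator:
    """Collect partial deltas; expose a final transcript only after a
    clean close, so auditing/replay never stores a truncated answer
    as if it were complete."""

    def __init__(self, render=None):
        self.parts = []
        self.closed = False
        self.render = render  # UI callback, kept separate from completion logic

    def on_delta(self, text):
        self.parts.append(text)       # partial state, not a final answer
        if self.render:
            self.render(text)         # render immediately for perceived latency

    def on_close(self, clean):
        self.closed = clean           # False if the connection dropped mid-stream

    def final_output(self):
        if not self.closed:
            raise RuntimeError("stream dropped before completion; "
                               "do not store partial text as final")
        return "".join(self.parts)

# Usage: the UI sees each delta as it arrives; the audit store sees
# the assembled text only once the stream has closed cleanly.
rendered = []
acc = StreamAccumulator(render=rendered.append)
acc.on_delta("Hel")
acc.on_delta("lo")
acc.on_close(clean=True)
final = acc.final_output()
```

Keeping rendering and completion logic in separate code paths means a dropped connection degrades the UI gracefully without corrupting whatever you persist for audit or replay.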
