Overview

Choose the right AnyInt route family for image understanding, image generation, video generation, and music workflows.

AnyInt currently exposes multimodal capabilities through several route families rather than a single abstraction layer. The main decision is whether you need image understanding, image generation, video generation, or full music creation workflows.

Current multimodal surfaces

Surface	Best for	Sync or task-based
Anthropic-compatible messages	text plus image understanding	sync
Gemini-compatible routes	text generation, image generation, Gemini-native function flows	sync or streaming
DashScope media routes	prompt-driven image and video generation	sync for images, task-based for video
AI Music	song generation, covers, lyrics, stems, and music video workflows	task-based

How to choose

Need image understanding inside a message flow: start with Anthropic-compatible messages
Need generated images: use Gemini image generation or DashScope image generation
Need prompt-driven video: use DashScope video generation
Need music creation or transformation: use AI Music
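The guidance above amounts to a lookup from what you need to a route family and the response mode to expect. The sketch below is a minimal illustration under stated assumptions: `Need`, `ROUTES`, and `choose_route` are hypothetical names, and the strings are descriptive labels copied from the surfaces table, not AnyInt endpoint identifiers.

```python
from enum import Enum


class Need(Enum):
    # Capabilities named in the "How to choose" guidance.
    IMAGE_UNDERSTANDING = "image understanding"
    IMAGE_GENERATION = "image generation"
    VIDEO_GENERATION = "video generation"
    MUSIC = "music creation or transformation"


# Hypothetical lookup: need -> (candidate route families, response mode).
# Labels mirror the surfaces table; they are not API identifiers.
ROUTES = {
    Need.IMAGE_UNDERSTANDING: (["Anthropic-compatible messages"], "sync"),
    # Gemini routes may also stream; both options return content in one response.
    Need.IMAGE_GENERATION: (["Gemini image generation", "DashScope image generation"], "sync"),
    Need.VIDEO_GENERATION: (["DashScope video generation"], "task-based"),
    Need.MUSIC: (["AI Music"], "task-based"),
}


def choose_route(need: Need) -> tuple[list[str], str]:
    """Return the candidate route families and whether to expect a sync reply or a task."""
    families, mode = ROUTES[need]
    return list(families), mode
```

For example, `choose_route(Need.VIDEO_GENERATION)` returns `(["DashScope video generation"], "task-based")`, which tells the caller to plan for a task lifecycle rather than a single response.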

Sync vs async matters

Not every multimodal route is synchronous:

image generation can return usable content in one response
video and music workflows usually involve tasks, polling, and webhooks

This difference affects UI design, retry logic, and how you store outputs, because a task-based result arrives only after the initial request has returned.
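For task-based routes, a client typically submits a job and then checks its status until it finishes. The following is a minimal polling sketch with exponential backoff, assuming a hypothetical `get_status` callable that wraps whatever status check the route provides and returns a status string. The `"succeeded"` and `"failed"` state names are also assumptions; check the specific route's task documentation for the real values. Where webhooks are available, they can replace polling entirely.

```python
import time
from typing import Callable

# Assumed terminal state names; real task routes may use different strings.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_task(
    get_status: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it reaches a terminal state, doubling the wait each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Give up before the next wait would exceed the overall time budget.
        if waited + delay > timeout:
            raise TimeoutError(f"task still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # cap the backoff interval
```

Capping both the per-poll delay and the total wait gives retry logic a clear failure point that a UI can surface, and the injectable `sleep` parameter lets tests run the loop without real delays.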
