Overview
Choose the right AnyInt route family for image understanding, image generation, video generation, and music workflows.
AnyInt currently exposes multimodal capabilities through several route families instead of one single abstraction layer. The main decision is whether you need image understanding, image generation, video generation, or full music creation workflows.
Current multimodal surfaces
| Surface | Best for | Sync or task-based |
|---|---|---|
| Anthropic-compatible messages | text plus image understanding | sync |
| Gemini-compatible routes | text generation, image generation, Gemini-native function flows | sync or streaming |
| DashScope media routes | prompt-driven image and video generation | sync for images, task-based for video |
| AI Music | song generation, covers, lyrics, stems, and music video workflows | task-based |
How to choose
- Need image understanding inside a message flow: start with Anthropic-compatible messages
- Need generated images: use Gemini image generation or DashScope image generation
- Need prompt-driven video: use DashScope video generation
- Need music creation or transformation: use AI Music
Sync vs async matters
Not every multimodal route is synchronous:
- image generation can return usable content in one response
- video and music workflows usually involve tasks, polling, and webhooks
That affects UI design, retry logic, and how you store outputs.
Models
Model selection in AnyInt is a product decision, not only a code decision. You are choosing a request shape, an output modality, and an operational path at the same time.
Image Input
The current published catalog shows image-aware behavior most clearly in provider-compatible request bodies rather than in one generic vision endpoint.