Overview
Choose the right AnyInt route family for image understanding, image generation, video generation, and music workflows.
AnyInt currently exposes multimodal capabilities through several route families instead of one single abstraction layer. The main decision is whether you need image understanding, image generation, video generation, or full music creation workflows.
Current multimodal surfaces
| Surface | Best for | Request shape | Sync or task-based |
|---|---|---|---|
| Anthropic-compatible messages | Text plus image understanding | Claude-style messages[].content[] blocks | Sync |
| Gemini-compatible routes | Text generation, image generation, Gemini-native function flows | contents[].parts[] | Sync or streaming |
| DashScope media routes | Prompt-driven image and video generation | Media-specific input and parameters payloads | Sync for images, task-based for video |
| AI Music | Song generation, covers, lyrics, stems, and music video workflows | Music-specific task payloads | Task-based |
How to choose
| Need | Start with | Read next |
|---|---|---|
| Ask questions about one or more reference images | Image Input | Anthropic-compatible API or Gemini-compatible API |
| Generate an image from text or image context | Image Generation | Gemini-compatible API and Media API |
| Generate or poll video tasks | Video Generation | Media API |
| Create songs, covers, lyrics, stems, or music videos | Media & Music | AI Music API |
| Choose a text or reasoning model before adding media | Models | Models API |
Sync vs async matters
Not every multimodal route is synchronous:
- image generation can return usable content in one response
- video and music workflows usually involve tasks, polling, and webhooks
That affects UI design, retry logic, and how you store outputs.
| Response pattern | Design implication |
|---|---|
| Synchronous response | The client can use the response immediately after request completion |
| Streaming response | The UI must treat chunks as partial state until the stream closes |
| Task creation response | Store the task ID and show pending or processing state |
| Callback or polling result | Mark the asset ready only after a successful terminal status and output URL |
Input and output rules of thumb
| Rule | Why it matters |
|---|---|
| Keep route families separate | OpenAI, Anthropic, Gemini, media, and music routes do not share one universal body shape |
| Use public URLs or supported inline formats when sending media | The model provider or gateway must be able to access the asset |
| Do not assume a model ID implies every modality | The route contract determines whether image, video, or music behavior is supported |
| Treat generated assets as customer data | Store output URLs, prompts, and metadata according to your privacy and retention policy |
| Test with small payloads first | It is easier to isolate auth, model access, and request-shape issues before using large files |
Related pages
Models
Model selection in AnyInt is a product decision, not only a code decision. You are choosing a request shape, an output modality, and an operational path at the same time.
Image Input
How to send text and reference images in the same AnyInt request. The recommended default is the OpenAI-compatible image_url content block.