Image Input
The current published catalog shows image-aware behavior most clearly in provider-compatible request bodies rather than in one generic vision endpoint.
The current published catalog shows image-aware behavior most clearly in provider-compatible request bodies rather than in one generic vision endpoint.
Image input today
| Route family | What it supports |
|---|---|
| Anthropic-compatible messages | image input plus text instructions in the same message |
| Gemini image generation | text-to-image and mixed text-plus-image output |
| DashScope image generation | prompt-driven asset generation with image-specific parameters |
Anthropic-style image input
The clearest published image-input route in the current catalog is:
POST /anthropic/v1/messages
In that request body, messages[].content can be an array of blocks such as:
imagetext
That makes it a good fit for captioning, scene understanding, or document-like visual reasoning where your application already uses Claude-style content blocks.
Practical guidance
- Use provider-native request shapes for image input instead of trying to force everything into one generic schema
- If your application already uses Claude-style content blocks, Anthropic-compatible routes are the cleanest fit
- If your application is generating or editing visuals, use the Gemini or DashScope media pages in this docs set
Related pages
Overview
Choose the right AnyInt route family for image understanding, image generation, video generation, and music workflows.
Image Generation
AnyInt publishes Gemini-native, DashScope, and Transtreams Kling image-generation paths. They overlap in outcome, but the request bodies and strengths are different.