Image Input

The current published catalog shows image-aware behavior most clearly in provider-compatible request bodies rather than in one generic vision endpoint.

Image input today

Route family	What it supports
Anthropic-compatible messages	image input plus text instructions in the same message
Gemini image generation	text-to-image and mixed text-plus-image output
DashScope image generation	prompt-driven asset generation with image-specific parameters

Anthropic-style image input

The clearest published image-input route in the current catalog is:

POST /anthropic/v1/messages

In that request body, messages[].content can be an array of blocks such as:

image
text

That makes it a good fit for captioning, scene understanding, or document-like visual reasoning where your application already uses Claude-style content blocks.

Practical guidance

Use provider-native request shapes for image input instead of trying to force everything into one generic schema
If your application already uses Claude-style content blocks, Anthropic-compatible routes are the cleanest fit
If your application is generating or editing visuals, use the Gemini or DashScope media pages in this docs set

Image input today

Anthropic-style image input

Practical guidance

Related pages

On this page