OpenAI Compatible API
Use this route when you want the fastest path to production or when your application already speaks the OpenAI Chat Completions format.
Base URL
https://gateway.api.anyint.ai/openai/v1
Published route
POST /chat/completions
Full URL:
https://gateway.api.anyint.ai/openai/v1/chat/completions
Authentication
Authorization: Bearer <ANYINT_API_KEY>
Content-Type: application/json
When to use this route
- You already use the OpenAI Python or JavaScript SDK
- You want one stable chat entrypoint for multiple model families
- You want SSE streaming with minimal integration work
If you need Gemini-native request bodies or Anthropic-specific guides such as token counting, use those provider-compatible pages directly.
If a client specifically requires OpenAI Responses API wiring, use OpenAI Responses API. For embeddings, image generation, audio, files, batches, assistants, threads, and videos, see OpenAI Extended Endpoints.
Core request shape
| Field | Type | Required | Meaning | Example |
|---|---|---|---|---|
| model | string | Yes | Model ID to use for the request. Fetch this from the Models API instead of guessing. | claude-sonnet-4-6 |
| messages | array | Yes | Ordered conversation turns in OpenAI Chat Completions format. | [{"role":"user","content":"Hello"}] |
| stream | boolean | No | Set true to receive SSE chunks; omit or set false for a single JSON response. | true |
| max_tokens | integer | No | Maximum number of output tokens the model should generate. | 512 |
| temperature | number | No | Controls randomness. Lower values are more deterministic; higher values are more varied. | 0.7 |
| top_p | number | No | Nucleus sampling control. Tune either temperature or top_p; avoid changing both without a reason. | 1 |
| stop | string or array | No | Stop sequence or sequences that end generation early. | ["\nUser:"] |
| tools | array | No | OpenAI-compatible tool definitions for model tool calling when supported by the selected model. | [{"type":"function",...}] |
| tool_choice | string or object | No | Controls whether the model may call tools, must call a specific tool, or should not call tools. | "auto" |
| presence_penalty | number | No | Penalizes new tokens based on whether they already appeared in the text. | 0 |
| frequency_penalty | number | No | Penalizes repeated tokens based on frequency. | 0 |
| user | string | No | End-user identifier for your own abuse monitoring or request tracing. Do not send private personal data unless your policy allows it. | user_123 |
The current published schema documents the minimum request body needed to get a completion working. If your client already uses other OpenAI-compatible fields, validate them against the target model before relying on them in production.
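To show how these fields compose, here is a sketch of a tool-calling request body. The `get_weather` tool definition and its parameters are invented for illustration; they are not part of the published schema.

```python
import json

# Illustrative request body combining the documented fields.
# The get_weather tool is a made-up example, not part of the schema.
request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 512,
    "temperature": 0.7,
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # "auto" lets the model decide whether to call the tool.
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```

Send this dict as the JSON body of `POST /chat/completions`; validate optional fields against the target model before relying on them.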
Message fields
| Field | Type | Required | Meaning | Example |
|---|---|---|---|---|
| role | string | Yes | Message author. Common values are system, user, assistant, and tool. | user |
| content | string or array | Usually | Text content or multimodal content blocks when supported by the selected model. | Explain AnyInt in one sentence. |
| name | string | No | Optional participant name used by some OpenAI-compatible clients. | planner |
| tool_call_id | string | Tool responses only | Associates a tool response with a prior tool call. | call_123 |
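To illustrate the `tool_call_id` linkage, here is a sketch of a turn sequence that feeds a tool result back to the model. The call ID `call_123` and the weather payload are invented for the example.

```python
# A conversation that returns a tool result to the model.
# call_123 and the weather JSON are invented for illustration.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_123",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }
        ],
    },
    # The tool role answers the call identified by tool_call_id.
    {"role": "tool", "tool_call_id": "call_123", "content": '{"temp_c": 18}'},
]

print(messages[-1]["tool_call_id"])
```

The tool message's `tool_call_id` must match the `id` of the assistant's earlier tool call, or the model cannot associate the result with its request.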
Response fields
| Field | Meaning |
|---|---|
| id | Response identifier returned by the compatible API path |
| object | Response object type, such as chat.completion or chat.completion.chunk |
| created | Unix timestamp for response creation |
| model | Model that handled the request |
| choices[] | Generated message or stream delta choices |
| choices[].message.content | Generated text for non-streaming responses |
| choices[].delta.content | Incremental text for streaming responses |
| usage.prompt_tokens | Input tokens counted by the provider when available |
| usage.completion_tokens | Output tokens counted by the provider when available |
| usage.total_tokens | Total tokens counted by the provider when available |
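A sketch of reading these fields from a non-streaming response. The payload below is a hand-built stand-in in the documented shape, not a captured gateway response; its ID, text, and token counts are illustrative.

```python
# Hand-built stand-in payload matching the documented response fields.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "claude-sonnet-4-6",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "AnyInt routes one chat entrypoint to multiple model families.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36},
}

# Non-streaming responses carry the text under choices[].message.content.
text = response["choices"][0]["message"]["content"]
total = response["usage"]["total_tokens"]
print(text)
print(total)
```

For streaming responses, read `choices[].delta.content` from each chunk instead, as in the examples below.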
cURL example
```bash
curl https://gateway.api.anyint.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $ANYINT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain AnyInt in one sentence."}
    ]
  }'
```
Python example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.api.anyint.ai/openai/v1",
    api_key="your-anyint-api-key",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    stream=True,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain AnyInt in one sentence."},
    ],
)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```
JavaScript example
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.api.anyint.ai/openai/v1",
  apiKey: process.env.ANYINT_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  stream: true,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain AnyInt in one sentence." },
  ],
});

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```
Response behavior
- With `stream: true`, the route returns SSE chunks
- The published schema uses `chat.completion.chunk` style objects
- Each chunk can contain `choices[0].delta.content`
- The stream ends with `[DONE]`
- Final usage data can arrive near the end of the stream
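The streaming behavior above can be sketched as a small SSE line parser. The sample lines are hand-written in the documented chunk shape, not captured gateway output.

```python
import json

def parse_sse_line(line: str):
    """Return a decoded chunk dict, "DONE" for the terminator, or None for non-data lines."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)

# Hand-written sample stream in the documented chunk shape.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Any"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Int"}}]}',
    "data: [DONE]",
]

parts = []
for line in sample:
    chunk = parse_sse_line(line)
    if chunk == "DONE":
        break  # [DONE] ends the stream; stop reading.
    if chunk:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)

print("".join(parts))  # AnyInt
```

Because usage data can arrive near the end of the stream, do not treat any single chunk as the complete answer; accumulate deltas until `[DONE]`.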
Common mistakes
- Using `x-api-key` instead of `Authorization: Bearer`
- Hardcoding a model ID before checking the Models API
- Treating a partial stream chunk as the final answer
- Mixing provider-native request fields into the OpenAI payload without validation
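To avoid the hardcoded-model mistake, one approach is to validate the model ID against the Models API list at startup. The helper below is a sketch: the sample list is illustrative, and in practice you would populate it from the Models API (for example via `client.models.list()` with the OpenAI SDK, assuming the gateway exposes the standard models listing).

```python
def require_model(model_id: str, available: list[str]) -> str:
    """Fail fast if model_id is not in the list fetched from the Models API."""
    if model_id not in available:
        raise ValueError(f"Unknown model {model_id!r}; pick one of {available}")
    return model_id

# In practice, fetch this list from the Models API at startup;
# the entries here are illustrative.
available_models = ["claude-sonnet-4-6", "gpt-4o-mini"]

model = require_model("claude-sonnet-4-6", available_models)
print(model)
```

Failing at startup surfaces a retired or mistyped model ID immediately, instead of as a runtime error on the first user request.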