Edit Images

Edit an existing image with a text prompt. The endpoint accepts an input image, an optional mask, and a text instruction, and returns the edited image as base64-encoded data (PNG by default).

POST /v1/images/edits

Content type

multipart/form-data. The body cannot exceed 100 MB. Use the same multipart shape as OpenAI's reference Python/JS SDKs.

Form fields

Name	Type	Required	Description
model	string	Required	Standard model alias in brand/model format. Must declare image_edit capability. See Supported Models below.
image	file	Required	Source image to edit. PNG, JPEG, or WebP. Recommended: at most 4 MB and at least 256×256 pixels.
prompt	string	Required	Text instruction describing the edit. The clearer and more localized the instruction, the better the result.
mask	file		Optional mask image (same dimensions as image). Opaque pixels mark regions to edit; transparent pixels are preserved. PNG with alpha channel recommended.
size	string		Output size, e.g. 1024x1024, 1024x1536, 1536x1024, or auto. Defaults to the upstream model default when omitted.
quality	string		low | medium | high | auto. Output fidelity tier. Lower tiers are substantially cheaper; see the model page for per-tier pricing.
n	integer		Number of edited images to generate. Currently only n=1 is supported.
background	string		transparent | opaque | auto. When the upstream model supports it, controls background handling.
output_format	string		png | jpeg | webp. Defaults to png. Mapped to the upstream model where supported.
user	string		End-user identifier for analytics and abuse controls.
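
As a rough illustration of the form fields above, the following Python sketch assembles a multipart/form-data request using only the standard library. The helper names (build_multipart, build_edit_request), the fixed filenames, and the image/png content type are assumptions made for this example; the URL and the Bearer header are taken from the curl example on this page.

```python
import uuid
import urllib.request

API_URL = "https://api.therouter.ai/v1/images/edits"  # from the curl example


def build_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, data_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, (filename, data, content_type) in files.items():
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: {content_type}\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_edit_request(api_key, model, prompt, image_bytes, mask_bytes=None, **options):
    """Build (but do not send) a POST request for the edits endpoint.

    Extra keyword arguments (size, quality, output_format, ...) are passed
    through as plain text form fields.
    """
    fields = {"model": model, "prompt": prompt}
    fields.update({key: str(value) for key, value in options.items()})
    # Filenames and content types here are illustrative assumptions.
    files = {"image": ("input.png", image_bytes, "image/png")}
    if mask_bytes is not None:
        files["mask"] = ("mask.png", mask_bytes, "image/png")
    body, content_type = build_multipart(fields, files)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

The returned request could then be sent with urllib.request.urlopen(request, timeout=300); the long timeout reflects how slow edits can be.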

Supported models

A model is eligible for edits when its architecture.input_modalities includes image AND its capabilities include image_edit. Query /v1/models for the live list.

Model	Notes
openai/gpt-image-2	OpenAI's flagship image model with native reasoning. Routed via OpenRouter. Edits take 60–180s; quality=low keeps cost near $0.006/image.
openai/gpt-image-1.5	Routed via OpenAI direct. Faster than gpt-image-2 with comparable edit quality. quality=low ≈ $0.009/image.
openai/gpt-image-1	Routed via OpenAI direct. Stable and cheap; quality=low ≈ $0.011/image.
black-forest-labs/flux-kontext-pro	Black Forest Labs FLUX.1 Kontext Pro. Routed via SiliconFlow. Flat $0.04 per edit. Strong composition preservation.
black-forest-labs/flux-kontext-max	Highest-fidelity FLUX.1 Kontext variant. Routed via SiliconFlow. Flat $0.08 per edit. Slower than Pro.
black-forest-labs/flux-kontext-dev	Open-weight FLUX.1 Kontext (12B). Routed via SiliconFlow. Flat $0.015 per edit — cheapest entry point.
qwen/qwen-image-edit	Alibaba Qwen's image-edit model on the 20B Qwen-Image base. Routed via SiliconFlow. Flat $0.04 per edit. Excels at editing text inside images.

Live availability

The table above is hand-maintained. The authoritative list is the live /v1/models response — any model whose architecture.features array includes image_edit is routable through this endpoint.
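
A minimal sketch of filtering a /v1/models response for edit-capable models follows. The response layout is an assumption: a top-level data list whose entries carry an id, an architecture object, and optionally a capabilities list. Because this page names both capabilities and architecture.features as places where image_edit may be declared, the sketch accepts either.

```python
def edit_capable_models(models_response):
    """Return the IDs of models that appear routable through /v1/images/edits.

    Assumes models_response is the parsed JSON of GET /v1/models with a
    top-level "data" list (an assumption, not confirmed by this page).
    """
    eligible = []
    for model in models_response.get("data", []):
        arch = model.get("architecture", {})
        accepts_image = "image" in arch.get("input_modalities", [])
        declares_edit = (
            "image_edit" in model.get("capabilities", [])
            or "image_edit" in arch.get("features", [])
        )
        if accepts_image and declares_edit:
            eligible.append(model["id"])
    return eligible
```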

Response shape

json

{
  "created": 1778120000,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUgAA..."
    }
  ],
  "usage": {
    "prompt_tokens": 1809,
    "completion_tokens": 7106,
    "total_tokens": 8915
  }
}

data[0].b64_json holds the base64-encoded image bytes (PNG unless output_format requests another format). Decode it with your language's native base64 facility, then write the bytes to disk.
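
For instance, decoding in Python might look like the sketch below. It assumes the response body has already been parsed into a dict with the shape shown above; the function name is made up for this example.

```python
import base64


def decode_edited_image(response):
    """Return the raw image bytes from a parsed edits response."""
    image_b64 = response["data"][0]["b64_json"]
    return base64.b64decode(image_b64)
```

Writing the result to disk is then a matter of opening a file in binary mode, e.g. open("edited.png", "wb").write(decode_edited_image(response)).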

Examples

curl

curl -X POST https://api.therouter.ai/v1/images/edits \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -F "model=openai/gpt-image-2" \
  -F "prompt=Turn this scene into a watercolor painting" \
  -F "size=1024x1024" \
  -F "quality=low" \
  -F "image=@input.png" \
  -o response.json

# Decode the edited image
jq -r '.data[0].b64_json' response.json | base64 -d > edited.png

Behavior notes

The gateway forwards the request through its routing layer. The response model echoes the standard alias (e.g. openai/gpt-image-2), never the upstream model string.
Edits are slow. Set client timeouts to at least 5 minutes. The gateway caps the upstream wait at 5 minutes per attempt and returns a structured upstream_error if the upstream exceeds that limit.
When you supply a mask, the gateway converts it to the upstream model's native mask format if available, or attaches it as a second image with a textual mask instruction otherwise.
n > 1 is rejected with a 400. To produce variants, send parallel requests with seed or prompt variations.
Pricing is metered per output image AND per token (prompt + reasoning). Use quality=low for fast iteration.
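
The notes on timeouts and variants can be combined into a small sketch: one n=1 request per prompt variation, issued in parallel with a 5-minute client timeout. The send_edit callable is hypothetical and supplied by the caller; it is expected to perform the HTTP request and return the parsed response.

```python
from concurrent.futures import ThreadPoolExecutor

CLIENT_TIMEOUT_SECONDS = 300  # at least 5 minutes, per the notes above


def generate_variants(send_edit, base_prompt, variations, max_workers=4):
    """Run one n=1 edit per prompt variation, in parallel.

    send_edit(prompt, timeout) is a caller-supplied (hypothetical) function
    that sends a single edit request and returns its parsed response.
    Results are returned in the same order as `variations`.
    """
    prompts = [f"{base_prompt} ({variation})" for variation in variations]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_edit, prompt, CLIENT_TIMEOUT_SECONDS) for prompt in prompts]
        return [future.result() for future in futures]
```

Since each variant is billed as a separate request, keeping max_workers small and using quality=low while iterating is likely the cheaper approach.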

Errors

json

{
  "error": {
    "message": "model openai/foo does not support image editing",
    "type": "invalid_request_error",
    "code": "model_capability_mismatch"
  }
}

Common error codes: invalid_api_key, invalid_request_error, model_capability_mismatch, upstream_error, rate_limit_exceeded, insufficient_credits.
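
A client might unpack the error envelope along these lines. Treating upstream_error and rate_limit_exceeded as retryable is an assumption of this sketch, not something the gateway documents here; the other codes look like problems that a retry would not fix.

```python
import json

# Assumption: only these codes are worth retrying.
RETRYABLE_CODES = {"upstream_error", "rate_limit_exceeded"}


def parse_error(body_text):
    """Extract code, type, and message from an error response body."""
    error = json.loads(body_text).get("error", {})
    code = error.get("code")
    return {
        "code": code,
        "type": error.get("type"),
        "message": error.get("message"),
        "retryable": code in RETRYABLE_CODES,
    }
```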

Response headers

Name	Type	Description
x-request-id	response header	Request trace ID (UUIDv4). Include in support tickets.
X-RateLimit-Limit	response header	Request quota window limit.
X-RateLimit-Remaining	response header	Remaining requests in the current window.
Retry-After	response header	Present on 429 responses. Value is seconds.
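
As a sketch of how the Retry-After header could drive backoff, the helper below returns the number of seconds to wait after a 429. The function name and the 1-second fallback are assumptions for illustration; header names are compared case-insensitively.

```python
def retry_delay_seconds(status, headers, default=1.0):
    """Return how long to wait before retrying, or None if not rate limited.

    headers: mapping of response header names to values.
    default: fallback delay (an assumption) when Retry-After is missing
             or cannot be parsed as a number of seconds.
    """
    if status != 429:
        return None
    normalized = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(normalized.get("retry-after")))
    except (TypeError, ValueError):
        return default
```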

Generate vs edit

Use /v1/images/generations when you want a fresh image from text alone. Use this endpoint when you have an input image you want to modify.
