zhipu/glm-4.5-air is the lightweight member of Zhipu's GLM-4.5 family β strong reasoning and tool use at $0.15 input / $1.20 output per million tokens. TheRouter routes it primarily via siliconflow-intl with transparent failover to BigModel direct (zhipu-cn).
curl https://api.therouter.ai/v1/chat/completions \
-H "Authorization: Bearer $THE_ROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "zhipu/glm-4.5-air",
"messages": [{"role": "user", "content": "Summarise the GLM-5 launch in 3 bullets"}],
"max_tokens": 256
}'from openai import OpenAI
client = OpenAI(
base_url="https://api.therouter.ai/v1",
api_key="$THE_ROUTER_API_KEY",
)
resp = client.chat.completions.create(
model="zhipu/glm-4.5-air",
messages=[
{"role": "system", "content": "You are a concise technical writer."},
{"role": "user", "content": "Summarise the GLM-5 launch in 3 bullets"},
],
max_tokens=256,
)
print(resp.choices[0].message.content)import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.therouter.ai/v1",
apiKey: process.env.THE_ROUTER_API_KEY,
});
const resp = await client.chat.completions.create({
model: "zhipu/glm-4.5-air",
messages: [{ role: "user", content: "Hello in 3 words" }],
max_tokens: 32,
});
console.log(resp.choices[0].message.content);reasoning parameter and chain-of-thought style outputs.tools + tool_choice work as-is.response_format: { type: 'json_object' } to force structured output.Customers hit POST /v1/chat/completions with model: "zhipu/glm-4.5-air". TheRouter resolves this to:
siliconflow-intl, upstream id zai-org/GLM-4.5-Air. Lowest latency, highest steady-state availability.zhipu-cn (BigModel direct), upstream id glm-4.5-air. Engages automatically when siliconflow returns HTTP 5xx, a timeout, or a rate-limit denial.Failover is silent β no client action required, no observable behavior change beyond a small latency tick during a switch event. You don't need to enable a flag.
Approximate per-million-token cost against the GLM family flagship at TheRouter published rates:
| Model | Input ($/MTok) | Output ($/MTok) | Use case |
|---|---|---|---|
zhipu/glm-4.5-air | 0.15 | 1.20 | High-volume chat, summarisation, agent loops |
zhipu/glm-4.7 | 0.50 | 2.33 | Hard reasoning, complex code |
For a 1K-input / 256-output chat turn, glm-4.5-air bills β $0.00046 vs β $0.00110 for glm-4.7 β roughly 2.4Γ cheaper. Pick the air variant for customer-support replies, content rewriting, summarisation; pick glm-4.7 when reasoning or code-generation quality is the bottleneck.
glm-4.7, claude-sonnet, or gpt-5.