zhipu/glm-4.5-air Tutorial

zhipu/glm-4.5-air is the lightweight member of Zhipu's GLM-4.5 family — strong reasoning and tool use at $0.15 input / $1.20 output per million tokens. TheRouter routes it primarily via siliconflow-intl with transparent failover to BigModel direct (zhipu-cn).

Quickstart — cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/glm-4.5-air",
    "messages": [{"role": "user", "content": "Summarise the GLM-5 launch in 3 bullets"}],
    "max_tokens": 256
  }'

Python — OpenAI SDK

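import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",
    # Read the key from the environment (the same variable the cURL example uses).
    api_key=os.environ["THE_ROUTER_API_KEY"],
)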

resp = client.chat.completions.create(
    model="zhipu/glm-4.5-air",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Summarise the GLM-5 launch in 3 bullets"},
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)

JavaScript / TypeScript — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.therouter.ai/v1",
  apiKey: process.env.THE_ROUTER_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "zhipu/glm-4.5-air",
  messages: [{ role: "user", content: "Hello in 3 words" }],
  max_tokens: 32,
});
console.log(resp.choices[0].message.content);

Capabilities

Reasoning — supports the reasoning parameter and chain-of-thought-style outputs.
Function calling — OpenAI-style tools + tool_choice work as-is.
JSON mode — pass response_format: { type: 'json_object' } to force structured output.
Prompt caching — siliconflow-intl supports cache reads at a discounted rate.
131K context — long-document workloads fit in a single call.
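
The JSON-mode and function-calling capabilities listed above can be sketched as request builders. This is a minimal illustration, not TheRouter's reference code: the helper names `build_json_mode_request` and `build_tool_request` and the `get_weather` tool are hypothetical, and the system prompt is an assumption (many OpenAI-compatible backends expect the prompt itself to mention JSON when JSON mode is on).

```python
import json

MODEL = "zhipu/glm-4.5-air"


def build_json_mode_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build chat-completion kwargs that ask for a JSON object reply."""
    return {
        "model": MODEL,
        "messages": [
            # Assumption: asking for JSON in the prompt as well as in response_format.
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": max_tokens,
    }


def build_tool_request(prompt: str) -> dict:
    """Build chat-completion kwargs exposing one OpenAI-style tool."""
    get_weather = {  # hypothetical tool, for illustration only
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [get_weather],
        "tool_choice": "auto",
    }


if __name__ == "__main__":
    # Print the payload so it can be compared with the cURL body in the quickstart.
    print(json.dumps(build_json_mode_request("List three summarisation tips"), indent=2))
```

With the `client` from the Python section, either dict can be passed as `client.chat.completions.create(**build_json_mode_request("..."))`. Keeping the payload as a plain dict makes it easy to log or diff against the cURL body.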

Routing layout

Customers hit POST /v1/chat/completions with model: "zhipu/glm-4.5-air". TheRouter resolves this to:

Priority 0 (primary) — siliconflow-intl, upstream id zai-org/GLM-4.5-Air. Lowest latency, highest steady-state availability.
Priority 1 (fallback) — zhipu-cn (BigModel direct), upstream id glm-4.5-air. Engages automatically when siliconflow returns HTTP 5xx, a timeout, or a rate-limit denial.

Failover is silent: it requires no client action and no opt-in flag, and the only observable effect is a small latency increase during a switch event.
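
The priority-ordered failover described above can be modelled with a short simulation. This is a sketch of the idea only, not TheRouter's actual implementation: `Route`, `UpstreamError`, `should_fail_over`, `route_request`, and `flaky_send` are hypothetical names, a timeout is modelled as status 0, and a rate-limit denial is assumed to surface as HTTP 429.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class UpstreamError(Exception):
    """Hypothetical upstream failure; status 0 stands in for a timeout."""

    def __init__(self, status: int):
        super().__init__(f"upstream failed with status {status}")
        self.status = status


@dataclass(frozen=True)
class Route:
    priority: int
    provider: str
    upstream_id: str


# Provider names and upstream ids are taken from the routing layout above.
ROUTES = (
    Route(0, "siliconflow-intl", "zai-org/GLM-4.5-Air"),
    Route(1, "zhipu-cn", "glm-4.5-air"),
)


def should_fail_over(status: int) -> bool:
    """True for a timeout (0), a rate limit (assumed 429), or any HTTP 5xx."""
    return status == 0 or status == 429 or 500 <= status <= 599


def route_request(send: Callable[[Route], str], routes: Sequence[Route] = ROUTES) -> str:
    """Try routes in priority order, moving on only for failover-eligible errors."""
    last_error: Optional[UpstreamError] = None
    for route in sorted(routes, key=lambda r: r.priority):
        try:
            return send(route)
        except UpstreamError as err:
            if not should_fail_over(err.status):
                raise  # e.g. a 400 is the caller's problem, not the provider's
            last_error = err
    if last_error is None:
        raise ValueError("no routes configured")
    raise last_error


def flaky_send(route: Route) -> str:
    """Fake transport: the primary returns 503, the fallback succeeds."""
    if route.provider == "siliconflow-intl":
        raise UpstreamError(503)
    return f"served by {route.provider} as {route.upstream_id}"


if __name__ == "__main__":
    print(route_request(flaky_send))  # served by zhipu-cn as glm-4.5-air
```

Non-retryable errors such as a 400 are re-raised immediately in this sketch, on the assumption that switching providers would not help with a malformed request.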

Cost comparison

Approximate per-million-token cost against the GLM family flagship at TheRouter's published rates:

Model	Input ($/MTok)	Output ($/MTok)	Use case
`zhipu/glm-4.5-air`	`0.15`	`1.20`	High-volume chat, summarisation, agent loops
`zhipu/glm-4.7`	`0.50`	`2.33`	Hard reasoning, complex code

For a 1K-input / 256-output chat turn, glm-4.5-air bills ≈ $0.00046 vs ≈ $0.00110 for glm-4.7, roughly 2.4× cheaper. Pick the air variant for customer-support replies, content rewriting, and summarisation; pick glm-4.7 when reasoning or code-generation quality is the bottleneck.
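
The per-turn figures can be reproduced with a few lines of arithmetic. This is a minimal sketch using the rates from the table above; `PRICES` and `turn_cost` are illustrative names, and "1K" is assumed to mean 1,000 tokens.

```python
# ($ per million input tokens, $ per million output tokens), from the table above.
PRICES = {
    "zhipu/glm-4.5-air": (0.15, 1.20),
    "zhipu/glm-4.7": (0.50, 2.33),
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the listed per-million-token rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


air = turn_cost("zhipu/glm-4.5-air", 1_000, 256)
flagship = turn_cost("zhipu/glm-4.7", 1_000, 256)

print(f"glm-4.5-air: ${air:.5f}")             # glm-4.5-air: $0.00046
print(f"glm-4.7:     ${flagship:.5f}")        # glm-4.7:     $0.00110
print(f"ratio:       {flagship / air:.1f}x")  # ratio:       2.4x
```

Output tokens account for most of the cost in both cases, so the ratio shifts with the input/output mix of a workload. Cache-read discounts are not modelled here.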

When to use it

High-volume Chinese-leaning chat (customer support, rewriting, summarisation).
Agent orchestration where many cheap model calls beat a single expensive one.
Cost-sensitive batch workloads on long context.

When NOT to use it

Hard reasoning, complex code generation — use glm-4.7, claude-sonnet, or gpt-5.
English-dominant high-stakes content where the flagship reads better.
Strictly latency-critical paths where a 3rd-party failover may add a tail-latency tick.