zhipu/glm-4.5-air Tutorial

zhipu/glm-4.5-air is the lightweight member of Zhipu's GLM-4.5 family: strong reasoning and tool use at $0.15 input / $1.20 output per million tokens. TheRouter routes it primarily via siliconflow-intl with transparent failover to BigModel direct (zhipu-cn).

Quickstart: cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/glm-4.5-air",
    "messages": [{"role": "user", "content": "Summarise the GLM-5 launch in 3 bullets"}],
    "max_tokens": 256
  }'

Python: OpenAI SDK

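import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",
    # Read the key from the environment; Python does not expand a literal "$THE_ROUTER_API_KEY" string.
    api_key=os.environ["THE_ROUTER_API_KEY"],
)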

resp = client.chat.completions.create(
    model="zhipu/glm-4.5-air",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Summarise the GLM-5 launch in 3 bullets"},
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)

JavaScript / TypeScript: OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.therouter.ai/v1",
  apiKey: process.env.THE_ROUTER_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "zhipu/glm-4.5-air",
  messages: [{ role: "user", content: "Hello in 3 words" }],
  max_tokens: 32,
});
console.log(resp.choices[0].message.content);

Capabilities

Routing layout

Customers hit POST /v1/chat/completions with model: "zhipu/glm-4.5-air". TheRouter resolves this to siliconflow-intl as the primary provider, with BigModel direct (zhipu-cn) as the failover.

Failover is silent: no client action required, no observable behavior change beyond a small latency tick during a switch event. You don't need to enable a flag.

Cost comparison

Approximate per-million-token cost compared with the GLM family flagship, at TheRouter's published rates:

| Model | Input ($/MTok) | Output ($/MTok) | Use case |
|---|---|---|---|
| zhipu/glm-4.5-air | 0.15 | 1.20 | High-volume chat, summarisation, agent loops |
| zhipu/glm-4.7 | 0.50 | 2.33 | Hard reasoning, complex code |

For a 1K-input / 256-output chat turn, glm-4.5-air bills ≈ $0.00046 vs ≈ $0.00110 for glm-4.7, roughly 2.4× cheaper. Pick the air variant for customer-support replies, content rewriting, and summarisation; pick glm-4.7 when reasoning or code-generation quality is the bottleneck.
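
The per-turn figures above can be reproduced with a short script. This is a minimal sketch, not an official billing calculator: the `RATES` dictionary and the `turn_cost` helper are hypothetical names introduced here, the rates are copied from the cost table, and "1K input" is assumed to mean exactly 1,000 tokens.

```python
# USD per million tokens, copied from the cost table (assumed current).
RATES = {
    "zhipu/glm-4.5-air": {"input": 0.15, "output": 1.20},
    "zhipu/glm-4.7": {"input": 0.50, "output": 2.33},
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token rates."""
    rate = RATES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Assumption: a "1K-input / 256-output" turn is 1,000 input and 256 output tokens.
air = turn_cost("zhipu/glm-4.5-air", 1_000, 256)
flagship = turn_cost("zhipu/glm-4.7", 1_000, 256)

print(f"glm-4.5-air: ${air:.5f}")
print(f"glm-4.7:     ${flagship:.5f}")
print(f"ratio:       {flagship / air:.1f}x")
```

Swapping in your own average token counts gives a quick estimate of how the gap changes: output-heavy workloads narrow the ratio, because the output rates differ by less than 2× while the input rates differ by more than 3×.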

When to use it

When NOT to use it

Cross-references