Zhipu Web Search + GLM-4.5-Air Now Live on TheRouter
Five Zhipu BigModel routes just went live behind TheRouter — four citation-aware web-search engines and the budget-friendly zhipu/glm-4.5-air chat model. All of them are OpenAI-compatible, so a single endpoint change is usually enough to adopt them.
What launched
Four per-request search engines — each returns up to ten ranked web pages with title, URL, and content snippet, wrapped as an OpenAI chat.completion with url_citation annotations:
| Model | Engine | Best for | Price |
|---|---|---|---|
| zhipu/search-std | BigModel general | Cheapest grounding | $0.0036/req |
| zhipu/search-pro | BigModel flagship | Richer snippets, filters | $0.0108/req |
| zhipu/search-pro-sogou | Sogou index | Chinese news, WeChat, Baike | $0.0168/req |
| zhipu/search-pro-quark | Quark (Alibaba) | Commerce, lifestyle, education | $0.0168/req |
Plus a re-enabled direct route for zhipu/glm-4.5-air — Zhipu's cost-optimised GLM chat model at $0.15 input / $1.20 output per MTok, with siliconflow as primary and zhipu-cn as transparent fallback.
Why per-request search matters
Most retrieval-augmented stacks today either pay per token for a tool-calling agent that loops, or pay subscription rates for a SaaS search API. Per-request pricing, at $0.0036 to $0.0168 per call, collapses both into a single chat-completion shape. You make one HTTP call, get up to ten citations back, and bill predictably.
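To make the "bill predictably" point concrete, here is a minimal sketch that turns the per-request prices from the table into a flat bill for a steady request volume. The function name and the 30-day window are illustrative choices, not part of TheRouter's API.

```python
# Per-request prices in USD, copied from the pricing table above.
PRICE_PER_REQUEST = {
    "zhipu/search-std": 0.0036,
    "zhipu/search-pro": 0.0108,
    "zhipu/search-pro-sogou": 0.0168,
    "zhipu/search-pro-quark": 0.0168,
}


def monthly_search_cost(model: str, requests_per_day: int, days: int = 30) -> float:
    """Return the search bill in USD for a steady daily request volume."""
    # Cost is linear in request count; token counts play no role here.
    return round(PRICE_PER_REQUEST[model] * requests_per_day * days, 4)


if __name__ == "__main__":
    for model in PRICE_PER_REQUEST:
        print(f"{model}: ${monthly_search_cost(model, 1000):.2f} per 30 days")
```

Because the cost depends only on the number of calls, the estimate does not change with query length or with how many of the ten result slots come back filled.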
Because the response carries url_citation annotations with byte-accurate start_index / end_index offsets into the rendered markdown body, downstream code that already handles OpenAI tool output (or web-search citations from models like gpt-5 and claude-opus-4.7) works without changes.
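As an illustration of how those offsets can be consumed, the sketch below pairs each cited URL with the span of content it points at. The helper name is hypothetical, and the sample message is modeled on the envelope shown in "The response envelope". One caveat: in that sample the 0 to 70 range lines up with character positions, so the sketch slices the Python string directly. If the live API counts UTF-8 bytes instead, slice `content.encode("utf-8")` and decode the result.

```python
from typing import Any


def cited_spans(message: dict[str, Any]) -> list[tuple[str, str]]:
    """Pair each cited URL with the slice of content its indices point at."""
    content = message["content"]
    pairs = []
    for annotation in message.get("annotations", []):
        if annotation.get("type") != "url_citation":
            continue  # ignore any other annotation types
        citation = annotation["url_citation"]
        # Assumption: offsets index str characters (code points), not bytes.
        span = content[citation["start_index"]:citation["end_index"]]
        pairs.append((citation["url"], span))
    return pairs


# Sample modeled on the envelope in this post; \u2014 is the dash in the snippet.
sample_message = {
    "role": "assistant",
    "content": (
        "- [Smith report](https://example.com/smith) \u2014 Findings published 2026.\n"
        "- [Jones analysis](https://example.com/jones) \u2014 Follow-up coverage."
    ),
    "annotations": [
        {
            "type": "url_citation",
            "url_citation": {
                "url": "https://example.com/smith",
                "title": "Smith report",
                "content": "Findings published 2026.",
                "start_index": 0,
                "end_index": 70,
            },
        }
    ],
}

if __name__ == "__main__":
    for url, span in cited_spans(sample_message):
        print(url, "->", span)
```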
The four engines, when to pick which
- search-std — default for low-budget grounding. Identical envelope to the others, just cheapest.
- search-pro — production grounding for RAG and answer engines. Richer snippets; supports search_recency_filter, search_domain_filter, and content_size: high.
- search-pro-sogou — best mainland-China coverage. WeChat articles, Baike, and the regulatory and news sources where Sogou is strong.
- search-pro-quark — Alibaba's Quark index. Different ranking than Sogou; strong on commerce, lifestyle, health, education.
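Switching engines amounts to changing the model string, which the following sketch makes explicit. It builds (but does not send) a request with the standard library. The endpoint and the THE_ROUTER_API_KEY variable come from the curl call under "Getting started". Passing the filters as top-level body fields is an assumption, as is the `example.com` filter value, so check the web-search tutorial for the exact placement and accepted values.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example in this post.
ENDPOINT = "https://api.therouter.ai/v1/chat/completions"


def build_search_request(query, model="zhipu/search-pro", api_key=None, **filters):
    """Build a POST request for one search call without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": query}]}
    # Assumption: filter parameters travel as top-level body fields.
    body.update({name: value for name, value in filters.items() if value is not None})
    key = api_key or os.environ.get("THE_ROUTER_API_KEY", "")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_search_request(
        "GLM-5 model release date",
        model="zhipu/search-pro",
        search_domain_filter="example.com",  # hypothetical value
        content_size="high",
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.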
The response envelope
Search results arrive as a single chat.completion with no token usage and a per-request counter:
{
  "id": "20260520152358c6c4c87c07854a05",
  "object": "chat.completion",
  "created": 1779261839,
  "model": "zhipu/search-pro",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "- [Smith report](https://example.com/smith) — Findings published 2026.\n- [Jones analysis](https://example.com/jones) — Follow-up coverage.",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://example.com/smith",
            "title": "Smith report",
            "content": "Findings published 2026.",
            "start_index": 0,
            "end_index": 70
          }
        }
      ]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "web_search_requests": 1
  }
}
message.content is a markdown bulleted list, one line per result. Each line is also exposed as a separate url_citation annotation carrying its byte range; the sample above includes only the first annotation. Streaming requests get one chunk followed by a [DONE] terminator (BigModel doesn't stream partial results, so we don't fake it).
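For clients that always request streaming, the single-chunk behaviour can be handled with an ordinary server-sent-events loop. The sketch below is a minimal parser under two assumptions not confirmed in this post: that the stream uses OpenAI-style `data:` lines, and that the one chunk is a `chat.completion.chunk` with a `delta`. The sample lines are hand-written, not a captured response.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # terminator reached; stop reading
        yield json.loads(data)


# Hypothetical stream: one chunk, then the terminator.
sample_stream = [
    'data: {"object": "chat.completion.chunk", "choices": '
    '[{"index": 0, "delta": {"content": "- [Smith report](https://example.com/smith)"}}]}',
    "",
    "data: [DONE]",
    "",
]

chunks = list(iter_sse_payloads(sample_stream))

if __name__ == "__main__":
    print(len(chunks), "chunk(s) received")
```

The same loop works unchanged for models that do stream partial output, so search routes need no special case in client code.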
glm-4.5-air reroute
Existing customers of zhipu/glm-4.5-air don't have to change anything. We raised the customer-facing price to $0.15/$1.20 per MTok (input/output) to keep margin sane, and added a priority-1 fallback via BigModel direct (zhipu-cn). The primary route stays siliconflow-intl — failover is transparent and automatic.
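For budgeting against the new rates, a per-call estimate follows directly from the token counts. This sketch assumes glm-4.5-air responses report `prompt_tokens` and `completion_tokens` in the usual OpenAI `usage` object; the function names are illustrative.

```python
# Customer-facing glm-4.5-air rates from this post, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.15
OUTPUT_USD_PER_MTOK = 1.20


def chat_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the cost in USD of one chat call at the listed rates."""
    total = prompt_tokens * INPUT_USD_PER_MTOK + completion_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def cost_from_usage(usage: dict) -> float:
    """Compute cost from a usage object (assumed OpenAI-shaped)."""
    return chat_cost_usd(usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0))


if __name__ == "__main__":
    usage = {"prompt_tokens": 2000, "completion_tokens": 500}
    print(f"${cost_from_usage(usage):.6f}")
```

Output tokens cost eight times as much as input tokens at these rates, so response length dominates the bill for most chat workloads.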
Getting started
One curl call against TheRouter's OpenAI-compatible endpoint:
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "GLM-5 model release date"}]
  }'
For deeper recipes:
- Web-search tutorial — cURL, Python, JS examples; filter parameters; engine comparison
- glm-4.5-air tutorial — when to pick it, cost-comparison snippet, routing layout
Or browse the model detail pages: