May 20, 2026·Launch·中文版本 →·Русская версия →

Zhipu Web Search + GLM-4.5-Air Now Live on TheRouter

Five Zhipu BigModel routes just went live behind TheRouter — four citation-aware web-search engines and the budget-friendly zhipu/glm-4.5-air chat model. All of them are OpenAI-compatible, so a single endpoint change is usually enough to adopt them.

What launched

Four per-request search engines — each returns up to ten ranked web pages with title, URL, and content snippet, wrapped as an OpenAI chat.completion with url_citation annotations:

Model	Engine	Best for	Price
zhipu/search-std	BigModel general	Cheapest grounding	$0.0036/req
zhipu/search-pro	BigModel flagship	Richer snippets, filters	$0.0108/req
zhipu/search-pro-sogou	Sogou index	Chinese news, WeChat, Baike	$0.0168/req
zhipu/search-pro-quark	Quark (Alibaba)	Commerce, lifestyle, education	$0.0168/req

Plus a re-enabled direct route for zhipu/glm-4.5-air — Zhipu's cost-optimised GLM chat model at $0.15 input / $1.20 output per MTok, with siliconflow as primary and zhipu-cn as transparent fallback.

Why per-request search matters

Most retrieval-augmented stacks today either pay per token for a tool-calling agent that loops, or pay subscription rates to a SaaS search API. Per-request pricing — $0.0036 to $0.0168 per call — collapses both into a single chat-completion shape. You make one HTTP call, get ten citations back, and bill predictably.

Because the response carries url_citation annotations with byte-accurate start_index / end_index indices into the rendered markdown body, downstream code that already handles OpenAI tool output (or web- search citations in models like gpt-5 and claude-opus-4.7) works without changes.

The four engines, when to pick which

search-std — default for low-budget grounding. Identical envelope to the others, just cheapest.
search-pro — production grounding for RAG and answer engines. Richer snippets, supports search_recency_filter, search_domain_filter, and content_size: high.
search-pro-sogou — best mainland-China coverage. WeChat articles, Baike, regulatory and news sources Sogou is strong on.
search-pro-quark — Alibaba's Quark index. Different ranking than Sogou; strong on commerce, lifestyle, health, education.

The response envelope

Search results arrive as a single chat.completion with no token usage and a per-request counter:

{
  "id": "20260520152358c6c4c87c07854a05",
  "object": "chat.completion",
  "created": 1779261839,
  "model": "zhipu/search-pro",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "- [Smith report](https://example.com/smith) — Findings published 2026.\n- [Jones analysis](https://example.com/jones) — Follow-up coverage.",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://example.com/smith",
            "title": "Smith report",
            "content": "Findings published 2026.",
            "start_index": 0,
            "end_index": 70
          }
        }
      ]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "web_search_requests": 1
  }
}

message.content is a markdown bulleted list, one line per result. Each line is also exposed as a separate url_citation annotation with the byte range. Streaming requests get one chunk + a [DONE] terminator (BigModel doesn't stream partial results, so we don't fake it).

glm-4.5-air reroute

Existing customers of zhipu/glm-4.5-air don't have to change anything. We raised the customer-facing price to $0.15/$1.20 per MTok (input/output) to keep margin sane, and added a priority-1 fallback via BigModel direct (zhipu-cn). The primary route stays siliconflow-intl — failover is transparent and automatic.

Getting started

One curl call against TheRouter's OpenAI-compatible endpoint:

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "GLM-5 model release date"}]
  }'

For deeper recipes:

Web-search tutorial — cURL, Python, JS examples; filter parameters; engine comparison
glm-4.5-air tutorial — when to pick it, cost-comparison snippet, routing layout

Or browse the model detail pages: