Zhipu Web Search Tutorial

Use Zhipu BigModel's search engines from any OpenAI-compatible client. One POST /v1/chat/completions, ten ranked results with url_citation annotations, billed per request.

The four engines

Model	Engine	Best for	Price
`zhipu/search-std`	BigModel general	Cheapest grounding	`$0.0036/req`
`zhipu/search-pro`	BigModel flagship	Richer snippets, filters	`$0.0108/req`
`zhipu/search-pro-sogou`	Sogou index	Chinese news, WeChat, Baike	`$0.0168/req`
`zhipu/search-pro-quark`	Quark (Alibaba)	Commerce, lifestyle, education	`$0.0168/req`

Quickstart — cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "GLM-5 model release date"}]
  }'

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",
    api_key="$THE_ROUTER_API_KEY",
)

resp = client.chat.completions.create(
    model="zhipu/search-pro",
    messages=[{"role": "user", "content": "GLM-5 model release date"}],
)

# Markdown bulleted body with one line per result
print(resp.choices[0].message.content)

# Per-result citations with byte offsets into the markdown body
for ann in resp.choices[0].message.annotations or []:
    cite = ann["url_citation"]
    print(cite["title"], "->", cite["url"])

print("billed web_search_requests:", resp.usage.web_search_requests)

JavaScript / TypeScript — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.therouter.ai/v1",
  apiKey: process.env.THE_ROUTER_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "zhipu/search-pro",
  messages: [{ role: "user", content: "GLM-5 model release date" }],
});

console.log(resp.choices[0].message.content);

const annotations = (resp.choices[0].message as any).annotations ?? [];
for (const ann of annotations) {
  console.log(ann.url_citation.title, "->", ann.url_citation.url);
}

console.log("billed:", (resp.usage as any).web_search_requests);

The response envelope

Search results arrive as a single chat.completion with zero token usage and a per-request counter. message.content is markdown; each line maps to a url_citation annotation with byte-accurate offsets:

{
  "id": "20260520...",
  "object": "chat.completion",
  "model": "zhipu/search-pro",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "- [Smith report](https://example.com/smith) — Findings 2026.",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://example.com/smith",
            "title": "Smith report",
            "content": "Findings 2026.",
            "start_index": 0,
            "end_index": 56
          }
        }
      ]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "web_search_requests": 1
  }
}

Streaming requests get exactly one SSE chunk with the full body, then data: [DONE] — BigModel doesn't stream partial results, and we don't synthesize them.

Filtering results

All four search engines accept three optional filter fields:

search_recency_filter — one of oneDay, oneWeek, oneMonth, oneYear, noLimit. Restricts results to pages indexed within the window.
search_domain_filter — a single domain glob (e.g. "example.com"). Arrays are accepted but only the first element is forwarded.
content_size — medium (default) or high.high returns longer snippet bodies per result.

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "AI safety regulation"}],
    "search_recency_filter": "oneWeek",
    "search_domain_filter": "wired.com",
    "content_size": "high"
  }'

Choosing the right engine

English-dominant or general-knowledge queries → search-std / search-pro.
Chinese news, WeChat articles, regulatory or encyclopedic content → search-pro-sogou.
Chinese commerce, lifestyle, health, education → search-pro-quark.
Need broad recall? Call search-pro-sogou and search-pro-quark in parallel and merge results.

Pricing notes

All four are billed per request — no token charges. The successful response carries usage.web_search_requests: 1, which the meter multiplies by the customer-facing per-request price. Failed upstream calls don't bill (401 / 429 / 5xx all throw typed errors before any counter is incremented).