Zhipu Web Search Tutorial

Use Zhipu BigModel's search engines from any OpenAI-compatible client. One POST /v1/chat/completions, ten ranked results with url_citation annotations, billed per request.

The four engines

ModelEngineBest forPrice
zhipu/search-stdBigModel generalCheapest grounding$0.0036/req
zhipu/search-proBigModel flagshipRicher snippets, filters$0.0108/req
zhipu/search-pro-sogouSogou indexChinese news, WeChat, Baike$0.0168/req
zhipu/search-pro-quarkQuark (Alibaba)Commerce, lifestyle, education$0.0168/req

Quickstart β€” cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "GLM-5 model release date"}]
  }'

Python β€” OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",
    api_key="$THE_ROUTER_API_KEY",
)

resp = client.chat.completions.create(
    model="zhipu/search-pro",
    messages=[{"role": "user", "content": "GLM-5 model release date"}],
)

# Markdown bulleted body with one line per result
print(resp.choices[0].message.content)

# Per-result citations with byte offsets into the markdown body
for ann in resp.choices[0].message.annotations or []:
    cite = ann["url_citation"]
    print(cite["title"], "->", cite["url"])

print("billed web_search_requests:", resp.usage.web_search_requests)

JavaScript / TypeScript β€” OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.therouter.ai/v1",
  apiKey: process.env.THE_ROUTER_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "zhipu/search-pro",
  messages: [{ role: "user", content: "GLM-5 model release date" }],
});

console.log(resp.choices[0].message.content);

const annotations = (resp.choices[0].message as any).annotations ?? [];
for (const ann of annotations) {
  console.log(ann.url_citation.title, "->", ann.url_citation.url);
}

console.log("billed:", (resp.usage as any).web_search_requests);

The response envelope

Search results arrive as a single chat.completion with zero token usage and a per-request counter. message.content is markdown; each line maps to a url_citation annotation with byte-accurate offsets:

{
  "id": "20260520...",
  "object": "chat.completion",
  "model": "zhipu/search-pro",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "- [Smith report](https://example.com/smith) β€” Findings 2026.",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://example.com/smith",
            "title": "Smith report",
            "content": "Findings 2026.",
            "start_index": 0,
            "end_index": 56
          }
        }
      ]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "web_search_requests": 1
  }
}

Streaming requests get exactly one SSE chunk with the full body, then data: [DONE] β€” BigModel doesn't stream partial results, and we don't synthesize them.

Filtering results

All four search engines accept three optional filter fields:

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "AI safety regulation"}],
    "search_recency_filter": "oneWeek",
    "search_domain_filter": "wired.com",
    "content_size": "high"
  }'

Choosing the right engine

Pricing notes

All four are billed per request β€” no token charges. The successful response carries usage.web_search_requests: 1, which the meter multiplies by the customer-facing per-request price. Failed upstream calls don't bill (401 / 429 / 5xx all throw typed errors before any counter is incremented).

Cross-references