Use Zhipu BigModel's search engines from any OpenAI-compatible client. One POST to /v1/chat/completions returns ten ranked results with url_citation annotations, billed per request.

| Model | Engine | Best for | Price |
|---|---|---|---|
| zhipu/search-std | BigModel general | Cheapest grounding | $0.0036/req |
| zhipu/search-pro | BigModel flagship | Richer snippets, filters | $0.0108/req |
| zhipu/search-pro-sogou | Sogou index | Chinese news, WeChat, Baike | $0.0168/req |
| zhipu/search-pro-quark | Quark (Alibaba) | Commerce, lifestyle, education | $0.0168/req |

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "GLM-5 model release date"}]
  }'

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.therouter.ai/v1",
    # Read the API key from the environment
    api_key=os.environ["THE_ROUTER_API_KEY"],
)
resp = client.chat.completions.create(
    model="zhipu/search-pro",
    messages=[{"role": "user", "content": "GLM-5 model release date"}],
)
# Markdown bulleted body with one line per result
print(resp.choices[0].message.content)
# Per-result citations with byte offsets into the markdown body.
# The SDK parses annotations into objects, so fields are read as attributes.
for ann in resp.choices[0].message.annotations or []:
    cite = ann.url_citation
    print(cite.title, "->", cite.url)
print("billed web_search_requests:", resp.usage.web_search_requests)

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.therouter.ai/v1",
  apiKey: process.env.THE_ROUTER_API_KEY,
});
const resp = await client.chat.completions.create({
  model: "zhipu/search-pro",
  messages: [{ role: "user", content: "GLM-5 model release date" }],
});
console.log(resp.choices[0].message.content);
const annotations = (resp.choices[0].message as any).annotations ?? [];
for (const ann of annotations) {
  console.log(ann.url_citation.title, "->", ann.url_citation.url);
}
console.log("billed:", (resp.usage as any).web_search_requests);

Search results arrive as a single chat.completion with zero token usage and a per-request counter. message.content is markdown; each line maps to a url_citation annotation with byte-accurate offsets:
{
  "id": "20260520...",
  "object": "chat.completion",
  "model": "zhipu/search-pro",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "- [Smith report](https://example.com/smith) \u2014 Findings 2026.",
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "url": "https://example.com/smith",
            "title": "Smith report",
            "content": "Findings 2026.",
            "start_index": 0,
            "end_index": 56
          }
        }
      ]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "web_search_requests": 1
  }
}

Streaming requests get exactly one SSE chunk with the full body, then data: [DONE]. BigModel doesn't stream partial results, and we don't synthesize them.
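
As a rough illustration of the single-chunk streaming behaviour described above, the sketch below reads SSE lines until the data: [DONE] sentinel and checks that exactly one JSON chunk arrived. The helper names (`read_sse_payloads`, `read_search_stream`) and the sample wire data are hypothetical, and the exact shape of the chunk body is not specified here, so the code treats it as opaque JSON.

```python
import json
from typing import Iterable, List


def read_sse_payloads(lines: Iterable[str]) -> List[dict]:
    """Collect JSON payloads from `data:` lines until the [DONE] sentinel."""
    payloads = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        payloads.append(json.loads(data))
    return payloads


def read_search_stream(lines: Iterable[str]) -> dict:
    """Return the single chunk of a search stream, or raise if the count differs."""
    payloads = read_sse_payloads(lines)
    if len(payloads) != 1:
        raise ValueError(f"expected exactly one chunk, got {len(payloads)}")
    return payloads[0]


# Hypothetical wire data, for illustration only
sample_stream = [
    'data: {"id": "example", "choices": [{"index": 0}]}',
    "",
    "data: [DONE]",
]
chunk = read_search_stream(sample_stream)
print(chunk["id"])
```

Because the whole body arrives in one chunk, a client that already handles non-streaming responses gains little from streaming these models; the check mainly guards against assuming incremental deltas.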
All four search engines accept three optional filter fields:
- search_recency_filter: one of oneDay, oneWeek, oneMonth, oneYear, noLimit. Restricts results to pages indexed within the window.
- search_domain_filter: a single domain glob (e.g. "example.com"). Arrays are accepted but only the first element is forwarded.
- content_size: medium (default) or high. high returns longer snippet bodies per result.

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zhipu/search-pro",
    "messages": [{"role": "user", "content": "AI safety regulation"}],
    "search_recency_filter": "oneWeek",
    "search_domain_filter": "wired.com",
    "content_size": "high"
  }'

- Cheapest grounding, or richer snippets and filters: search-std / search-pro.
- Chinese news, WeChat, Baike: search-pro-sogou.
- Commerce, lifestyle, education: search-pro-quark.
- Coverage across both the Sogou and Quark indexes: call search-pro-sogou and search-pro-quark in parallel and merge results.

All four are billed per request, with no token charges. A successful response carries usage.web_search_requests: 1, which the meter multiplies by the customer-facing per-request price. Failed upstream calls don't bill (401 / 429 / 5xx all throw typed errors before any counter is incremented).
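
The parallel-and-merge pattern and the per-request billing described above can be sketched roughly as follows. This is a minimal illustration that works on already-parsed response dictionaries in the JSON shape shown earlier. The helper names (`merge_citations`, `estimated_cost`) are hypothetical, and deduplicating by URL with first hit wins is an assumed merge policy, not documented behaviour. Prices are copied from the table at the top of this page.

```python
from decimal import Decimal
from typing import Dict, List

# Per-request prices in USD, copied from the pricing table
PRICE_PER_REQUEST: Dict[str, Decimal] = {
    "zhipu/search-std": Decimal("0.0036"),
    "zhipu/search-pro": Decimal("0.0108"),
    "zhipu/search-pro-sogou": Decimal("0.0168"),
    "zhipu/search-pro-quark": Decimal("0.0168"),
}


def merge_citations(responses: List[dict]) -> List[dict]:
    """Flatten url_citation entries across responses, keeping the first hit per URL."""
    seen = set()
    merged = []
    for resp in responses:
        for choice in resp.get("choices", []):
            for ann in choice.get("message", {}).get("annotations") or []:
                cite = ann.get("url_citation")
                if cite and cite["url"] not in seen:
                    seen.add(cite["url"])
                    merged.append(cite)
    return merged


def estimated_cost(responses: List[dict]) -> Decimal:
    """Sum usage.web_search_requests times the per-request price of each model."""
    total = Decimal("0")
    for resp in responses:
        count = resp.get("usage", {}).get("web_search_requests", 0)
        total += PRICE_PER_REQUEST[resp["model"]] * count
    return total


def _fake_response(model: str, urls: List[str]) -> dict:
    """Build a hypothetical response dict for the demo below."""
    annotations = [
        {"type": "url_citation", "url_citation": {"url": u, "title": u}} for u in urls
    ]
    return {
        "model": model,
        "choices": [{"message": {"annotations": annotations}}],
        "usage": {"web_search_requests": 1},
    }


sogou = _fake_response("zhipu/search-pro-sogou", ["https://example.com/a", "https://example.com/b"])
quark = _fake_response("zhipu/search-pro-quark", ["https://example.com/b", "https://example.com/c"])

merged = merge_citations([sogou, quark])
print([c["url"] for c in merged])
print("estimated USD:", estimated_cost([sogou, quark]))
```

Decimal is used rather than float so that small per-request prices add up exactly. A failed call would simply be absent from the list, which matches the statement that failed upstream calls are not billed.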