Prompt Caching
Cache long static context and pay less per request
Prompt caching is most effective when you keep a stable prefix (system prompt, policy text, retrieved context) and move dynamic user input to the end.
Reference payload
Use this baseline request shape as a starting point, then adapt the model, provider sort strategy, and token limits to your workload.
request.json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "long policy text"},
        {"type": "text", "text": "long KB chunk", "cache_control": {"type": "ephemeral"}}
      ]
    },
    {"role": "user", "content": "answer with citations"}
  ]
}

Configuration examples
TheRouter.ai keeps request semantics consistent across providers, so you can tune behavior without rewriting your app layer.
TypeScript
const payload = {
  model: "anthropic/claude-sonnet-4.5",
  messages: [
    {
      role: "system",
      content: [
        { type: "text", text: kbChunk },
        { type: "text", text: policies, cache_control: { type: "ephemeral", ttl: "1h" } },
      ],
    },
    { role: "user", content: userPrompt },
  ],
};

Production note
Operate with guardrails
Cache writes can be more expensive on some providers than cache reads. Measure hit rates before rolling out globally.
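One way to measure this is to aggregate cache read and write token counts from your responses' usage data. A minimal sketch follows; the field names (`cache_creation_input_tokens`, `cache_read_input_tokens`) follow Anthropic-style usage reporting and may differ by provider, and `Usage` and `cacheHitRate` are illustrative names, not part of any SDK.

```typescript
// Illustrative shape; match this to the usage object your provider returns.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number; // tokens written to cache (often billed at a premium)
  cache_read_input_tokens: number;     // tokens served from cache (billed at a discount)
}

// Fraction of cacheable input tokens that were served from cache
// rather than written. A low rate means you are paying write premiums
// without recouping them in discounted reads.
function cacheHitRate(samples: Usage[]): number {
  let reads = 0;
  let writes = 0;
  for (const u of samples) {
    reads += u.cache_read_input_tokens;
    writes += u.cache_creation_input_tokens;
  }
  const total = reads + writes;
  return total === 0 ? 0 : reads / total;
}
```

Collect samples over a representative traffic window before judging: short TTLs and low request volume per prefix both depress the hit rate even when the configuration is correct.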
Use the activity feed and usage exports to validate that these settings improve reliability and cost in your real traffic mix.