DeepSeek Retires deepseek-chat and deepseek-reasoner: What the V4 Model Rename Means for Routing Teams

When a major API provider retires model names, every team that hard-codes a model string in a routing config, fallback chain, or SDK call needs to act before the deadline. DeepSeek has confirmed that deepseek-chat and deepseek-reasoner will stop working on July 24, 2026, and the replacement names come with a richer feature set, a split pricing tier, and — for the first time — an Anthropic-compatible SDK endpoint.

What changed

DeepSeek's current model lineup is now:

| Old name | Deprecated | New name | Mode | |---|---|---|---| | deepseek-chat | July 24, 2026 | deepseek-v4-flash | Non-thinking (default) | | deepseek-reasoner | July 24, 2026 | deepseek-v4-flash (thinking mode) | Thinking enabled | | — | — | deepseek-v4-pro | Both modes, 1M context |

Both new models default to thinking mode enabled, which matters for latency and cost budgets. Teams currently on deepseek-chat expecting fast non-thinking responses need to explicitly disable thinking when migrating to deepseek-v4-flash.

There are also two new API base URL options:

OpenAI-format: https://api.deepseek.com (existing)
Anthropic-format: https://api.deepseek.com/anthropic (new)

The Anthropic-format endpoint means tools that speak the Anthropic Messages API — Claude Code, OpenCode, or any Anthropic SDK client — can now route directly to DeepSeek V4 with no middleware rewriting.

Why it matters for AI engineering teams

Deadline risk is real. July 24 is less than 65 days away. Any hard-coded deepseek-chat or deepseek-reasoner reference in a router config, provider preset, environment variable, or CI harness will break on that date unless updated.

The thinking-mode default is a behavior change. deepseek-v4-flash defaults to thinking on, unlike the old deepseek-chat which was non-thinking by default. A direct model-name swap without adjusting thinking parameter will change response latency, token counts, and billed output. Teams with tight latency SLAs or token-budget cost controls need to add "thinking": {"type": "disabled"} — or audit whether thinking-on is actually what they want (it often is for coding and reasoning workloads).

Two pricing tiers, one deadline. deepseek-v4-flash runs $0.14/M input (cache miss) and $0.28/M output — similar to the old deepseek-chat rate. deepseek-v4-pro runs $1.74/M input and $3.48/M output at list price, but is currently discounted 75% through May 31, 2026 ($0.435/M input, $0.87/M output). Teams evaluating deepseek-v4-pro for reasoning-heavy workloads have a narrow window to run cost benchmarks at discounted rates.

Anthropic endpoint unlocks Claude Code routing. The new https://api.deepseek.com/anthropic endpoint exposes the Anthropic Messages API. Claude Code and OpenCode can use DeepSeek as their backing model by setting ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN — no OpenAI-format proxy layer required. The recommended config maps Opus/Sonnet to deepseek-v4-pro[1m] (for complex tasks with full 1M context) and Haiku/subagent to deepseek-v4-flash (fast execution). This is DeepSeek's first first-class Anthropic SDK compatibility story.

The router/operator angle

For teams routing through an AI gateway or managing multiple upstream providers, the V4 rename introduces three decision points:

1. Model alias audit. Any router that resolves deepseek-chat → DeepSeek's API needs to either update the model string to deepseek-v4-flash or maintain an alias mapping. The old names will still resolve until July 24 — after that, 400 errors or silent failures depending on provider error handling.

2. Thinking-mode flag discipline. The shift in thinking-mode default creates a subtle breaking change even if your router passes model names through correctly. If your gateway does not inject or strip the thinking parameter, you may end up changing model behavior in production without a config change. The safest migration path: explicitly set "thinking": {"type": "enabled"} or "disabled"} in your router's DeepSeek provider config rather than relying on the default.

3. Dual-endpoint complexity. Teams with Claude Code or Anthropic SDK clients now have a choice: route through an OpenAI-format gateway (which handles Anthropic→OpenAI translation) or point directly at DeepSeek's Anthropic endpoint. The direct path reduces a translation hop and preserves Anthropic-specific parameters like reasoning_effort. The gateway path keeps all providers in one routing table and usage ledger.

Routing policy checklist before July 24:

[ ] Audit all provider configs, env vars, and presets for deepseek-chat / deepseek-reasoner
[ ] Replace with deepseek-v4-flash (fast/economy) or deepseek-v4-pro (complex reasoning/1M context)
[ ] Add explicit thinking parameter to avoid default behavior change
[ ] Update fallback chains that name DeepSeek models explicitly
[ ] Decide: OpenAI-format gateway or direct Anthropic endpoint for Claude Code workloads
[ ] Lock in deepseek-v4-pro cost benchmarks before May 31 discount expires

What TheRouter users should watch or try

If you use TheRouter as your OpenAI-compatible routing gateway, the DeepSeek V4 migration is a provider-config update: replace deepseek-chat with deepseek-v4-flash or deepseek-v4-pro in your provider settings and add explicit thinking-mode parameters.

For Claude Code or OpenCode users routing through TheRouter today, you have two paths: continue using TheRouter's OpenAI-format routing (which forwards requests to DeepSeek's OpenAI endpoint) or experiment with pointing ANTHROPIC_BASE_URL directly at https://api.deepseek.com/anthropic for a direct Anthropic-format connection. The direct path requires no proxy but loses centralized routing, billing reconciliation, and fallback.

See the SiliconFlow provider guide for an example of configuring a Chinese AI provider through TheRouter's routing layer — the same pattern applies to DeepSeek.

What changed

Why it matters for AI engineering teams

The router/operator angle

What TheRouter users should watch or try

Похожие материалы

DeepSeek Now Speaks Anthropic: What the New Dual-Format API Means for Your Routing Layer

DeepSeek's Official Coding Agent Guide: Route Claude Code and OpenCode to V4 Models

Qwen-Image on DashScope: What the New Image Generation and Editing APIs Mean for Your Async Media Pipeline