DeepSeek V4 Now Available on TheRouter — Direct API Integration
DeepSeek released V4 Flash and V4 Pro today — their most powerful open-source models to date. Both are already live on TheRouter with day-one support.
TheRouter adds day-one support for DeepSeek V4 Flash and V4 Pro via direct DeepSeek API integration. V4 Flash: 284B MoE with 13B active parameters, 1M context, 384K max output, $0.14/$0.28 per MTok. V4 Pro: 1.6T MoE with 49B active parameters, 1M context, 384K max output, $1.74/$3.48 per MTok. Both models feature Hybrid Attention Architecture and Engram conditional memory. Apache 2.0 licensed with weights on Hugging Face. Model IDs: deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro.
V4 Flash — Best Value for Everyday Tasks
- 284B MoE, 13B active — Mixture of Experts architecture with only 13B parameters active per forward pass, keeping inference fast and cost low.
- 1M context, 384K max output — process entire codebases or long documents in a single request with massive output capacity.
- Default thinking mode — built-in chain-of-thought reasoning enabled by default for better accuracy.
- $0.14 / $0.28 per MTok (input/output) — among the most affordable reasoning models available.
V4 Pro — Complex Reasoning Powerhouse
- 1.6T MoE, 49B active — the largest open-source MoE model, approaching Claude Opus 4.6 non-thinking level performance.
- 1M context, 384K max output — same generous context and output limits as V4 Flash.
- $1.74 / $3.48 per MTok (input/output) — competitive pricing for a model at this capability level.
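The context and output limits above suggest a simple pre-flight check before sending a long request. The sketch below is illustrative only: it assumes "1M" means 1,000,000 tokens, "384K" means 384,000 tokens, and that input and output share the same context window. None of these details are confirmed in this post, and `clamp_max_tokens` is a hypothetical helper, not part of any TheRouter SDK.

```python
# Assumed exact values; the post only states "1M" and "384K".
CONTEXT_WINDOW = 1_000_000
MAX_OUTPUT = 384_000


def clamp_max_tokens(input_tokens: int, requested: int) -> int:
    """Return the largest max_tokens value that fits the assumed limits.

    Assumes input and output tokens share one context window.
    """
    if input_tokens >= CONTEXT_WINDOW:
        raise ValueError("input alone exceeds the assumed context window")
    remaining = CONTEXT_WINDOW - input_tokens
    return min(requested, MAX_OUTPUT, remaining)


if __name__ == "__main__":
    # A 900K-token prompt leaves room for at most 100K output tokens.
    print(clamp_max_tokens(900_000, 384_000))
```

Check the actual limits reported by the API before relying on these constants.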
Benchmarks
| Benchmark | V4 Pro | V4 Flash | Claude Opus 4.6 |
|---|---|---|---|
| SWE-bench Verified | 80.6% | 79.0% | 80.8% |
| LiveCodeBench | 93.5 | — | — |
| Codeforces Rating | 3206 | — | — |
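V4 Pro leads on LiveCodeBench (93.5) and achieves the highest Codeforces rating (3206) among all models; the table lists no comparison figures for those two benchmarks. On SWE-bench Verified, V4 Pro trails Claude Opus 4.6 by 0.2 percentage points (80.6% vs. 80.8%), with V4 Flash at 79.0%.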
Architecture
- Hybrid Attention Architecture — combines efficient attention mechanisms for handling both short and ultra-long sequences.
- Engram conditional memory — enables efficient processing of 1M context windows without proportional compute scaling.
- MoE with low active params — keeps inference costs dramatically lower than dense models of equivalent total parameter count.
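To put the "low active params" point in numbers, the short sketch below computes the share of parameters active per forward pass from the figures quoted in this post. The `active_fraction` helper is illustrative only.

```python
# (total, active) parameter counts in billions, as stated in this post.
MODELS = {
    "V4 Flash": (284, 13),
    "V4 Pro": (1600, 49),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters used in a single forward pass."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of parameters active per forward pass")
```

That works out to roughly 4.6% for V4 Flash and 3.1% for V4 Pro. Per-token compute scales with the active count, but self-hosting still requires storing the full set of weights.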
Pricing
| Model | Input | Output | Context |
|---|---|---|---|
| V4 Flash | $0.14/MTok | $0.28/MTok | 1M |
| V4 Pro | $1.74/MTok | $3.48/MTok | 1M |
V4 Flash is one of the most cost-effective reasoning models available. V4 Pro offers frontier-level coding at a fraction of closed-source pricing.
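To make the table concrete, here is a rough per-request cost estimate in Python. The prices are copied from the table above; the `PRICES_PER_MTOK` dictionary and the `estimate_cost_usd` function are illustrative names, not part of any TheRouter SDK. The sketch covers list prices only and ignores any discounts or billing adjustments, which this post does not describe.

```python
from decimal import Decimal

# USD per million tokens as (input, output), copied from the pricing table.
PRICES_PER_MTOK = {
    "deepseek/deepseek-v4-flash": (Decimal("0.14"), Decimal("0.28")),
    "deepseek/deepseek-v4-pro": (Decimal("1.74"), Decimal("3.48")),
}

MTOK = Decimal(1_000_000)


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the list-price cost in USD for one request."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_price * input_tokens + output_price * output_tokens) / MTOK


if __name__ == "__main__":
    # Example request: 200K tokens in, 8K tokens out.
    for model in PRICES_PER_MTOK:
        print(model, estimate_cost_usd(model, 200_000, 8_000))
```

For that example request, V4 Flash comes to about $0.03 and V4 Pro to about $0.38.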
How to Use It
Use the standard model names — TheRouter handles routing automatically:
```bash
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096
  }'
```

For V4 Pro, use deepseek/deepseek-v4-pro. Both models are available on the global endpoint (api.therouter.ai). The legacy China endpoint is retired; use the global endpoint for new integrations.
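The same request as the curl example can be sketched in Python using only the standard library. The URL, headers, environment variable, and payload fields mirror the curl command; `build_request` and `send` are hypothetical helper names. The response is printed as raw JSON because this post does not document the response shape.

```python
import json
import os
import urllib.request

API_URL = "https://api.therouter.ai/v1/chat/completions"


def build_request(model: str, prompt: str, api_key: str,
                  max_tokens: int = 4096) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


if __name__ == "__main__":
    api_key = os.environ.get("THE_ROUTER_API_KEY")
    if api_key:
        request = build_request(
            "deepseek/deepseek-v4-pro", "Explain the MoE architecture", api_key
        )
        print(json.dumps(send(request), indent=2))
```

Switching between Flash and Pro only changes the `model` string; the rest of the request stays the same.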
Open Source
Both V4 Flash and V4 Pro are released under the Apache 2.0 license with full model weights available on Hugging Face. You can self-host, fine-tune, or use them commercially, subject to the license's attribution and notice requirements.
Getting Started
Already on TheRouter? Just set the model to deepseek/deepseek-v4-flash or deepseek/deepseek-v4-pro — no other changes needed.
Questions? Reach out on GitHub.