DeepSeek V3.2

deepseek/deepseek-v3.2

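DeepSeek V3.2 is a mixture-of-experts model with 671B total and 37B active parameters per token (Hugging Face lists 685B, which also counts the multi-token-prediction weights). It pairs low inference cost with strong reasoning and agent performance, adds DeepSeek Sparse Attention (DSA) for long-context efficiency, and was post-trained with a scalable reinforcement learning framework. It is strongest at long-context reasoning, tool-using agents, function calling, JSON output, and FIM (fill-in-the-middle) completion.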

Hugging Face model card ↗
DeepSeek-V3.2 technical report (arXiv) ↗
DeepSeek-V3.2-Exp announcement ↗
GitHub: DeepSeek-V3.2-Exp ↗

DeepSeek-V3.2 is the open-weight evolution of the DeepSeek-V3 family, released in two phases: the experimental V3.2-Exp in September 2025, followed by the full V3.2 with a technical report (arXiv 2512.02556) dated December 2, 2025. It keeps the 671B-total / 37B-active mixture-of-experts backbone but introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that cuts long-context inference cost. V3.2-Exp showed DSA holding benchmark parity with V3.1-Terminus; the full V3.2 adds scaled post-training on the same architecture.

For a TheRouter operator, V3.2 is the price-per-quality champion in the open-weight tier. DeepSeek reports the DSA architecture enabled an over-50% API cost reduction at the time of the V3.2-Exp launch versus V3.1. The model is published under the MIT license for code with a permissive Model License for the weights — both allow commercial use. Inference runs on SGLang, vLLM, LMDeploy, TensorRT-LLM, LightLLM, and DeepSeek-Infer in FP8 or BF16 across NVIDIA and AMD GPUs, plus Huawei Ascend NPUs via MindIE — meaning a self-host fallback path is genuinely realistic, not just nominal.
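For the self-host fallback path, a single-node vLLM launch looks roughly like the sketch below. It assumes an 8-GPU node, the `deepseek-ai/DeepSeek-V3.2` Hugging Face repo id, and a vLLM release recent enough to support DSA (the v0.6.6+ figure in the specifications table predates DSA). Check the model card for the exact id and recommended flags before running it.

```shell
#!/usr/bin/env bash
# Assumptions: one node with 8 GPUs, a vLLM build with DSA support,
# and this Hugging Face repo id (verify it on the model card).
MODEL_ID="deepseek-ai/DeepSeek-V3.2"

# Serve an OpenAI-compatible API on port 8000, sharding the weights
# across 8 GPUs with tensor parallelism.
vllm serve "$MODEL_ID" \
  --tensor-parallel-size 8 \
  --port 8000
```

Because the server speaks the same OpenAI-compatible protocol as TheRouter, switching between hosted and self-hosted traffic is mostly a matter of changing the base URL and model id.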

Best for

• Cost-sensitive reasoning workloads — DeepSeek V3.2 lands within range of frontier closed models on MMLU, MATH, and HumanEval at a fraction of the price
• Long-context tasks where DSA's lower attention cost actually matters — repo-wide analysis, long-form summarisation, chain-of-thought over big documents
• Self-host alongside hosted use — same weights, same tokenizer, smooth fallback for outage drills or compliance constraints
• Tool-using agents that need JSON / function-calling reliability without paying flagship prices

Reach for something else if

• Native vision or multimodal input — V3.2 is text-in / text-out; for image inputs route to Claude Opus 4.7 or Amazon Nova 2 Lite
• You need maximum-depth reasoning rather than standard chat: DeepSeek ships V3.2-Speciale for that, though it has no tool calling and uses more tokens, so it is not always the right routing target either
• Critical production paths where V4 is already viable — DeepSeek released V4 in April 2026 with a 1M context window and Compressed Sparse Attention; new projects should evaluate V4 first

Context Length

128K

Max Output

33K

Input Price (per 1M tokens)

$0.960 / 1M tokens

Output Price (per 1M tokens)

$2.88 / 1M tokens

Modalities

text→text

Pricing Breakdown

Type	Rate
Input	$0.960 / 1M tokens
Output	$2.88 / 1M tokens
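To turn the per-million rates into a per-request estimate, divide each token count by 1,000,000 and multiply by its rate. A minimal shell sketch, assuming the list prices above and ignoring any discounts or future price changes; `estimate_cost` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the USD cost of one request to
# deepseek/deepseek-v3.2 from its input ($1) and output ($2) token counts.
# Rates are this page's list prices: $0.96 input, $2.88 output per 1M tokens.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
    'BEGIN { printf "%.5f\n", in_tok / 1000000 * 0.96 + out_tok / 1000000 * 2.88 }'
}

# A 100K-token document summarised into a 2K-token answer.
estimate_cost 100000 2000   # prints 0.10176
```

Output tokens cost three times as much as input tokens at these rates, so long-input, short-output jobs such as summarisation are where V3.2's pricing is most favourable.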

Supported Parameters

temperature, max_tokens, top_p, tools, tool_choice, response_format, stop
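A sketch of JSON mode via `response_format`, assuming TheRouter forwards the OpenAI-style `{"type": "json_object"}` value to DeepSeek. DeepSeek's own JSON Output guide asks for the word "json" in the prompt plus an example of the target shape, so the system message does both; the `title` and `sentiment` fields are purely illustrative.

```shell
#!/usr/bin/env bash
# JSON-mode request: response_format asks for a single JSON object.
# max_tokens leaves room for the whole object; a truncated reply is invalid JSON.
PAYLOAD=$(cat <<'EOF'
{
  "model": "deepseek/deepseek-v3.2",
  "response_format": {"type": "json_object"},
  "max_tokens": 512,
  "temperature": 0,
  "messages": [
    {"role": "system",
     "content": "Return json only, shaped like {\"title\": string, \"sentiment\": \"positive\" | \"neutral\" | \"negative\"}."},
    {"role": "user", "content": "Review: The battery lasts two days and the screen is sharp."}
  ]
}
EOF
)

curl -sS https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```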

Specifications

Spec	Value	Source	Status
Release date (V3.2 full)	2025-12-02 (arXiv tech report)	arxiv.org ↗	verified
V3.2-Exp release	September 2025 (experimental, DSA debut)	api-docs.deepseek.com ↗	verified
Architecture	671B-total / 37B-active MoE; MLA (Multi-Head Latent Attention) + DSA (DeepSeek Sparse Attention); RoPE; multi-token prediction training objective	arxiv.org ↗	verified
Pretraining tokens (V3 backbone)	14.8 trillion	github.com ↗	verified
Training cutoff	Not publicly disclosed		unknown
License — code	MIT	github.com ↗	verified
License — weights	DeepSeek Model License (commercial use permitted)	github.com ↗	verified
Supported inference backends	SGLang, vLLM v0.6.6+, LMDeploy, TensorRT-LLM, LightLLM, DeepSeek-Infer Demo; FP8 + BF16; NVIDIA + AMD GPUs; Huawei Ascend NPU via MindIE	github.com ↗	verified
Successor	DeepSeek-V4 (April 24, 2026): 1M context, Compressed Sparse Attention (CSA), native agentic tooling		verified

Benchmarks

These are DeepSeek-V3 scores from the V3 README, not published V3.2 results. They are shown as a family baseline because DeepSeek reports V3.2-Exp as on par with V3.1-Terminus.

Benchmark	Score	Source
MMLU (EM, Chat)	88.5	github.com ↗
HumanEval-Mul (Pass@1, Chat)	82.6	github.com ↗
MATH-500 (EM, Chat)	90.2	github.com ↗
GSM8K (8-shot EM, Base)	89.3	github.com ↗
GPQA-Diamond (Pass@1, Chat)	59.1	github.com ↗

API Usage Examples

Use the global api.therouter.ai endpoint shown below for new integrations; the legacy China accelerated endpoint is retired.

cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "Summarize the key points from this input."}
    ]
  }'

API guide


Chat completion

Standard chat through TheRouter's OpenAI-compatible surface. TheRouter normalises tool-calling and response_format on top of the underlying provider — your client code stays portable across DeepSeek, Anthropic, and OpenAI.

cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
  }'
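Tool calling uses the same endpoint with the `tools` and `tool_choice` fields listed under Supported Parameters. A sketch, assuming TheRouter accepts the OpenAI function-calling schema unchanged for this model; the `get_weather` tool and its parameters are hypothetical.

```shell
#!/usr/bin/env bash
# One hypothetical tool (get_weather); tool_choice "auto" lets the model
# decide whether to call it or answer directly.
PAYLOAD=$(cat <<'EOF'
{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {"role": "user", "content": "What is the weather in Hangzhou right now?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name, e.g. Hangzhou"}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
EOF
)

curl -sS https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```

In the OpenAI-compatible response format, a tool call arrives in `choices[0].message.tool_calls` with the function name and JSON-encoded arguments. Your code runs the function and sends the result back in a follow-up request as a `{"role": "tool", "tool_call_id": ...}` message.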

More from deepseek

deepseek/deepseek-v3.2-exp

Experimental V3.2: same DSA architecture, with training configs aligned to V3.1-Terminus to isolate DSA's effect. Useful for ablation work.

deepseek/deepseek-v3.1-terminus

Pre-DSA V3.1 final checkpoint: the canonical baseline V3.2 is benchmarked against.

deepseek/deepseek-r1

DeepSeek's RL-trained reasoning model, a different design point. Route here for chain-of-thought-heavy tasks instead of V3.2-Speciale.

deepseek/deepseek-v4-flash

Successor generation: 1M context, CSA attention, native agentic tooling at flash-tier pricing. Worth evaluating for new projects.

Similar models

Cross-provider sibling models

qwen/qwen3-235b

Open-weight peer in the same parameter class — useful as a cross-provider routing fallback.

anthropic/claude-sonnet-4.6

Closed-source mid-tier comparison — similar throughput targets, different cost profile.

openai/gpt-5

Closed-source flagship that the V3.2 tech report benchmarks against explicitly; the report says the Speciale variant surpasses GPT-5 on reasoning.

google/gemini-2.5-pro

Closed-source flagship with long-context strength — useful third corner of an open / closed / Google routing triangle.
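Because every model above sits behind the same OpenAI-compatible endpoint, a client-side fallback only has to change the `model` field. A minimal sketch, assuming `curl` and `jq` are installed; `chat_with_fallback` is a hypothetical helper, and the model order is an example rather than a TheRouter feature.

```shell
#!/usr/bin/env bash
# Hypothetical client-side fallback: try each model in order and print the
# first successful response body. Requires curl and jq.
chat_with_fallback() {
  local prompt=$1 model body response
  for model in deepseek/deepseek-v3.2 qwen/qwen3-235b anthropic/claude-sonnet-4.6; do
    # jq builds the JSON body so quotes in the prompt are escaped correctly.
    body=$(jq -n --arg model "$model" --arg prompt "$prompt" \
      '{model: $model, messages: [{role: "user", content: $prompt}]}')
    # --fail makes curl exit non-zero on HTTP 4xx/5xx, which moves on to the next model.
    if response=$(curl -sS --fail https://api.therouter.ai/v1/chat/completions \
        -H "Authorization: Bearer $THEROUTER_API_KEY" \
        -H "Content-Type: application/json" \
        -d "$body"); then
      printf '%s\n' "$response"
      return 0
    fi
    echo "chat_with_fallback: $model failed, trying next model" >&2
  done
  return 1
}
```

Usage: `chat_with_fallback "Summarize the key points from this input." | jq -r '.choices[0].message.content'`. Note that `--fail` treats every 4xx as a failure, including a malformed request that would fail on every model; a production version would inspect the status code and only retry on outages or rate limits.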

News & changes

2025-12-02

DeepSeek publishes V3.2 technical report — same DSA architecture as V3.2-Exp, scaled post-training puts it on par with GPT-5

DeepSeek released the full V3.2 with arXiv 2512.02556. The paper confirms V3.2 uses exactly the same architecture as the September V3.2-Exp; the difference is the post-training compute budget, which is large enough to land V3.2 at GPT-5-comparable quality on the reported benchmark suite. A separate Speciale variant exceeds GPT-5 on reasoning, with gold-medal-level scores on IMO / IOI / ICPC / CMO 2025 — but it drops tool-calling and uses more tokens, so it's a research target rather than a general routing destination.

re-authored by TheRouter · arxiv.org/abs/2512.02556 ↗

2025-09-29

DeepSeek launches V3.2-Exp — debuts Sparse Attention (DSA), API price drops over 50%

DeepSeek shipped V3.2-Exp as an experimental release to validate the DSA attention mechanism with a known training baseline (V3.1-Terminus configs). DSA is a fine-grained sparse attention built on MLA — a lightning indexer plus a per-query token selection mechanism — that cuts long-context attention cost without measurable quality loss on public benchmarks. DeepSeek used the resulting efficiency gain to lower its hosted-API prices by more than 50% at launch.

re-authored by TheRouter · api-docs.deepseek.com ↗

Frequently asked

What's the difference between V3.2 and V3.2-Exp?

Same architecture (the V3.2 paper says so explicitly). V3.2-Exp was released first in September 2025 as an experimental validation of DSA, with training configs aligned to V3.1-Terminus to isolate the attention mechanism's effect. The full V3.2 (December 2025) keeps the same architecture but adds scaled post-training compute, lifting it to GPT-5-comparable quality on the reported benchmark suite.

arxiv.org ↗

What is DeepSeek Sparse Attention (DSA)?

DSA is a fine-grained sparse attention mechanism built on top of MLA (Multi-Head Latent Attention). It has two parts: a lightning indexer that scores how relevant each preceding token is to the current query, and a fine-grained selection mechanism that keeps only the top-k scoring tokens for the query to attend to. This reduces the main attention cost from O(L²) to O(L·k), with k much smaller than the context length L; the indexer still scores every token pair, but it is far cheaper than full attention. DeepSeek reports no measurable quality loss on public benchmarks and used the resulting efficiency to cut API prices by over 50% at the V3.2-Exp launch.

arxiv.org ↗
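Written out as we read the V3.2 technical report (check the notation against the paper before relying on it): $H^I$ small indexer heads with per-head weights $w^I_{t,j}$ score each preceding token $s$ for query token $t$, and the main attention for $t$ only reads the latent key-value entries $c_s$ whose score lands in the top-$k$.

```latex
% Lightning indexer: score of preceding token s for query token t,
% summed over H^I indexer heads with per-head weights w^I_{t,j}
I_{t,s} = \sum_{j=1}^{H^{I}} w^{I}_{t,j}\,\operatorname{ReLU}\!\left(\mathbf{q}^{I}_{t,j}\cdot\mathbf{k}^{I}_{s}\right)

% Main attention for token t reads only the entries c_s whose score is in the top-k
\mathbf{u}_{t} = \operatorname{Attn}\!\left(\mathbf{h}_{t},\ \left\{\mathbf{c}_{s} \;\middle|\; I_{t,s} \in \operatorname{Top\text{-}k}\!\left(I_{t,:}\right)\right\}\right)

% Main-attention cost over a length-L context, with k \ll L selected tokens
\mathcal{O}(L^{2}) \;\longrightarrow\; \mathcal{O}(L\,k)
```

The saving on the main attention grows roughly as L/k, which is why DSA pays off most at long context lengths.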

Can I self-host DeepSeek V3.2?

Yes. Weights are published under the DeepSeek Model License (commercial use permitted) and code under MIT. Reference inference recipes ship for SGLang, vLLM, LMDeploy, TensorRT-LLM, LightLLM, and DeepSeek-Infer in FP8 and BF16 on NVIDIA and AMD GPUs. Huawei Ascend NPUs are supported via MindIE. For most teams the practical entry point is SGLang on FP8 with tensor parallelism sized to the model card recommendations.

github.com ↗
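A minimal SGLang launch for that entry point, assuming one 8-GPU node, an SGLang release with DeepSeek V3.2 support, and the `deepseek-ai/DeepSeek-V3.2` Hugging Face repo id. The model card may recommend additional parallelism flags, so treat this as a starting point, not a tuned configuration.

```shell
#!/usr/bin/env bash
# Assumptions: one node with 8 GPUs, SGLang with DeepSeek V3.2 support,
# and this Hugging Face repo id (verify it and any extra flags on the model card).
MODEL_ID="deepseek-ai/DeepSeek-V3.2"

# Launch an OpenAI-compatible server on port 30000, splitting the model
# across 8 GPUs with tensor parallelism (--tp).
python3 -m sglang.launch_server \
  --model-path "$MODEL_ID" \
  --tp 8 \
  --host 0.0.0.0 \
  --port 30000
```

Once the server is up, the hosted cURL examples on this page work against `http://localhost:30000/v1/chat/completions` with `"model"` set to the served model name (by default, the model path). Only the base URL and model id change, which is what makes outage drills against the self-hosted copy cheap to run.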

Should I use V3.2 or wait for V4?

DeepSeek-V4 launched on April 24, 2026 with a 1M context window, a new Compressed Sparse Attention (CSA) variant, and native agentic tooling — for new projects, evaluate V4 (v4-flash for cost, v4-pro for quality) before defaulting to V3.2. V3.2 remains the right choice when you need the open-weight self-host fallback or when V4's pricing isn't justified for the workload.

Does V3.2 accept image input?

No. V3.2 is text-in / text-out. For image input route to Claude Opus 4.7 or Amazon Nova 2 Lite; for image output route to a dedicated image-generation model.

Fact ledger — every claim on this page traces here

Claim	Source	Retrieved	Status
Release date (V3.2 full)	arxiv.org ↗	2026-05-22	verified
V3.2-Exp release	api-docs.deepseek.com ↗	2026-05-22	verified
Architecture	arxiv.org ↗	2026-05-22	verified
Pretraining tokens (V3 backbone)	github.com ↗	2026-05-22	verified
Training cutoff	—	—	unknown
License — code	github.com ↗	2026-05-22	verified
License — weights	github.com ↗	2026-05-22	verified
Supported inference backends	github.com ↗	2026-05-22	verified
Successor	—	—	verified
MMLU (EM, Chat)	github.com ↗	2026-05-22	to verify
HumanEval-Mul (Pass@1, Chat)	github.com ↗	2026-05-22	to verify
MATH-500 (EM, Chat)	github.com ↗	2026-05-22	to verify
GSM8K (8-shot EM, Base)	github.com ↗	2026-05-22	to verify
GPQA-Diamond (Pass@1, Chat)	github.com ↗	2026-05-22	to verify
LiveCodeBench (Pass@1-COT)	github.com ↗	2026-05-22	to verify
AIME 2024 (Pass@1)	github.com ↗	2026-05-22	to verify
GPT-5 comparison (qualitative)	arxiv.org ↗	2026-05-22	to verify
DeepSeek publishes V3.2 technical report — same DSA architecture as V3.2-Exp, scaled post-training puts it on par with GPT-5	arxiv.org/abs/2512.02556 ↗	2026-05-22	verified
DeepSeek launches V3.2-Exp — debuts Sparse Attention (DSA), API price drops over 50%	api-docs.deepseek.com ↗	2026-05-22	verified
What's the difference between V3.2 and V3.2-Exp?	arxiv.org ↗	2026-05-22	to verify
What is DeepSeek Sparse Attention (DSA)?	arxiv.org ↗	2026-05-22	to verify
Can I self-host DeepSeek V3.2?	github.com ↗	2026-05-22	to verify