DeepSeek V3.2

Model ID: deepseek/deepseek-v3.2 (provider: deepseek)

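DeepSeek V3.2 is a mixture-of-experts (MoE) model with 37B active parameters per token. The published checkpoint is listed at 685B total parameters, while the architecture is usually quoted as 671B total; the V3 README attributes the difference to the multi-token prediction module weights. The model balances computational efficiency with strong reasoning and agent performance. It features DeepSeek Sparse Attention for long-context efficiency and a scalable reinforcement learning framework, and it does well at long-context reasoning, tool-using agents, function calling, JSON output, and fill-in-the-middle (FIM) completion.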

DeepSeek-V3.2 is the open-weight evolution of the DeepSeek-V3 family, released in two phases: V3.2-Exp in September 2025 (experimental), followed by the full V3.2 with a technical report on arXiv 2512.02556 dated December 2, 2025. It keeps the 671B-total / 37B-active mixture-of-experts backbone but introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that cuts long-context inference cost while keeping benchmark parity with V3.1-Terminus.

For a TheRouter operator, V3.2 is the price-per-quality champion in the open-weight tier. DeepSeek reports that the DSA architecture enabled an API price cut of more than 50% versus V3.1 at the V3.2-Exp launch. The code is published under the MIT license and the weights under a permissive Model License; both allow commercial use. Inference runs on SGLang, vLLM, LMDeploy, TensorRT-LLM, LightLLM, and DeepSeek-Infer in FP8 or BF16 across NVIDIA and AMD GPUs, plus Huawei Ascend NPUs via MindIE, so a self-hosted fallback path is realistic rather than nominal.

Best for
  • Cost-sensitive reasoning workloads: DeepSeek V3.2 lands within range of frontier closed models on MMLU, MATH, and HumanEval at a fraction of the price
  • Long-context tasks where DSA's lower attention cost matters: repo-wide analysis, long-form summarisation, chain-of-thought over big documents
  • Self-hosting alongside hosted use: same weights, same tokenizer, smooth fallback for outage drills or compliance constraints
  • Tool-using agents that need JSON / function-calling reliability without paying flagship prices
Reach for something else if
  • You need native vision or multimodal input: V3.2 is text-in / text-out; for image inputs route to Claude Opus 4.7 or Amazon Nova 2 Lite
  • You need deep reasoning beyond standard chat: DeepSeek ships V3.2-Speciale for that, but it has no tool calls and higher token usage, so it is not always the right routing target
  • You are building critical production paths where V4 is already viable: DeepSeek released V4 in April 2026 with a 1M context window and Compressed Sparse Attention; new projects should evaluate V4 first
Context Length: 128K
Max Output: 33K
Input Price: $0.960 / 1M tokens
Output Price: $2.88 / 1M tokens

Modalities

text→text

Pricing Breakdown

Type | Rate
Input | $0.960 / 1M tokens
Output | $2.88 / 1M tokens
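
As a rough illustration of how the listed rates translate into a per-request cost, the sketch below multiplies token counts by the per-million rates from the pricing table above. The helper name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of TheRouter's API, and actual billing may meter or round differently.

```python
from decimal import Decimal

# Rates copied from the pricing table (USD per 1M tokens).
INPUT_RATE_PER_M = Decimal("0.960")
OUTPUT_RATE_PER_M = Decimal("2.88")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_RATE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_RATE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 100K-token document summarised into 2K tokens.
    print(estimate_cost_usd(100_000, 2_000))
```

Under these assumptions, a 100K-token input with a 2K-token output comes to roughly $0.10, almost all of it input cost. `Decimal` is used so that the small per-token amounts do not pick up floating-point rounding noise.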

Supported Parameters

temperature, max_tokens, top_p, tools, tool_choice, response_format, stop
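
To show how these parameters might fit together in one request body, here is a minimal sketch that builds a chat-completions payload and rejects any parameter not in the list above. The tool definition follows the OpenAI-style function-calling shape, which is an assumption based on the page describing the API as OpenAI-compatible. `build_payload` and the `get_weather` tool are hypothetical names used only for illustration.

```python
import json

# Parameter names copied from the "Supported Parameters" list.
SUPPORTED_PARAMS = {
    "temperature", "max_tokens", "top_p",
    "tools", "tool_choice", "response_format", "stop",
}


def build_payload(messages: list, **params) -> dict:
    """Build a chat-completions body, rejecting unlisted parameters."""
    unknown = set(params) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"model": "deepseek/deepseek-v3.2", "messages": messages, **params}


# Hypothetical tool definition in the OpenAI-style function-calling shape.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = build_payload(
    [{"role": "user", "content": "What is the weather in Oslo?"}],
    temperature=0.2,
    max_tokens=512,
    tools=[weather_tool],
    tool_choice="auto",
)

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```

Validating parameter names on the client is a design choice rather than a requirement. It surfaces typos before the request is sent instead of relying on however the server treats unknown fields.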

Specifications

Field | Value | Source | Status
Release date (V3.2 full) | 2025-12-02 (arXiv tech report) | arxiv.org | verified
V3.2-Exp release | September 2025 (experimental, DSA debut) | api-docs.deepseek.com | verified
Architecture | 671B-total / 37B-active MoE; MLA (Multi-Head Latent Attention) + DSA (DeepSeek Sparse Attention); RoPE; multi-token prediction training objective | arxiv.org | verified
Pretraining tokens (V3 backbone) | 14.8 trillion | github.com | verified
Training cutoff | Not publicly disclosed | n/a | unknown
License (code) | MIT | github.com | verified
License (weights) | DeepSeek Model License (commercial use permitted) | github.com | verified
Supported inference backends | SGLang, vLLM v0.6.6+, LMDeploy, TensorRT-LLM, LightLLM, DeepSeek-Infer Demo; FP8 + BF16; NVIDIA + AMD GPUs; Huawei Ascend NPU via MindIE | github.com | verified
Successor | DeepSeek-V4 (April 24, 2026): 1M context, Compressed Sparse Attention (CSA), native agentic tooling | n/a | verified

Benchmark | Score | Source
MMLU (EM, Chat) | 88.5 | github.com
HumanEval-Mul (Pass@1, Chat) | 82.6 | github.com
MATH-500 (EM, Chat) | 90.2 | github.com
GSM8K (8-shot EM, Base) | 89.3 | github.com
GPQA-Diamond (Pass@1, Chat) | 59.1 | github.com

Note: these scores come from the V3 README. V3.2 benchmarks are reported as on par with V3.1-Terminus, so the V3 baseline is treated here as carrying over within noise.

API Usage Examples

Use the global api.therouter.ai endpoint shown below for new integrations; the legacy China accelerated endpoint is retired.

cURL
curl https://api.therouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "Summarize the key points from this input."}
    ]
  }'

Chat completion

Standard chat through TheRouter's OpenAI-compatible surface. TheRouter normalises tool-calling and response_format on top of the underlying provider, so your client code stays portable across DeepSeek, Anthropic, and OpenAI.

cURL
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
  }'
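
For readers who prefer Python, the same request can be sketched with only the standard library. This is a minimal illustration rather than an official client: the helper names `build_request` and `chat` are hypothetical, and the response parsing assumes the OpenAI-style `choices[0].message.content` layout, which the page implies but does not spell out.

```python
import json
import os
import urllib.request

API_URL = "https://api.therouter.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the cURL example, without sending it."""
    body = {
        "model": "deepseek/deepseek-v3.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    request = build_request(prompt, os.environ["THEROUTER_API_KEY"])
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Prove that the sum of two odd integers is even."))
```

Splitting request construction from sending keeps the network call in one place, which makes the payload easy to inspect or unit-test without an API key.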

News & changes

2025-12-02

DeepSeek publishes V3.2 technical report: same DSA architecture as V3.2-Exp, scaled post-training puts it on par with GPT-5

DeepSeek released the full V3.2 with arXiv 2512.02556. The paper confirms V3.2 uses exactly the same architecture as the September V3.2-Exp; the difference is the post-training compute budget, which is large enough to land V3.2 at GPT-5-comparable quality on the reported benchmark suite. A separate Speciale variant exceeds GPT-5 on reasoning, with gold-medal-level scores on IMO / IOI / ICPC / CMO 2025, but it drops tool-calling and uses more tokens, so it's a research target rather than a general routing destination.

Re-authored by TheRouter. Source: arxiv.org/abs/2512.02556

2025-09-29

DeepSeek launches V3.2-Exp: debuts Sparse Attention (DSA), API price drops over 50%

DeepSeek shipped V3.2-Exp as an experimental release to validate the DSA attention mechanism with a known training baseline (V3.1-Terminus configs). DSA is a fine-grained sparse attention built on MLA (a lightning indexer plus a per-query token selection mechanism) that cuts long-context attention cost without measurable quality loss on public benchmarks. DeepSeek used the resulting efficiency gain to lower its hosted-API prices by more than 50% at launch.

Re-authored by TheRouter. Source: api-docs.deepseek.com

Frequently asked

What's the difference between V3.2 and V3.2-Exp?

Same architecture (the V3.2 paper says so explicitly). V3.2-Exp was released first in September 2025 as an experimental validation of DSA with training configs aligned to V3.1-Terminus, to isolate the attention mechanism's effect. The full V3.2 (December 2025) keeps the same model but adds scaled post-training compute, lifting it to GPT-5-comparable quality on the reported benchmark suite.

What is DeepSeek Sparse Attention (DSA)?

DSA is a fine-grained sparse attention mechanism built on top of MLA (Multi-Head Latent Attention). It has two parts: a lightning indexer that scores how relevant each preceding token is to the current query, and a fine-grained selection mechanism that decides which tokens the query actually attends to. The point is to cut the O(L²) cost of long-context attention, where L is the sequence length, without measurable quality loss on public benchmarks. DeepSeek used the resulting efficiency to drop API prices by over 50% at the V3.2-Exp launch.
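
The two-part structure described above (score, then select, then attend) can be illustrated with a toy single-query example. This is a pure-Python sketch of generic top-k sparse attention and not DeepSeek's implementation: the real indexer uses learned projections and runs inside MLA, and every name here (`sparse_attention_step`, `index_query`, `index_keys`, `top_k`) is hypothetical.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_attention_step(query, keys, values, index_query, index_keys, top_k):
    """One decoding step of toy top-k sparse attention.

    1. Indexer: score every preceding token with a cheap dot product.
    2. Selection: keep the top_k highest-scoring token positions.
    3. Attention: softmax attention over the selected tokens only.
    """
    if not keys:
        raise ValueError("need at least one preceding token")

    # Step 1: cheap relevance scores (stand-in for the lightning indexer).
    index_scores = [dot(index_query, k) for k in index_keys]

    # Step 2: positions of the top_k most relevant tokens, in original order.
    ranked = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    selected = sorted(ranked[:top_k])

    # Step 3: scaled dot-product attention restricted to the selection.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    output = [
        sum(w / total * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return selected, output


if __name__ == "__main__":
    selected, output = sparse_attention_step(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
        values=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]],
        index_query=[1.0],
        index_keys=[[0.1], [0.9], [0.5], [0.7]],
        top_k=2,
    )
    print(selected, output)
```

In this toy version the expensive attention step touches only `top_k` tokens per query instead of all L, so its cost grows roughly as L·k rather than L². The indexer still scores every preceding token, which is why it has to be much cheaper than full attention for the scheme to pay off.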

Can I self-host DeepSeek V3.2?

Yes. Weights are published under the DeepSeek Model License (commercial use permitted) and code under MIT. Reference inference recipes ship for SGLang, vLLM, LMDeploy, TensorRT-LLM, LightLLM, and DeepSeek-Infer in FP8 and BF16 on NVIDIA and AMD GPUs. Huawei Ascend NPUs are supported via MindIE. For most teams the practical entry point is SGLang on FP8 with tensor parallelism sized to the model card recommendations.
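
Because SGLang and vLLM both expose OpenAI-compatible HTTP servers, a self-hosted deployment can serve as a fallback for the hosted endpoint with the same request body. The sketch below shows one way to wire that up. It is an assumption-laden illustration: the localhost URL is a placeholder for your own server, `complete_with_fallback` is a hypothetical helper, and the actual HTTP call is injected as `send` so the logic stays independent of any particular client library.

```python
from typing import Callable, Sequence

# The hosted URL is from this page; the self-hosted URL is a placeholder
# for wherever your own OpenAI-compatible server is listening.
ENDPOINTS = (
    "https://api.therouter.ai/v1/chat/completions",
    "http://localhost:8000/v1/chat/completions",
)


def complete_with_fallback(
    payload: dict,
    send: Callable[[str, dict], dict],
    endpoints: Sequence[str] = ENDPOINTS,
) -> dict:
    """Try each endpoint in order and return the first successful response.

    `send(url, payload)` is a caller-supplied function that performs the
    HTTP POST and raises an exception on failure.
    """
    errors = []
    for url in endpoints:
        try:
            return send(url, payload)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

In practice the self-hosted server may expect a different `model` value (for example the checkpoint's repository name), so `send` is a reasonable place to rewrite that field per endpoint.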

Should I use V3.2 or wait for V4?

DeepSeek-V4 launched on April 24, 2026 with a 1M context window, a new Compressed Sparse Attention (CSA) variant, and native agentic tooling. For new projects, evaluate V4 (v4-flash for cost, v4-pro for quality) before defaulting to V3.2. V3.2 remains the right choice when you need the open-weight self-host fallback or when V4's pricing isn't justified for the workload.

Does V3.2 accept image input?

No. V3.2 is text-in / text-out. For image input route to Claude Opus 4.7 or Amazon Nova 2 Lite; for image output route to a dedicated image-generation model.

Fact ledger: every claim on this page traces here

Claim | Source URL | Retrieved | Status
Release date (V3.2 full) | arxiv.org | 2026-05-22 | verified
V3.2-Exp release | api-docs.deepseek.com | 2026-05-22 | verified
Architecture | arxiv.org | 2026-05-22 | verified
Pretraining tokens (V3 backbone) | github.com | 2026-05-22 | verified
Training cutoff | n/a | n/a | unknown
License (code) | github.com | 2026-05-22 | verified
License (weights) | github.com | 2026-05-22 | verified
Supported inference backends | github.com | 2026-05-22 | verified
Successor | n/a | n/a | verified
MMLU (EM, Chat) | github.com | 2026-05-22 | to verify
HumanEval-Mul (Pass@1, Chat) | github.com | 2026-05-22 | to verify
MATH-500 (EM, Chat) | github.com | 2026-05-22 | to verify
GSM8K (8-shot EM, Base) | github.com | 2026-05-22 | to verify
GPQA-Diamond (Pass@1, Chat) | github.com | 2026-05-22 | to verify
LiveCodeBench (Pass@1-COT) | github.com | 2026-05-22 | to verify
AIME 2024 (Pass@1) | github.com | 2026-05-22 | to verify
GPT-5 comparison (qualitative) | arxiv.org | 2026-05-22 | to verify
DeepSeek publishes V3.2 technical report: same DSA architecture as V3.2-Exp, scaled post-training puts it on par with GPT-5 | arxiv.org/abs/2512.02556 | 2026-05-22 | verified
DeepSeek launches V3.2-Exp: debuts Sparse Attention (DSA), API price drops over 50% | api-docs.deepseek.com | 2026-05-22 | verified
What's the difference between V3.2 and V3.2-Exp? | arxiv.org | 2026-05-22 | to verify
What is DeepSeek Sparse Attention (DSA)? | arxiv.org | 2026-05-22 | to verify
Can I self-host DeepSeek V3.2? | github.com | 2026-05-22 | to verify