DeepSeekMath-V2

Name: DeepSeekMath-V2
Availability: InStock
Author: DeepSeek-AI

推理大模型

DeepSeekMath-V2

Release date: 2025-11-27更新于: 2026-03-08 21:11:18701

Live demoGitHub Hugging Face Compare

Parameters

6850.0亿

Context length

108K

Chinese support

Supported

Reasoning ability

DeepSeekMath-V2 is an AI model published by DeepSeek-AI, released on 2025-11-27, for 推理大模型, with 6850.0B parameters, and 108K tokens context length, under the Apache 2.0 license.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology

DeepSeekMath-V2

Model basics

Reasoning traces

Supported

Thinking modes

Thinking Level · Extended (Default)

Context length

108K tokens

Max output length

No data

Model type

推理大模型

Release date

2025-11-27

Model file size

No data

MoE architecture

Yes

Total params / Active params

6850.0B / 370B

Knowledge cutoff

No data

DeepSeekMath-V2

Open source & experience

Code license

Apache 2.0

Weights license

Apache 2.0- 免费商用授权

GitHub repo

https://github.com/deepseek-ai/DeepSeek-Math-V2

Hugging Face

https://huggingface.co/deepseek-ai/DeepSeek-Math-V2

Live demo

No live demo

DeepSeekMath-V2

Official resources

Paper

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

DataLearnerAI blog

No blog post yet

DeepSeekMath-V2

API details

API speed

3/5

No public API pricing yet.

DeepSeekMath-V2

Benchmark Results

DeepSeekMath-V2 currently shows benchmark results led by IMO-ProofBench (1 / 16, score 99), IMO-ProofBench Advanced (1 / 8, score 61.90). This page also consolidates core specs, context limits, and API pricing so you can evaluate the model from benchmark results and deployment constraints together.

数学推理

2 evaluations

Benchmark / mode

Score

Rank/total

IMO-ProofBench

High

1 / 16

IMO-ProofBench Advanced

High

61.90

1 / 8

View benchmark analysis Compare with other models

DeepSeekMath-V2

Publisher

DeepSeek-AI

View publisher details

DeepSeekMath-V2

Model Overview

DeepSeekAI 发布了可自证数学推理模型 DeepSeekMath-V2。该模型在自然语言证明生成中引入 LLM 验证器作为奖励信号，目标是统一“答案正确性”与“推理可核验性”的表现。

一、它是什么：从“答对”到“能证”

仅以“最终答案正确”作为优化目标，在定理证明等任务上容易遇到上限。DeepSeekMath-V2 通过训练一个 LLM 验证器（verifier），再以其作为奖励模型去优化证明生成器（generator），促使生成器在给出最终答案前自查并修复推理漏洞。在放大测试时计算量（scaled test-time compute）的设定下，官方示例显示该模型在 IMO 2025 / CMO 2024 达到金牌水平，并在 Putnam 2024 取得 118/120 的接近满分成绩。

二、核心技术参数与架构线索

规模与权重：参数量 685B；提供 BF16 / F8_E4M3 / F32 等权重格式；许可 Apache-2.0。
上下文与标签：模型页带有 “DeepSeek 108k” 标签（≈108K tokens 级别的长上下文）。
基座与推理栈：构建于 DeepSeek-V3.2-Exp-Base；推理参考 DeepSeek-V3.2-Exp 的实现（如 SGLang / vLLM 配方、DeepSeek Sparse Attention、并行与配置样例等）。
权重体量（部署提示）：仓库以百余个 safetensors 分片发布，总体量数百 GB 量级，对磁盘、网络与显存提出较高要求，通常需要张量/专家并行。

三、训练/方法要点

验证器—生成器闭环：先训验证器判定证明的正确性与完备性；再以其作为奖励信号优化生成器的输出质量。
动态拉高验证难度：随着生成器变强，扩大验证计算量，自动标注更难验证的样本，不断滚动增强验证器，从而反哺生成器。
目标机制：推动“可自证（self-verifiable）”数学推理，即不仅给出答案，还生成尽可能可被机器核验的完整推理链。

四、评测与结果（官方口径）

竞赛类：IMO 2025 / CMO 2024 达到金牌水平；Putnam 2024 取得 118/120（在放大测试时计算量设定下）。
可复核材料：官方提供图表与样例输出目录，用于对评测设置和逐题表现进行交叉核对。

五、应用场景与当前限制

适配场景

竞赛级数学题的分步求解与非选择题叙述性证明草稿生成；
定理证明/反例构造的思路探索，在验证器约束下进行“先生成、再自审”的推理流程。

现实约束

资源开销大：685B 等级权重 + 数百 GB 分片，对显存、磁盘、I/O 与带宽的门槛较高；多数部署需模型并行（TP/DP/EP）与专用推理栈。
在线可用性：当前缺少官方托管的在线交互 Demo，多为自行部署或等待第三方推理服务支持。
评测透明度：核心结果以图表与样例概述呈现；如需严格可复现的脚本与逐题细表，需关注后续增补。

六、获取与部署

权重与许可：在开源平台提供 DeepSeekMath-V2 权重分片与模型卡，许可 Apache-2.0。
推理路径：参考 DeepSeek-V3.2-Exp 仓库给出的转换脚本、SGLang / vLLM 启动配方、Sparse Attention 选项以及并行配置（如 config_671B_v3.2.json 等），按本地/多机环境选择合适的并行与内存规划。

七、关键信息速览

项信息

模型名DeepSeekMath-V2

任务定位数学推理 / 定理证明（强调“可自证”的推理链）

参数量685B（BF16 / F8_E4M3 / F32 权重）

上下文≈108K tokens（“DeepSeek 108k” 标签）

许可Apache-2.0

评测要点IMO 2025 / CMO 2024 金牌水平；Putnam 2024 118/120（放大测试时计算量）

基座/推理基于 DeepSeek-V3.2-Exp-Base；推理参考 DeepSeek-V3.2-Exp

在线 Demo暂无官方交互页（需自建/等待托管）

八、与 DeepSeek-Prover-V2 的关系（易混点）

DeepSeek-Prover-V2（671B） 主攻 Lean4 形式化定理证明 与 ProverBench；DeepSeekMath-V2 则聚焦自然语言数学推理/证明生成 + 自我验证。两者方向相邻但并不等同，可互参其研究脉络与评测设置。

DataLearner 官方微信

欢迎关注 DataLearner 官方微信，获得最新 AI 技术推送