DeepSeek-V4-Pro vs DeepSeek-V3.1

Across 5 shared benchmarks, DeepSeek-V3.1 leads overall: DeepSeek-V4-Pro wins 2, DeepSeek-V3.1 wins 3, with 0 ties and an average score difference of -0.60.

DeepSeek-V4-Pro

DeepSeek-AI · 2026-04-24 · Reasoning model

DeepSeek-V3.1

DeepSeek-AI · 2025-08-20 · AI model

DeepSeek-V4-Pro: 2 wins (40%) · DeepSeek-V3.1: 3 wins (60%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.

General Knowledge

DeepSeek-V3.1 3/3

Benchmark	DeepSeek-V4-Pro	DeepSeek-V3.1	Diff (V4-Pro minus V3.1)
HLE	7.70 (rank 133 / 149; Normal, No Tools)	15.90 (rank 110 / 149; thinking)	-8.20
GPQA Diamond	72.90 (rank 99 / 175; Normal, No Tools)	74.90 (rank 92 / 175)	-2.00
MMLU Pro	82.90 (rank 44 / 124; Normal, No Tools)	83.70 (rank 39 / 124)	-0.80

Specs

Field	DeepSeek-V4-Pro	DeepSeek-V3.1
Publisher	DeepSeek-AI	DeepSeek-AI
Release date	2026-04-24	2025-08-20
Model type	Reasoning model	AI model
Architecture	MoE	MoE
Parameters (billions)	1600	671
Context length	1M	128K
Max output (tokens)	384000	8192

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	DeepSeek-V4-Pro	DeepSeek-V3.1
Text input	$1.74 / 1M tokens	$0.56 / 1M tokens
Text output	$3.48 / 1M tokens	$1.68 / 1M tokens
Cache read	$0.145 / 1M tokens	Not public
Cache write	$1.74 / 1M tokens	Not public
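
To make the per-token prices easier to compare, the following is a minimal sketch that estimates the bill for one workload from the text input and output rates listed above. The function name `estimate_cost` and the example workload (2M input tokens, 500K output tokens) are illustrative assumptions, not part of the source data, and cache pricing is ignored because it is not public for DeepSeek-V3.1.

```python
# Listed prices in USD per 1M tokens, copied from the pricing table above.
PRICES_PER_MILLION = {
    "DeepSeek-V4-Pro": {"input": 1.74, "output": 3.48},
    "DeepSeek-V3.1": {"input": 0.56, "output": 1.68},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload, ignoring cache reads and writes."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES_PER_MILLION:
        cost = estimate_cost(name, 2_000_000, 500_000)
        print(f"{name}: ${cost:.2f}")
```

For this assumed workload the listed rates work out to about $5.22 for DeepSeek-V4-Pro and $1.96 for DeepSeek-V3.1.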

Summary

DeepSeek-V4-Pro leads in: Coding and Software Engineer (2/2)
DeepSeek-V3.1 leads in: General Knowledge (3/3)

On average across the 5 shared benchmarks, DeepSeek-V3.1 scores 0.60 higher.

Largest single-benchmark gap: HLE — DeepSeek-V4-Pro 7.70 vs DeepSeek-V3.1 15.90 (-8.20).
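
The summary figures can be re-derived from the five score pairs in the benchmark tables on this page. The sketch below is one way to do that; the helper name `summarize` and the dictionary layout are assumptions made for illustration, and each difference is taken as DeepSeek-V4-Pro minus DeepSeek-V3.1, matching the Diff column.

```python
# Scores copied from the benchmark tables: (DeepSeek-V4-Pro, DeepSeek-V3.1).
SCORES = {
    "HLE": (7.70, 15.90),
    "GPQA Diamond": (72.90, 74.90),
    "MMLU Pro": (82.90, 83.70),
    "SWE-bench Verified": (73.60, 66.00),
    "LiveCodeBench": (56.80, 56.40),
}


def summarize(scores: dict) -> dict:
    """Compute win counts, mean difference, and the largest gap."""
    # Round each difference to two decimals to avoid floating-point noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "wins_v4_pro": sum(d > 0 for d in diffs.values()),
        "wins_v3_1": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

The mean difference is the sum of the five diffs (-8.20, -2.00, -0.80, +7.60, +0.40 = -3.00) divided by 5, which gives -0.60.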

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.


Coding and Software Engineer

DeepSeek-V4-Pro 2/2

Benchmark	DeepSeek-V4-Pro	DeepSeek-V3.1	Diff (V4-Pro minus V3.1)
SWE-bench Verified	73.60 (rank 36 / 103; Normal, With Tools)	66.00 (rank 65 / 103)	+7.60
LiveCodeBench	56.80 (rank 73 / 118; Normal, No Tools)	56.40 (rank 76 / 118)	+0.40