DeepSeek-V4-Pro vs DeepSeek-V3.1

Across 5 shared benchmarks, DeepSeek-V3.1 leads overall: DeepSeek-V4-Pro wins 1, DeepSeek-V3.1 wins 4, with 0 ties. The average score difference (DeepSeek-V4-Pro minus DeepSeek-V3.1) is -5.58.

DeepSeek-V4-Pro
DeepSeek-AI · 2026-04-24 · Reasoning model

DeepSeek-V3.1
DeepSeek-AI · 2025-08-20 · Chat model

DeepSeek-V4-Pro: 1 win (20%) · DeepSeek-V3.1: 4 wins (80%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.

General Knowledge

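DeepSeek-V3.1 wins 3/3

| Benchmark | DeepSeek-V4-Pro (score, rank) | Mode | DeepSeek-V3.1 (score, rank) | Diff |
| --- | --- | --- | --- | --- |
| HLE | 7.70 (141 / 157) | Normal (No Tools) | 15.90 (118 / 157) | -8.20 |
| GPQA Diamond | 72.90 (102 / 178) | Normal (No Tools) | 80.10 (75 / 178) | -7.20 |
| MMLU Pro | 82.90 (46 / 126) | Normal (No Tools) | 85 (25 / 126) | -2.10 |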

Coding and Software Engineering

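Even: 1 win each across 2 benchmarks

| Benchmark | DeepSeek-V4-Pro (score, rank) | Mode | DeepSeek-V3.1 (score, rank) | Diff |
| --- | --- | --- | --- | --- |
| LiveCodeBench | 56.80 (75 / 120) | Normal (No Tools) | 74.80 (40 / 120) | -18 |
| SWE-bench Verified | 73.60 (41 / 108) | Normal (With Tools) | 66 (70 / 108) | +7.60 |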

Specs

| Field | DeepSeek-V4-Pro | DeepSeek-V3.1 |
| --- | --- | --- |
| Publisher | DeepSeek-AI | DeepSeek-AI |
| Release date | 2026-04-24 | 2025-08-20 |
| Model type | Reasoning model | Chat model |
| Architecture | MoE | MoE |
| Parameters | 1.6T | 671B |
| Context length | 1M | 128K |
| Max output | 375K | 8K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | DeepSeek-V4-Pro | DeepSeek-V3.1 |
| --- | --- | --- |
| Text input | $0.435 / 1M tokens | Not public |
| Text output | $0.87 / 1M tokens | Not public |
| Cache read | $0.87 / 1M tokens | Not public |
| Cache write | $0.003625 / 1M tokens | Not public |

One or both models have incomplete public pricing.
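As a rough illustration of how the per-1M-token prices translate into a per-request cost, the sketch below uses only the DeepSeek-V4-Pro text input and text output prices listed above. The helper name `estimate_cost` and the token counts are hypothetical. Cache pricing is left out because this page does not say how cached tokens are billed.

```python
# Per-1M-token prices for DeepSeek-V4-Pro, copied from the pricing table (USD).
PRICE_PER_MILLION = {
    "text_input": 0.435,
    "text_output": 0.87,
}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of one uncached request."""
    total = (
        input_tokens * PRICE_PER_MILLION["text_input"]
        + output_tokens * PRICE_PER_MILLION["text_output"]
    )
    return total / 1_000_000


# Example request sizes are made up for illustration.
cost = estimate_cost(input_tokens=200_000, output_tokens=10_000)
print(f"${cost:.4f}")  # prints "$0.0957"
```

No equivalent estimate is possible for DeepSeek-V3.1 from this page, since its prices are not public.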

Summary

  • DeepSeek-V3.1 leads in: General Knowledge (3/3)
  • Tied in: Coding and Software Engineering

On average across the 5 shared benchmarks, DeepSeek-V3.1 scores 5.58 higher.

Largest single-benchmark gap: LiveCodeBench — DeepSeek-V4-Pro 56.80 vs DeepSeek-V3.1 74.80 (-18).
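The headline figures can be reproduced from the five score pairs in the benchmark tables. The sketch below assumes that Diff means the DeepSeek-V4-Pro score minus the DeepSeek-V3.1 score, which matches the signs in the Diff column. The variable names are illustrative.

```python
# Scores copied from the benchmark tables: (DeepSeek-V4-Pro, DeepSeek-V3.1).
scores = {
    "HLE": (7.70, 15.90),
    "GPQA Diamond": (72.90, 80.10),
    "MMLU Pro": (82.90, 85.00),
    "LiveCodeBench": (56.80, 74.80),
    "SWE-bench Verified": (73.60, 66.00),
}

# Assumption: Diff = V4-Pro score minus V3.1 score, rounded to 2 decimals.
diffs = {name: round(v4 - v31, 2) for name, (v4, v31) in scores.items()}

v4_wins = sum(d > 0 for d in diffs.values())
v31_wins = sum(d < 0 for d in diffs.values())
ties = sum(d == 0 for d in diffs.values())

# Mean of the per-benchmark differences.
avg_diff = round(sum(diffs.values()) / len(diffs), 2)

# Benchmark with the largest absolute difference.
largest_gap = max(diffs, key=lambda name: abs(diffs[name]))

print(v4_wins, v31_wins, ties, avg_diff, largest_gap)
# prints: 1 4 0 -5.58 LiveCodeBench
```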

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.