DeepSeek V3.2 vs DeepSeek-V3.1

Across 6 shared benchmarks, DeepSeek V3.2 leads overall: it wins 5 and DeepSeek-V3.1 wins 1, with 0 ties and an average score difference of +4.23 points (DeepSeek V3.2 minus DeepSeek-V3.1).

DeepSeek V3.2

DeepSeek-AI · 2025-12-01 · Reasoning model

DeepSeek-V3.1

DeepSeek-AI · 2025-08-20 · Chat model

DeepSeek V3.2: 5 wins (83%) · DeepSeek-V3.1: 1 win (17%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

Coding and Software Engineer

DeepSeek V3.2 leads 2/2

| Benchmark | DeepSeek V3.2 | DeepSeek-V3.1 | Diff |
|---|---|---|---|
| LiveCodeBench | 83.30 · 21 / 120 · Thinking (No Tools) | 74.80 · 40 / 120 | +8.50 |
| SWE-bench Verified | 73.10 · 45 / 108 | 66 · 70 / 108 | +7.10 |

General Knowledge

DeepSeek V3.2 leads 2/2

| Benchmark | DeepSeek V3.2 | DeepSeek-V3.1 | Diff |
|---|---|---|---|
| HLE | 25.10 · 87 / 157 · Thinking (No Tools) | 15.90 · 118 / 157 | +9.20 |
| GPQA Diamond | 82.40 · 64 / 178 · Thinking (No Tools) | 80.10 · 75 / 178 | +2.30 |

Agent Level Benchmark

DeepSeek-V3.1 leads 1/1

| Benchmark | DeepSeek V3.2 | DeepSeek-V3.1 | Diff |
|---|---|---|---|
| Aider-Polyglot | 69.90 · 12 / 26 | 76.30 · 5 / 26 | -6.40 |

Math and Reasoning

DeepSeek V3.2 leads 1/1

| Benchmark | DeepSeek V3.2 | DeepSeek-V3.1 | Diff |
|---|---|---|---|
| AIME2025 | 93.10 · 30 / 106 · Thinking (No Tools) | 88.40 · 42 / 106 | +4.70 |
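
The layout described above (grouped by capability, sorted by largest gap within each group) can be reproduced with a short sketch. The scores are copied from the benchmark tables. The function name `group_and_sort` is hypothetical, and sorting by the absolute value of the gap is an assumption, since the page does not say how a negative gap would be ordered against a positive one.

```python
from collections import defaultdict

# (capability group, benchmark, DeepSeek V3.2 score, DeepSeek-V3.1 score),
# copied from the benchmark tables; input order is deliberately unsorted.
ROWS = [
    ("Coding and Software Engineer", "SWE-bench Verified", 73.10, 66.00),
    ("Coding and Software Engineer", "LiveCodeBench", 83.30, 74.80),
    ("General Knowledge", "GPQA Diamond", 82.40, 80.10),
    ("General Knowledge", "HLE", 25.10, 15.90),
    ("Agent Level Benchmark", "Aider-Polyglot", 69.90, 76.30),
    ("Math and Reasoning", "AIME2025", 93.10, 88.40),
]


def group_and_sort(rows):
    """Group rows by capability and sort each group by gap size, largest first."""
    groups = defaultdict(list)
    for group, benchmark, v32, v31 in rows:
        # Diff is DeepSeek V3.2 minus DeepSeek-V3.1, as in the Diff column.
        groups[group].append((benchmark, round(v32 - v31, 2)))
    # Assumption: "largest gap" means largest absolute difference.
    return {
        group: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for group, items in groups.items()
    }


if __name__ == "__main__":
    for group, items in group_and_sort(ROWS).items():
        print(group)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

A positive diff favors DeepSeek V3.2 and a negative diff favors DeepSeek-V3.1, which matches the sign convention of the Diff column.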

Specs

| Field | DeepSeek V3.2 | DeepSeek-V3.1 |
|---|---|---|
| Publisher | DeepSeek-AI | DeepSeek-AI |
| Release date | 2025-12-01 | 2025-08-20 |
| Model type | Reasoning model | Chat model |
| Architecture | MoE | MoE |
| Parameters | 671B | 671B |
| Context length | 128K | 128K |
| Max output | 8K | 8K |

Summary

  • DeepSeek V3.2 leads in: Coding and Software Engineer (2/2), General Knowledge (2/2), Math and Reasoning (1/1)
  • DeepSeek-V3.1 leads in: Agent Level Benchmark (1/1)

On average across the 6 shared benchmarks, DeepSeek V3.2 scores 4.23 higher.

Largest single-benchmark gap: HLE, with DeepSeek V3.2 at 25.10 vs DeepSeek-V3.1 at 15.90 (+9.20).
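
The summary figures can be checked against the six score pairs with a minimal sketch. The scores are taken from the benchmark tables. The function name `summarize` and the dictionary keys are hypothetical, and the page does not state how it rounds, so rounding to two decimals (and win shares to whole percent) is an assumption.

```python
from statistics import mean

# benchmark -> (DeepSeek V3.2 score, DeepSeek-V3.1 score), from the tables.
SCORES = {
    "LiveCodeBench": (83.30, 74.80),
    "SWE-bench Verified": (73.10, 66.00),
    "HLE": (25.10, 15.90),
    "GPQA Diamond": (82.40, 80.10),
    "Aider-Polyglot": (69.90, 76.30),
    "AIME2025": (93.10, 88.40),
}


def summarize(scores):
    """Recompute wins, ties, average difference and largest gap."""
    # Diff is DeepSeek V3.2 minus DeepSeek-V3.1; rounded to avoid float noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    wins_v32 = sum(d > 0 for d in diffs.values())
    wins_v31 = sum(d < 0 for d in diffs.values())
    total = len(diffs)
    return {
        "wins_v32": wins_v32,
        "wins_v31": wins_v31,
        "ties": total - wins_v32 - wins_v31,
        "share_v32": round(100 * wins_v32 / total),
        "share_v31": round(100 * wins_v31 / total),
        "avg_diff": round(mean(diffs.values()), 2),
        # Assumption: the largest gap is measured by absolute difference.
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Under these assumptions the sketch gives 5 wins to 1 with 0 ties, win shares of 83% and 17%, an average difference of 4.23, and HLE as the largest gap, in line with the figures stated on the page.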

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.