DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.


© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


DeepSeek V3.2 Benchmark Details

DeepSeek V3.2's headline results are LiveCodeBench (score 83.30, rank 19 of 118), AIME2025 (score 93.10, rank 30 of 106) and GPQA Diamond (score 82.40, rank 61 of 175). This page also compares it with three predecessor models in the same series and lists one source link for reference.

Benchmark Results

All results below use DeepSeek V3.2's default thinking mode. Rows labeled "Thinking Mode + Tools" were run with tool use enabled.

Comprehensive Evaluation

4 evaluations

Benchmark | Mode | Score | Rank / total
GPQA Diamond | Thinking Mode | 82.40 | 61 / 175
ARC-AGI | Thinking Mode | 57 | 29 / 56
HLE | Thinking Mode | 25.10 | 78 / 148
ARC-AGI-2 | Thinking Mode | 4 | 38 / 49

Programming and Software Engineering

5 evaluations

Benchmark | Mode | Score | Rank / total
CodeForces | Thinking Mode | 2386 | 11 / 16
LiveCodeBench | Thinking Mode | 83.30 | 19 / 118
SWE-bench Verified | Thinking Mode | 70.20 | 51 / 103
SWE-bench Verified | Thinking Mode + Tools | 73.10 | 40 / 103
SWE-Bench Pro - Public | Thinking Mode | 40.90 | 31 / 36

Mathematical Reasoning

2 evaluations

Benchmark | Mode | Score | Rank / total
AIME2025 | Thinking Mode | 93.10 | 30 / 106
AIME 2026 | Thinking Mode | 92.70 | 7 / 14

Agent Capability Evaluation

2 evaluations

Benchmark | Mode | Score | Rank / total
τ²-Bench | Thinking Mode + Tools | 80.30 | 14 / 40
Aider-Polyglot | Thinking Mode + Tools | 69.90 | 12 / 26

AI Agent - Information Gathering

1 evaluation

Benchmark | Mode | Score | Rank / total
BrowseComp | Thinking Mode | 51.40 | 33 / 43

AI Agent - Tool Use

1 evaluation

Benchmark | Mode | Score | Rank / total
Terminal Bench 2.0 | Thinking Mode + Tools | 46.40 | 36 / 43

OpenClaw Comprehensive Agent Capability Evaluation

2 evaluations

Benchmark | Mode | Score | Rank / total
Pinch Bench | Thinking Mode + Tools | 84.30 | 18 / 37
Claw Bench | Thinking Mode + Tools | 79 | 21 / 29
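Raw ranks are hard to compare across leaderboards of different sizes. For example, 7 / 14 and 61 / 175 are not on the same scale. The sketch below is a minimal illustration. It assumes a rank is normalized as the share of ranked models placed strictly above this one, (rank - 1) / total. This normalization and the function names are our own, chosen for illustration. They are not DataLearner's documented methodology. The sketch uses five of the rows above.

```python
# Rank / total pairs for DeepSeek V3.2, copied from the tables above (thinking mode).
RESULTS = {
    "LiveCodeBench": (19, 118),
    "AIME2025": (30, 106),
    "GPQA Diamond": (61, 175),
    "AIME 2026": (7, 14),
    "CodeForces": (11, 16),
}


def share_ranked_above(rank: int, total: int) -> float:
    """Fraction of ranked models placed strictly above this one (0.0 means first place)."""
    if not 1 <= rank <= total:
        raise ValueError(f"rank {rank} is outside 1..{total}")
    return (rank - 1) / total


def by_relative_rank(results: dict[str, tuple[int, int]]) -> list[str]:
    """Benchmark names ordered from best to worst relative placement."""
    return sorted(results, key=lambda name: share_ranked_above(*results[name]))


if __name__ == "__main__":
    for name in by_relative_rank(RESULTS):
        rank, total = RESULTS[name]
        print(f"{name:<15} {rank:>3} / {total:<3}  {share_ranked_above(rank, total):.1%} ranked above")
```

On these five rows the order is LiveCodeBench, AIME2025, GPQA Diamond, AIME 2026, CodeForces. Once leaderboard size is taken into account, the 7 / 14 finish on AIME 2026 comes out weaker than 61 / 175 on GPQA Diamond.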

Version History

How each version of the DeepSeek V3.2 series stacks up on benchmark tests

Models compared: DeepSeek V3.2, DeepSeek-V3.1, DeepSeek-V3-0324, DeepSeek-V3.

The chart shows each model's highest score on each benchmark under the current filter. The table below gives the mode behind each score.
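The "highest score per benchmark" rule can be sketched as a simple max-by-key reduction. The sketch assumes a filter is just a set of allowed mode labels. That is our simplification; the page's actual filter logic is not documented here, and `best_per_benchmark` is a hypothetical helper. SWE-bench Verified shows why the filter matters: DeepSeek V3.2 scores 70.20 in thinking mode but 73.10 with tools.

```python
# (benchmark, mode, score) rows for DeepSeek V3.2, copied from the results above.
ROWS = [
    ("SWE-bench Verified", "Thinking", 70.20),
    ("SWE-bench Verified", "Thinking + Tools", 73.10),
    ("LiveCodeBench", "Thinking", 83.30),
    ("Aider-Polyglot", "Thinking + Tools", 69.90),
]


def best_per_benchmark(rows, allowed_modes=None):
    """Return {benchmark: (mode, score)}, keeping the highest score among allowed modes.

    allowed_modes=None means no filter: every mode is eligible.
    """
    best: dict[str, tuple[str, float]] = {}
    for bench, mode, score in rows:
        if allowed_modes is not None and mode not in allowed_modes:
            continue  # mode filtered out, like toggling it off in a filter
        if bench not in best or score > best[bench][1]:
            best[bench] = (mode, score)
    return best


if __name__ == "__main__":
    print(best_per_benchmark(ROWS))
    print(best_per_benchmark(ROWS, allowed_modes={"Thinking"}))
```

With no filter, SWE-bench Verified reports the tool-assisted 73.10. When only "Thinking" is allowed, it falls back to 70.20, and Aider-Polyglot drops out entirely.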

Benchmark Score Comparison

8 benchmarks have comparable scores. Each cell shows the model's best score on that benchmark, with the mode that produced it in parentheses. "--" means no comparable score is available.

Benchmark | Category | DeepSeek V3.2 (current) | DeepSeek-V3.1 | DeepSeek-V3-0324 | DeepSeek-V3
ARC-AGI | Comprehensive Evaluation | 57.00 (Thinking Enabled) | -- | 9.00 (Standard Mode) | --
GPQA Diamond | Comprehensive Evaluation | 82.40 (Thinking Enabled) | 80.10 (Thinking Enabled) | 68.40 (Standard Mode) | 59.10 (Standard Mode)
HLE | Comprehensive Evaluation | 25.10 (Thinking Enabled) | 15.90 (Thinking Enabled) | 5.20 (Standard Mode) | --
LiveCodeBench | Programming and Software Engineering | 83.30 (Thinking Enabled) | 74.80 (Thinking Enabled) | 49.20 (Standard Mode) | 34.60 (Standard Mode)
SWE-bench Verified | Programming and Software Engineering | 73.10 (Thinking Enabled + Tools) | 66.00 (Standard Mode) | 38.80 (Standard Mode) | --
AIME2025 | Mathematical Reasoning | 93.10 (Thinking Enabled) | 88.40 (Thinking Enabled) | 47.70 (Standard Mode) | --
Aider-Polyglot | Agent Capability Evaluation | 69.90 (Thinking Enabled + Tools) | 76.30 (Thinking Enabled) | 55.10 (Standard Mode) | --
τ²-Bench | Agent Capability Evaluation | 80.30 (Thinking Enabled + Tools) | -- | 38.80 (Standard Mode + Tools) | --
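The table above can be turned into per-benchmark version deltas. The sketch below computes DeepSeek V3.2 minus DeepSeek-V3.1 and skips benchmarks where either side is "--". The helper names are hypothetical.

One caveat applies because each cell is a best score: some deltas compare different modes. For example, the SWE-bench Verified gap sets V3.2 with tools against V3.1 in standard mode, so it mixes a version effect with a mode effect.

```python
from typing import Optional

# Best scores from the comparison table above; None stands for "--".
V32: dict[str, Optional[float]] = {
    "ARC-AGI": 57.00, "GPQA Diamond": 82.40, "HLE": 25.10, "LiveCodeBench": 83.30,
    "SWE-bench Verified": 73.10, "AIME2025": 93.10, "Aider-Polyglot": 69.90, "τ²-Bench": 80.30,
}
V31: dict[str, Optional[float]] = {
    "ARC-AGI": None, "GPQA Diamond": 80.10, "HLE": 15.90, "LiveCodeBench": 74.80,
    "SWE-bench Verified": 66.00, "AIME2025": 88.40, "Aider-Polyglot": 76.30, "τ²-Bench": None,
}


def version_deltas(new, old):
    """Score change (new - old), rounded to 2 decimals, for benchmarks both versions report."""
    return {
        bench: round(new[bench] - old[bench], 2)
        for bench in new
        if new.get(bench) is not None and old.get(bench) is not None
    }


def regressions(deltas):
    """Benchmarks where the newer version scored lower, sorted by name."""
    return sorted(bench for bench, d in deltas.items() if d < 0)


if __name__ == "__main__":
    deltas = version_deltas(V32, V31)
    for bench, d in deltas.items():
        print(f"{bench:<20} {d:+.2f}")
    print("Regressions:", regressions(deltas))
```

Six benchmarks have scores for both versions. Aider-Polyglot is the only one where V3.2's best score is lower than V3.1's (69.90 vs 76.30).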

Single-Benchmark Version Trend

Viewing: ARC-AGI (Comprehensive Evaluation)

Chart: ARC-AGI score across model versions, by mode (Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools).

The x-axis shows each model and its release date, and the y-axis shows the score. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.

Standard API Pricing Across the DeepSeek V3.2 Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

These models use different currencies or billing units, so the page falls back to raw price values instead of a shared bar chart.

Model | Supplier | Standard input | Standard output | Base price applies to
DeepSeek-V3.1 | -- | 0.56 USD / 1M tokens | 1.68 USD / 1M tokens | --
DeepSeek-V3-0324 | -- | 0.27 USD / 1M tokens | 1.1 USD / 1M tokens | --
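Per-million-token prices turn into a per-request cost with one line of arithmetic: (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below applies it to the two listed models.

It rests on several assumptions. It covers standard base rates only and ignores cache-hit discounts and any extended-context tiers. The token counts in the example are hypothetical. `request_cost_usd` is an illustrative helper, not part of any DeepSeek SDK. No price is listed here for DeepSeek V3.2 itself.

```python
# Standard base prices in USD per 1M tokens (input, output), from the table above.
# Assumption: no caching discounts or extended-context pricing are applied.
PRICES_USD_PER_M = {
    "DeepSeek-V3.1": (0.56, 1.68),
    "DeepSeek-V3-0324": (0.27, 1.10),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the model's standard per-1M-token rates."""
    in_price, out_price = PRICES_USD_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_USD_PER_M:
        print(f"{model:<18} ${request_cost_usd(model, 10_000, 2_000):.5f}")
```

For that hypothetical request, DeepSeek-V3.1 costs about $0.00896 (0.0056 for input plus 0.00336 for output), and DeepSeek-V3-0324 costs about $0.0049.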

Sources

arcprize.org