Benchmark Results
General Knowledge (4 evaluations) · Coding and Software Engineering (5 evaluations) · Math and Reasoning (3 evaluations) · Agent Level Benchmark (2 evaluations) · Claw-style Agent Evaluation (2 evaluations)
Version History
How each version of the DeepSeek V3.2 series stacks up on benchmark tests
8 benchmarks with comparable scores. Each model shows its best score, with the mode label (for example, Thinking Enabled or Standard Mode) given alongside it.
| Benchmark | Category | DeepSeek V3.2 (Current) | DeepSeek-V3.1 | DeepSeek-V3-0324 | DeepSeek-V3 |
|---|---|---|---|---|---|
| ARC-AGI | General Evaluation | 57.00 (Thinking Enabled) | -- | 9.00 (Standard Mode) | -- |
| GPQA Diamond | General Evaluation | 82.40 (Thinking Enabled) | 80.10 (Thinking Enabled) | 68.40 (Standard Mode) | 59.10 (Standard Mode) |
| HLE | General Evaluation | 25.10 (Thinking Enabled) | 15.90 (Thinking Enabled) | 5.20 (Standard Mode) | -- |
| LiveCodeBench | Coding and Software Engineering | 83.30 (Thinking Enabled) | 74.80 (Thinking Enabled) | 49.20 (Standard Mode) | 34.60 (Standard Mode) |
| SWE-bench Verified | Coding and Software Engineering | 73.10 (Thinking Enabled, Tools) | 66.00 (Standard Mode) | 38.80 (Standard Mode) | -- |
| AIME2025 | Math Reasoning | 93.10 (Thinking Enabled) | 88.40 (Thinking Enabled) | 47.70 (Standard Mode) | -- |
| Aider-Polyglot | Agent Capability Evaluation | 69.90 (Thinking Enabled, Tools) | 76.30 (Thinking Enabled) | 55.10 (Standard Mode) | -- |
| τ²-Bench | Agent Capability Evaluation | 80.30 (Thinking Enabled, Tools) | -- | 38.80 (Standard Mode, Tools) | -- |
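To read the table above as a version-to-version trend, the change between adjacent releases can be computed directly from the listed scores. The following is a minimal sketch, not part of the original page: the names `VERSIONS`, `SCORES`, and `version_deltas` are hypothetical, only four rows are copied in, and missing (`--`) entries are represented as `None`. The listed scores come from different modes (Standard Mode, Thinking Enabled, Tools), so these differences mix mode changes with model changes and should be treated as rough indicators only.

```python
# Versions ordered oldest to newest, matching the table columns in reverse.
VERSIONS = ["DeepSeek-V3", "DeepSeek-V3-0324", "DeepSeek-V3.1", "DeepSeek V3.2"]

# Scores copied from the table; None marks a missing ("--") entry.
SCORES = {
    "GPQA Diamond": [59.10, 68.40, 80.10, 82.40],
    "LiveCodeBench": [34.60, 49.20, 74.80, 83.30],
    "AIME2025": [None, 47.70, 88.40, 93.10],
    "Aider-Polyglot": [None, 55.10, 76.30, 69.90],
}


def version_deltas(row):
    """Return (older, newer, delta) for each adjacent pair where both scores exist."""
    deltas = []
    for i in range(1, len(row)):
        prev, curr = row[i - 1], row[i]
        if prev is None or curr is None:
            continue  # skip pairs with a missing score
        deltas.append((VERSIONS[i - 1], VERSIONS[i], round(curr - prev, 2)))
    return deltas


for name, row in SCORES.items():
    for older, newer, delta in version_deltas(row):
        print(f"{name}: {older} -> {newer}: {delta:+.2f}")
```

Among the four rows included here, Aider-Polyglot is the only one where the step from DeepSeek-V3.1 to DeepSeek V3.2 is negative (76.30 to 69.90), which is worth checking against the differing mode labels before drawing conclusions.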
Single-Benchmark Version Trend (chart: ARC-AGI · General Evaluation)
Standard API Pricing Across the DeepSeek V3.2 Series
The chart shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below it.
Source: DataLearnerAI. Standard text prices shown here use the default supplier.