DeepSeek V3.2-Exp Benchmark Details

DeepSeek V3.2-Exp's strongest benchmark results are SimpleQA (rank 1 of 45, score 97.10), Aider-Polyglot (rank 11 of 59, score 74.20), and MMLU Pro (rank 25 of 126, score 85).
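Because each benchmark ranks a different number of models, the rank / total pairs are easier to compare once they are put on a common scale. The sketch below is one possible way to do that, not something the leaderboard itself publishes. It converts a rank / total cell into the percentage of ranked models placed at or below that rank. The helper names `parse_rank` and `rank_to_percentile` are hypothetical, and the percentile definition is an assumption chosen for illustration.

```python
def parse_rank(cell: str) -> tuple[int, int]:
    """Split a 'rank / total' cell such as '11 / 59' into two integers."""
    rank, total = (int(part) for part in cell.split("/"))
    return rank, total


def rank_to_percentile(rank: int, total: int) -> float:
    """Percentage of ranked models that sit at or below this rank.

    Assumed definition: rank 1 maps to 100.0, the last rank maps to 100 / total.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank + 1) / total


# Headline results quoted in the summary sentence.
headline = {
    "SimpleQA": "1 / 45",
    "Aider-Polyglot": "11 / 59",
    "MMLU Pro": "25 / 126",
}

for name, cell in headline.items():
    rank, total = parse_rank(cell)
    print(f"{name}: rank {rank} of {total}, percentile {rank_to_percentile(rank, total):.1f}")
```

A percentile of this kind only reflects position within one leaderboard. It says nothing about how far apart the underlying scores are.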

Benchmark Results

General Knowledge

9 evaluations
Benchmark / mode                Score   Rank / total
MMLU Pro                        85      25 / 126
(name missing)                  84      37 / 126
(name missing)                  79.90   78 / 179
(name missing)                  74      97 / 179
LiveBench, Standard Mode        49.85   91 / 115
LiveBench, Thinking Mode        58.90   73 / 115
(name missing)                  20.30   104 / 159
(name missing)                  19.80   106 / 159
(name missing)                  8.60    139 / 159

Common Sense

1 evaluation
Benchmark / mode                Score   Rank / total
SimpleQA                        97.10   1 / 45

Coding and Software Engineering

3 evaluations
Benchmark / mode                Score   Rank / total
(name missing)                  74.10   41 / 120
(name missing)                  55      84 / 120
(name missing)                  67.80   67 / 108

Math and Reasoning

2 evaluations
Benchmark / mode                Score   Rank / total
(name missing)                  89.30   39 / 106
(name missing)                  58      83 / 106

AI Agent - Tool Usage

2 evaluations
Benchmark / mode                Score   Rank / total
(name missing)                  37.70   14 / 35

Agent Level Benchmark

5 evaluations
Benchmark / mode                Score   Rank / total
Aider-Polyglot, Standard Mode   70.20   17 / 59
Aider-Polyglot, Thinking Mode   74.20   11 / 59
(name missing)                  66.70   26 / 40

Instruction Following

1 evaluation
Benchmark / mode                Score   Rank / total
(name missing)                  54.10   26 / 29

AI Agent - Information Search

1 evaluation
Benchmark / mode                Score   Rank / total
(name missing)                  40.10   41 / 45