DeepSeek V3.2-Exp Benchmark Details

DeepSeek V3.2-Exp's strongest current benchmark placements are SimpleQA (rank 1 / 45, score 97.10), Aider-Polyglot (rank 11 / 59, score 74.20), and MMLU Pro (rank 25 / 126, score 85).
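
The three benchmarks named above are the ones where the model's rank is smallest relative to the number of ranked models. A minimal sketch of that ordering follows, assuming the summary sorts by rank divided by total; the page does not state its rule, so this is only one rule that reproduces it. The tuples use the best placement per benchmark as listed on this page, and `relative_rank` and `top_benchmarks` are illustrative names.

```python
# Sketch: order benchmarks by relative position (rank / total); lower is better.
# The (name, rank, total) tuples are copied from this page. The ordering rule
# is an assumption, not something the page documents.
results = [
    ("SimpleQA", 1, 45),
    ("Aider-Polyglot", 11, 59),
    ("MMLU Pro", 25, 126),
    ("LiveCodeBench", 41, 120),
    ("AIME2025", 39, 106),
    ("Terminal-Bench", 14, 35),
]


def relative_rank(rank: int, total: int) -> float:
    """Return the rank as a fraction of the number of ranked models."""
    return rank / total


def top_benchmarks(rows, n=3):
    """Return the n benchmark names with the smallest rank/total fraction."""
    ordered = sorted(rows, key=lambda r: relative_rank(r[1], r[2]))
    return [name for name, _, _ in ordered[:n]]


print(top_benchmarks(results))
```

Under this assumption, SimpleQA (1 / 45) sorts first, followed by Aider-Polyglot (11 / 59) and MMLU Pro (25 / 126), which matches the order of the summary.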

Benchmark Results

General Knowledge

9 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| MMLU Pro | 85 | 25 / 126 |
| MMLU Pro | | 37 / 126 |
| GPQA Diamond | 79.90 | 78 / 179 |
| GPQA Diamond | | 97 / 179 |
| LiveBench (Standard Mode) | 49.85 | 91 / 115 |
| LiveBench (Thinking Mode) | 58.90 | 73 / 115 |
| HLE | 20.30 | 104 / 159 |
| HLE | 19.80 | 106 / 159 |
| HLE | 8.60 | 139 / 159 |

Common Sense

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| SimpleQA | 97.10 | 1 / 45 |

Coding and Software Engineering

3 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| LiveCodeBench | 74.10 | 41 / 120 |
| LiveCodeBench | | 84 / 120 |
| SWE-bench Verified | 67.80 | 67 / 108 |

Math and Reasoning

2 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| AIME2025 | 89.30 | 39 / 106 |
| AIME2025 | | 83 / 106 |

AI Agent - Tool Usage

2 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| Terminal-Bench | 37.70 | 14 / 35 |
| Terminal-Bench | | 30 / 35 |

Agent Level Benchmark

5 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| Aider-Polyglot (Standard Mode) | 70.20 | 17 / 59 |
| Aider-Polyglot (Thinking Mode) | 74.20 | 11 / 59 |
| τ²-Bench | 66.70 | 26 / 40 |
| τ²-Bench - Telecom | | 34 / 35 |
| τ²-Bench - Telecom | | 34 / 35 |

Instruction Following

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| IF Bench | 54.10 | 26 / 29 |

AI Agent - Information Search

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| BrowseComp | 40.10 | 41 / 45 |
