DeepSeek-V3.1 Benchmark Details

DeepSeek-V3.1's highest-ranked benchmark results are MMLU (rank 1 of 65, score 93.40), SimpleQA (rank 4 of 45, score 93.40), and AIME 2024 (rank 7 of 62, score 93.10).

Benchmark Results

General Evaluation

4 evaluations

Benchmark       Mode               Score    Rank / total
MMLU            Thinking Enabled   93.40    1 / 65
MMLU Pro        Thinking Enabled   -        23 / 124
GPQA Diamond    Thinking Enabled   80.10    72 / 175
HLE             Thinking Enabled   15.90    110 / 149

General Knowledge QA

1 evaluation

Benchmark       Mode               Score    Rank / total
SimpleQA        Thinking Enabled   93.40    4 / 45

Programming and Software Engineering

1 evaluation

Benchmark       Mode               Score    Rank / total
LiveCodeBench   Thinking Enabled   74.80    38 / 118

Mathematical Reasoning

2 evaluations

Benchmark       Mode               Score    Rank / total
AIME 2024       Thinking Enabled   93.10    7 / 62
AIME 2025       Thinking Enabled   88.40    42 / 106

Agent Capability Evaluation

1 evaluation

Benchmark       Mode               Score    Rank / total
Aider-Polyglot  Thinking Enabled   76.30    5 / 26
