DeepSeek-R1-0528 Benchmark Details

DeepSeek-R1-0528's headline benchmark results are MATH-500 (score 98, rank 7 of 44), Creative Writing (score 86.25, rank 4 of 23), and MMLU Pro (score 85, rank 25 of 126).

Benchmark Results


General Knowledge (5 evaluations)

Benchmark / mode                    Score       Rank / total
MMLU Pro                            85.00       25 / 126
GPQA Diamond                        not listed  70 / 179
ARC-AGI                             21.20       54 / 65
HLE                                 17.70       113 / 159
ARC-AGI-2                           1.30        52 / 59

Common Sense (1 evaluation)

Benchmark / mode                    Score       Rank / total
SimpleQA                            27.80       25 / 45

Coding and Software Engineering (2 evaluations)

Benchmark / mode                    Score       Rank / total
LiveCodeBench                       73.30       45 / 120
SWE-bench Verified                  57.60       80 / 108

Math and Reasoning (5 evaluations)

Benchmark / mode                    Score       Rank / total
MATH-500                            98.00       7 / 44
AIME 2024                           91.40       13 / 62
AIME 2025                           87.50       44 / 106
IMO-ProofBench                      not listed  7 / 16
IMO-ProofBench Advanced             3.80        8 / 8

Writing and Creative Capabilities (1 evaluation)

Benchmark / mode                    Score       Rank / total
Creative Writing                    86.25       4 / 23

AI Agent: Tool Usage (1 evaluation)

Benchmark / mode                    Score       Rank / total
Terminal-Bench                      5.70        35 / 35

Common-Sense Reasoning (1 evaluation)

Benchmark / mode                    Score       Rank / total
Simple Bench (Thinking Enabled)     40.80       38 / 63

Agent-Level Benchmarks (1 evaluation)

Benchmark / mode                    Score       Rank / total
Aider-Polyglot (Thinking Enabled)   71.40       15 / 59
