DeepSeek-V3-0324 Benchmark Details

DeepSeek-V3-0324's best current leaderboard placements are on GSM8K (rank 3 of 26, score 96.30), GPQA (rank 3 of 15, score 68.40), and DROP (rank 3 of 9, score 89.70).
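The "rank / total" pairs in this listing can be parsed and sorted to see which results lead. The sketch below is illustrative only: the `Result` type, the `parse_row` helper, and the ordering rule (best rank first, ties broken by the larger number of ranked models) are assumptions, not a description of how the page itself selects its highlights. The rows are copied from the listing.

```python
from typing import NamedTuple


class Result(NamedTuple):
    benchmark: str
    score: float
    rank: int
    total: int


def parse_row(benchmark: str, score: str, rank_total: str) -> Result:
    """Turn one listing row, e.g. ("GSM8K", "96.30", "3 / 26"), into a Result."""
    rank, total = (int(part) for part in rank_total.split("/"))
    return Result(benchmark, float(score), rank, total)


# A few rows copied from the listing (not the full set).
rows = [
    ("MMLU", "86.50", "28 / 66"),
    ("GPQA", "68.40", "3 / 15"),
    ("GSM8K", "96.30", "3 / 26"),
    ("DROP", "89.70", "3 / 9"),
    ("HLE", "5.20", "165 / 172"),
]

results = [parse_row(*row) for row in rows]

# Assumed ordering: best rank first, ties broken by the larger field of models.
leaders = sorted(results, key=lambda r: (r.rank, -r.total))[:3]
for r in leaders:
    print(f"{r.benchmark}: rank {r.rank} of {r.total}, score {r.score:.2f}")
```

Under this assumed rule the three leaders come out as GSM8K, GPQA, and DROP, matching the order of the summary sentence. Note that scores are not comparable across benchmarks, so the sketch sorts on rank only.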

Benchmark Results

General Knowledge

6 evaluations

Benchmark / mode | Score | Rank / total
MMLU | 86.50 | 28 / 66
MMLU Pro | 81.20 | 55 / 132
GPQA Diamond | 68.40 | 125 / 187
GPQA | 68.40 | 3 / 15
ARC-AGI | not listed | 62 / 68
HLE | 5.20 | 165 / 172

Math and Reasoning

7 evaluations

Benchmark / mode | Score | Rank / total
GSM8K | 96.30 | 3 / 26
MATH-500 | not listed | 28 / 44
AIME 2024 | 59.40 | 43 / 62
AIME 2025 | 47.70 | 89 / 107
IMO-ProofBench | 4.30 | 15 / 16
IMO 2024 | 1.70 | 9 / 10
IMO 2025 | 1.70 | 9 / 9

Reading Comprehension

1 evaluation

Benchmark / mode | Score | Rank / total
DROP | 89.70 | 3 / 9

Common Sense

1 evaluation

Benchmark / mode | Score | Rank / total
SimpleQA | 27.20 | 28 / 47

Coding and Software Engineering

2 evaluations

Benchmark / mode | Score | Rank / total
LiveCodeBench | 49.20 | 95 / 123
SWE-bench Verified | 38.80 | 103 / 112

Writing and Creative Capabilities

1 evaluation

Benchmark / mode | Score | Rank / total
Creative Writing | 81.60 | 15 / 23

AI Agent - Tool Usage

1 evaluation

Benchmark / mode | Score | Rank / total
Terminal-Bench | 13.30 | 34 / 35

Common Sense Reasoning

1 evaluation

Benchmark / mode | Score | Rank / total
Simple Bench (Standard Mode) | 27.20 | 51 / 63

Agent Level Benchmark

2 evaluations

Benchmark / mode | Score | Rank / total
Aider-Polyglot (Standard Mode) | 55.10 | 27 / 59
τ²-Bench | 38.80 | 38 / 43
