DeepSeek-V3 Benchmark Details

DeepSeek-V3's strongest benchmark placements are BBH (score 92.30, rank 3 / 21), MATH (score 87.80, rank 7 / 42), and HumanEval (score 89, rank 9 / 39).
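The page does not say how these three headline benchmarks are selected. One reading that is consistent with the listed numbers is that they are the results with the smallest rank-to-total ratio. The Python sketch below checks that assumption; the helper names `relative_rank` and `top_benchmarks` are hypothetical, and the rank and total values are copied from the tables on this page.

```python
# (benchmark, rank, total) triples copied from the tables on this page.
RESULTS = [
    ("BBH", 3, 21),
    ("MMLU", 17, 66),
    ("MMLU Pro", 83, 132),
    ("GPQA Diamond", 148, 187),
    ("GPQA", 6, 15),
    ("HumanEval", 9, 39),
    ("LiveCodeBench", 110, 123),
    ("MATH", 7, 42),
    ("MATH-500", 39, 44),
    ("AIME 2024", 52, 62),
    ("FrontierMath", 49, 60),
    ("SimpleQA", 31, 47),
    ("Creative Writing", 15, 23),
    ("Simple Bench", 59, 63),
    ("Aider-Polyglot", 34, 59),
]


def relative_rank(rank, total):
    """Rank divided by leaderboard size; lower means a better placement."""
    return rank / total


def top_benchmarks(results, n=3):
    """Return the n benchmark names with the lowest rank / total ratio.

    Assumption: this ordering is only a guess at how the page picks its
    headline results; the page itself does not document the rule.
    """
    ordered = sorted(results, key=lambda r: relative_rank(r[1], r[2]))
    return [name for name, _, _ in ordered[:n]]


print(top_benchmarks(RESULTS))  # ['BBH', 'MATH', 'HumanEval']
```

A ratio is used instead of the raw rank because the leaderboards differ widely in size: rank 6 of 15 on GPQA is a weaker relative placement than rank 17 of 66 on MMLU.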

Benchmark Results

General Knowledge

5 evaluations

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| BBH | 92.30 | 3 / 21 |
| MMLU | 88.50 | 17 / 66 |
| MMLU Pro | 75.90 | 83 / 132 |
| GPQA Diamond | 59.10 | 148 / 187 |
| GPQA | 59.10 | 6 / 15 |

Coding and Software Engineering

2 evaluations

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| HumanEval | 89 | 9 / 39 |
| LiveCodeBench | 34.60 | 110 / 123 |

Math and Reasoning

4 evaluations

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| MATH | 87.80 | 7 / 42 |
| MATH-500 | 87.80 | 39 / 44 |
| AIME 2024 | | 52 / 62 |
| FrontierMath | 1.70 | 49 / 60 |

Common Sense

1 evaluation

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| SimpleQA | 24.90 | 31 / 47 |

Writing and Creative Capabilities

1 evaluation

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| Creative Writing | 81.60 | 15 / 23 |

Common Sense Reasoning

1 evaluation

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| Simple Bench / Standard Mode | 18.90 | 59 / 63 |

Agent Level Benchmark

1 evaluation

| Benchmark / mode | Score | Rank / total |
|---|---|---|
| Aider-Polyglot / Standard Mode | 48.40 | 34 / 59 |
