DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

DeepSeek-V3 Benchmark Details

DeepSeek-V3's current benchmark results are led by BBH (score 92.30, rank 3 of 20), MATH (score 87.80, rank 7 of 42), and HumanEval (score 89, rank 9 of 39).

Benchmark Results

Comprehensive Evaluation

5 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| BBH | Standard Mode | 92.30 | 3 / 20 |
| MMLU | Standard Mode | 88.50 | 17 / 65 |
| MMLU Pro | Standard Mode | 75.90 | 78 / 124 |
| GPQA Diamond | Standard Mode | 59.10 | 139 / 175 |
| GPQA | Standard Mode | 59.10 | 5 / 14 |

Coding and Software Engineering

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| HumanEval | Standard Mode | 89 | 9 / 39 |
| LiveCodeBench | Standard Mode | 34.60 | 105 / 118 |

Mathematical Reasoning

4 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| MATH | Standard Mode | 87.80 | 7 / 42 |
| MATH-500 | Standard Mode | 87.80 | 39 / 44 |
| AIME 2024 | Standard Mode | 39 | 52 / 62 |
| FrontierMath | Standard Mode | 1.70 | 49 / 60 |

General Knowledge Q&A

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| SimpleQA | Standard Mode | 24.90 | 29 / 45 |

Writing and Creative Work

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Creative Writing | Standard Mode | 81.60 | 15 / 23 |

Commonsense Reasoning

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Simple Bench | Standard Mode | 18.90 | 27 / 27 |
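
Because each benchmark ranks a different number of models, the rank / total pairs are not directly comparable across benchmarks. One possible way to put them on a common scale is to divide rank by field size, so that lower values mean a better placement. The sketch below does that with the rank / total values listed on this page; the `relative_rank` and `best_placements` helpers are hypothetical names introduced for illustration, and the ratio is an assumed reading, not a metric the page defines.

```python
# Rank / total pairs copied from the benchmark listings on this page.
RESULTS = {
    "BBH": (3, 20),
    "MMLU": (17, 65),
    "MMLU Pro": (78, 124),
    "GPQA Diamond": (139, 175),
    "GPQA": (5, 14),
    "HumanEval": (9, 39),
    "LiveCodeBench": (105, 118),
    "MATH": (7, 42),
    "MATH-500": (39, 44),
    "AIME 2024": (52, 62),
    "FrontierMath": (49, 60),
    "SimpleQA": (29, 45),
    "Creative Writing": (15, 23),
    "Simple Bench": (27, 27),
}


def relative_rank(rank: int, total: int) -> float:
    """Hypothetical helper: rank as a fraction of field size (lower is better)."""
    return rank / total


def best_placements(results: dict, n: int = 3) -> list:
    """Return the n benchmark names with the smallest rank-to-field-size ratio."""
    return sorted(results, key=lambda name: relative_rank(*results[name]))[:n]


if __name__ == "__main__":
    for name in best_placements(RESULTS):
        rank, total = RESULTS[name]
        print(f"{name}: {rank} / {total} = {relative_rank(rank, total):.3f}")
```

Under this assumed ratio, the three best placements are BBH, MATH, and HumanEval, while Simple Bench (27 / 27) sits at the bottom of its field.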