A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

DeepSeek-V3-0324 Benchmark Details

DeepSeek-V3-0324's best relative placements are on GSM8K (score 96.30, rank 3 of 26), GPQA (score 68.40, rank 2 of 14), and DROP (score 89.70, rank 3 of 9).
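The page does not state how the highlighted benchmarks are chosen. A minimal sketch, assuming they are the ones with the smallest rank-to-total ratio, is given here. The names `RESULTS`, `relative_rank`, and `top_benchmarks` are hypothetical. The figures are copied from the tables on this page.

```python
# (benchmark, score, rank, total) copied from the tables on this page.
RESULTS = [
    ("MMLU", 86.50, 28, 65),
    ("MMLU Pro", 81.20, 52, 126),
    ("GPQA Diamond", 68.40, 118, 177),
    ("GPQA", 68.40, 2, 14),
    ("ARC-AGI", 9.0, 59, 65),
    ("HLE", 5.20, 147, 154),
    ("GSM8K", 96.30, 3, 26),
    ("MATH-500", 94.0, 28, 44),
    ("AIME 2024", 59.40, 43, 62),
    ("AIME2025", 47.70, 88, 106),
    ("IMO-ProofBench", 4.30, 15, 16),
    ("IMO 2024", 1.70, 9, 10),
    ("IMO 2025", 1.70, 9, 9),
    ("DROP", 89.70, 3, 9),
    ("SimpleQA", 27.20, 26, 45),
    ("LiveCodeBench", 49.20, 93, 120),
    ("SWE-bench Verified", 38.80, 96, 105),
    ("Creative Writing", 81.60, 15, 23),
    ("Terminal-Bench", 13.30, 34, 35),
    ("Simple Bench", 27.20, 22, 27),
    ("Aider-Polyglot", 55.10, 21, 26),
    ("τ²-Bench", 38.80, 36, 40),
]


def relative_rank(rank: int, total: int) -> float:
    """Rank divided by the number of ranked models (lower is better)."""
    return rank / total


def top_benchmarks(results, n=3):
    """Return the n entries with the smallest relative rank.

    Assumption: this ordering is illustrative only; the site's actual
    selection rule is not documented on this page.
    """
    return sorted(results, key=lambda r: relative_rank(r[2], r[3]))[:n]


if __name__ == "__main__":
    for name, score, rank, total in top_benchmarks(RESULTS):
        share = relative_rank(rank, total)
        print(f"{name}: score {score:.2f}, rank {rank}/{total} ({share:.1%})")
```

Under this assumption the three selected benchmarks are GSM8K, GPQA, and DROP, in that order, which matches the summary above. Sorting by absolute rank instead would put GPQA first.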

Benchmark Results

Comprehensive Evaluation (6 evaluations)

Benchmark       Mode            Score    Rank / total
MMLU            Standard Mode   86.50    28 / 65
MMLU Pro        Standard Mode   81.20    52 / 126
GPQA Diamond    Standard Mode   68.40    118 / 177
GPQA            Standard Mode   68.40    2 / 14
ARC-AGI         Standard Mode   9        59 / 65
HLE             Standard Mode   5.20     147 / 154

Mathematical Reasoning (7 evaluations)

Benchmark        Mode            Score    Rank / total
GSM8K            Standard Mode   96.30    3 / 26
MATH-500         Standard Mode   94       28 / 44
AIME 2024        Standard Mode   59.40    43 / 62
AIME2025         Standard Mode   47.70    88 / 106
IMO-ProofBench   Standard Mode   4.30     15 / 16
IMO 2024         Standard Mode   1.70     9 / 10
IMO 2025         Standard Mode   1.70     9 / 9

Reading Comprehension (1 evaluation)

Benchmark   Mode            Score    Rank / total
DROP        Standard Mode   89.70    3 / 9

General Knowledge QA (1 evaluation)

Benchmark   Mode            Score    Rank / total
SimpleQA    Standard Mode   27.20    26 / 45

Programming and Software Engineering (2 evaluations)

Benchmark            Mode            Score    Rank / total
LiveCodeBench        Standard Mode   49.20    93 / 120
SWE-bench Verified   Standard Mode   38.80    96 / 105

Writing and Creative Work (1 evaluation)

Benchmark          Mode            Score    Rank / total
Creative Writing   Standard Mode   81.60    15 / 23

AI Agent - Tool Use (1 evaluation)

Benchmark        Mode            Score    Rank / total
Terminal-Bench   Standard Mode   13.30    34 / 35

Commonsense Reasoning (1 evaluation)

Benchmark      Mode            Score    Rank / total
Simple Bench   Standard Mode   27.20    22 / 27

Agent Capability Evaluation (2 evaluations)

Benchmark        Mode                    Score    Rank / total
Aider-Polyglot   Standard Mode           55.10    21 / 26
τ²-Bench         Standard Mode (Tools)   38.80    36 / 40