
DeepSeek-V3.1 Benchmark Details

DeepSeek-V3.1's strongest benchmark results are MMLU (rank 1 of 65, score 93.40), SimpleQA (rank 4 of 45, score 93.40), and AIME 2024 (rank 7 of 62, score 93.10).

Benchmark Results

Comprehensive Evaluation (4 evaluations)

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| MMLU (Thinking Enabled) | 93.40 | 1 / 65 |
| MMLU Pro (Thinking Enabled) | 85.00 | 23 / 124 |
| GPQA Diamond (Thinking Enabled) | 80.10 | 72 / 175 |
| HLE (Thinking Enabled) | 15.90 | 110 / 149 |

Commonsense QA (1 evaluation)

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| SimpleQA (Thinking Enabled) | 93.40 | 4 / 45 |

Programming & Software Engineering (1 evaluation)

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| LiveCodeBench (Thinking Enabled) | 74.80 | 38 / 118 |

Mathematical Reasoning (2 evaluations)

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| AIME 2024 (Thinking Enabled) | 93.10 | 7 / 62 |
| AIME 2025 (Thinking Enabled) | 88.40 | 42 / 106 |

Agent Capability Evaluation (1 evaluation)

| Benchmark / mode | Score | Rank / total |
| --- | --- | --- |
| Aider-Polyglot (Thinking Enabled) | 76.30 | 5 / 26 |
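Because each leaderboard has a different number of entrants, a rank of 5 / 26 and a rank of 38 / 118 are not directly comparable. One common normalization is to convert each rank to a percentile. A minimal Python sketch, using figures copied from the tables above (the `percentile` helper is illustrative, not part of DataLearner's methodology):

```python
# Convert "rank / total" standings into percentiles so benchmarks with
# different leaderboard sizes can be compared on one scale.
# (benchmark, score, rank, total) tuples copied from the tables above.
results = [
    ("MMLU", 93.40, 1, 65),
    ("SimpleQA", 93.40, 4, 45),
    ("AIME 2024", 93.10, 7, 62),
    ("Aider-Polyglot", 76.30, 5, 26),
    ("LiveCodeBench", 74.80, 38, 118),
    ("HLE", 15.90, 110, 149),
]

def percentile(rank: int, total: int) -> float:
    """Share of ranked models at or below this one (rank 1 = best, 100%)."""
    return 100.0 * (total - rank + 1) / total

for name, score, rank, total in results:
    print(f"{name:15s} score {score:6.2f}  "
          f"rank {rank:>3}/{total:<3}  percentile {percentile(rank, total):5.1f}")
```

On this scale, MMLU (rank 1 / 65) maps to the 100th percentile while HLE (rank 110 / 149) maps to roughly the 27th, which makes the spread across categories easier to see at a glance.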