© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


GPT-4o Benchmark Details

GPT-4o's leading benchmark results are currently HumanEval (score 90, rank 8 of 39), MMLU (score 88.70, rank 15 of 65), and BBH (score 91.70, rank 5 of 20).
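The page does not say how the three leading benchmarks are chosen. One reading that fits the listed numbers is that they are ordered by relative rank, that is, rank divided by the number of ranked models. The sketch below checks that reading against the rank/total pairs published on this page. The `relative_rank` and `top_benchmarks` helpers are hypothetical names introduced for illustration, and the selection rule itself is an assumption rather than the site's documented method.

```python
# (benchmark, rank, total) triples copied from the result tables on this page.
RESULTS = [
    ("BBH", 5, 20),
    ("MMLU", 15, 65),
    ("MMLU Pro", 72, 126),
    ("GPQA Diamond", 112, 177),
    ("HLE", 146, 154),
    ("HumanEval", 8, 39),
    ("LiveCodeBench", 105, 120),
    ("SWE-bench Verified", 100, 105),
    ("IC SWE-Lancer (Diamond)", 6, 8),
    ("MATH", 16, 42),
    ("MATH-500", 43, 44),
    ("AIME 2025", 93, 106),
    ("AIME 2024", 61, 62),
    ("FrontierMath", 57, 60),
    ("SimpleQA", 20, 45),
    ("Pinch Bench", 30, 37),
]


def relative_rank(rank: int, total: int) -> float:
    """Fraction of the leaderboard at or above this rank (lower is better)."""
    return rank / total


def top_benchmarks(results, n=3):
    """Return the n benchmark names with the best (smallest) relative rank.

    Assumption: "leading" means best relative rank; the page does not
    confirm this rule.
    """
    ordered = sorted(results, key=lambda r: relative_rank(r[1], r[2]))
    return [name for name, _, _ in ordered[:n]]


if __name__ == "__main__":
    for name in top_benchmarks(RESULTS):
        print(name)
```

Under this assumption the three best relative ranks are HumanEval (8 / 39), MMLU (15 / 65), and BBH (5 / 20), in that order, which matches the order used in the summary sentence. Sorting by raw score would give a different picture, because scores are not comparable across benchmarks.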

Benchmark Results


Comprehensive Evaluation

5 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| BBH | Standard Mode | 91.70 | 5 / 20 |
| MMLU | Standard Mode | 88.70 | 15 / 65 |
| MMLU Pro | Standard Mode | 77.90 | 72 / 126 |
| GPQA Diamond | Standard Mode | 70.10 | 112 / 177 |
| HLE | Standard Mode | 5.30 | 146 / 154 |

Coding and Software Engineering

4 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| HumanEval | Standard Mode | 90 | 8 / 39 |
| LiveCodeBench | Standard Mode | 35.10 | 105 / 120 |
| SWE-bench Verified | Standard Mode | 31 | 100 / 105 |
| IC SWE-Lancer (Diamond) | Standard Mode | 23.30 | 6 / 8 |

Mathematical Reasoning

5 evaluations

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| MATH | Standard Mode | 75.90 | 16 / 42 |
| MATH-500 | Standard Mode | 75.90 | 43 / 44 |
| AIME 2025 | Standard Mode (Tools) | 42.10 | 93 / 106 |
| AIME 2024 | Standard Mode | 9.30 | 61 / 62 |
| FrontierMath | Standard Mode | 0.30 | 57 / 60 |

General Knowledge Q&A

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| SimpleQA | Standard Mode | 38.20 | 20 / 45 |

OpenClaw Agent Capability Evaluation

1 evaluation

| Benchmark | Mode | Score | Rank / total |
|---|---|---|---|
| Pinch Bench | Thinking Enabled (Tools) | 71.10 | 30 / 37 |