MMLU Pro

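Large models have already had a major impact on many industries, and accurately evaluating their capabilities and performance has become a key problem the industry urgently needs to solve. Generative AI models such as large language models (LLMs) can produce high-quality text, code, images, and other content, yet evaluating them remains comparatively difficult. Many earlier benchmarks also struggle to separate today's strongest models. Take MMLU as an example: GPT-4 scored 86.4 on MMLU in March 2023, and by the end of 2024, nearly two years later, the best model in the industry scored only 90.5, a very limited improvement. In response, researchers from the University of Waterloo, the University of Toronto, and Carnegie Mellon University jointly proposed MMLU P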

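Updated: 2026-03-22
Views: 2,405
Number of questions: 38500
Publishing organization: Berkeley Artificial Intelligence Research
Evaluation category: Comprehensive evaluation
Evaluation metric: Accuracy
Supported language: English
Difficulty level: Medium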

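Introduction

A professional-level version of MMLU with more challenging questions, designed to evaluate a model's understanding and reasoning in specialized domains.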
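
The metric listed for this benchmark is accuracy. As a minimal sketch, assuming each question has exactly one correct option letter and the model's answer has already been extracted as a letter, accuracy is the percentage of answers that match the reference. The function name and the sample answers below are hypothetical and are not taken from the benchmark's own evaluation code.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that equal the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and reference option letters.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Made-up sample: three of four predicted letters match the references.
preds = ["A", "C", "B", "D"]
refs = ["A", "C", "D", "D"]
score = accuracy(preds, refs)
```

Scores on the leaderboard are reported on this 0 to 100 scale.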

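Related resources

  • View the original paper: read the full academic paper
  • Get the dataset: download the evaluation dataset
  • Visit the official site: browse the project's official website
  • DataLearner introduction: detailed explanation in Chinese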

MMLU Pro Model Score Leaderboard

Source: DataLearnerAI

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
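
The sourcing order described here can be read as a simple priority lookup. The sketch below is an illustration only: the source labels, the `pick_score` function, and the sample values are hypothetical and do not describe DataLearnerAI's actual pipeline.

```python
# Hypothetical source labels, ordered from most to least preferred.
SOURCE_PRIORITY = ["official", "leaderboard", "third_party"]


def pick_score(reports):
    """Return the (source, score) pair from the most preferred source present.

    `reports` maps a source label to the score that source reported.
    Returns None when no known source reported a score.
    """
    for source in SOURCE_PRIORITY:
        if source in reports:
            return source, reports[source]
    return None


# Made-up values: no official number exists, so the leaderboard figure is used.
example = {"third_party": 84.9, "leaderboard": 85.2}
chosen = pick_score(example)
```

A fixed priority list keeps the rule easy to audit: changing which kind of source wins means editing one line.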

Mode legend: normal, thinking, low, medium, high, deeper thinking, parallel_thinking

Latest MMLU Pro model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for MMLU Pro.

MMLU Pro detailed ranking table

Rank  Model                             Mode                     Score  Release date  Parameters (as listed)
1     OpenAI o1                         Normal                   91.04  2024-12-05    Unknown
2     Gemini 3.0 Pro (Preview 11-2025)  Thinking Level · Medium  90     2025-11-18    Unknown
3     Claude Opus 4.5                   Thinking Level · Medium  90     2025-11-25    Unknown
4     Claude Opus 4.1                   Thinking Level · Medium  88     2025-08-06    Unknown
5     Claude Sonnet 4.5                 Thinking Level · Medium  88     2025-09-30    Unknown
6     M2.1                              Thinking Level · Medium  88     2025-12-23    2300
7     Qwen3.5-397B-A17B                 Thinking Level · Medium  87.8   2026-02-16    397
8     Qwen3.5-397B-A17B                 Thinking Level · Medium  87.8   2026-02-16    397
9     Hunyuan-T1                        Normal                   87.2   2025-03-21    Unknown
10    Grok 4                            Thinking Level · Medium  87     2025-07-10    Unknown
11    GPT-4.5                           Normal                   86.1   2025-02-28    Unknown
12    Qwen3.5-27B                       Thinking Level · Medium  86.1   2026-02-25    270
13    Gemini 2.5-Pro                    Normal                   86     2025-06-05    Unknown
14    Qwen3-Max-Thinking                Thinking Level · Medium  85.7   2026-01-26    10000
15    OpenAI o3                         Normal                   85.6   2025-04-16    Unknown
16    Claude Opus 4                     Normal                   85     2025-05-23    Unknown
17    DeepSeek-R1-0528                  Thinking Level · Medium  85     2025-05-28    6710
18    DeepSeek-V3.1                     Thinking Level · Medium  85     2025-08-20    6710
19    DeepSeek-V3.1 Terminus            Thinking Level · Medium  85     2025-09-22    6710
20    DeepSeek-V3.1 Terminus            Normal                   85     2025-09-22    6710
21    DeepSeek V3.2-Exp                 Thinking Level · Medium  85     2025-09-29    6710
22    Grok 4.1 Fast                     Thinking Level · Medium  85     2025-11-19    Unknown
23    GLM-4.5                           Thinking Level · Medium  84.6   2025-07-28    3550
24    Kimi K2 Thinking                  Thinking Level · Medium  84.6   2025-11-06    10400
25    Qwen3-235B-A22B-Thinking-2507     Thinking Level · Medium  84.4   2025-07-25    2350
26    Qwen3-235B-A22B-Thinking          Thinking Level · Medium  84.4   2025-07-30    305
27    GLM-4.7                           Thinking Level · Medium  84.3   2025-12-22    3580
28    DeepSeek-R1                       Normal                   84     2025-01-20    6710
29    Claude Sonnet 4                   Thinking Level · Medium  84     2025-05-23    Unknown
30    Qwen3 Max (Preview)               Normal                   84     2025-09-05    Unknown
The remaining 84 entries are not shown.