MMLU Pro
Updated Apr 24, 2026
- Problem Count: 38500
- Institution: Berkeley Artificial Intelligence Research
- Category: General Evaluation
- Metrics: Accuracy
- Language: English
- Difficulty: Medium
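As a rough illustration of the accuracy metric listed above, the sketch below computes the percentage of items answered correctly. It assumes each item has a single reference answer given as an option letter; the function name and the sample data are hypothetical and are not taken from the benchmark's official evaluation code.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and reference option letters.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical data: four items, three answered correctly.
preds = ["A", "C", "B", "D"]
gold = ["A", "C", "B", "A"]
print(accuracy(preds, gold))  # 75.0
```

Leaderboard scores such as 91.04 would then read as the share of benchmark items a model answered correctly, expressed as a percentage.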
Overview
MMLU Pro is an English-language benchmark for general evaluation of AI model capabilities, scored by accuracy. This page summarizes its metrics, official resources, and model leaderboard results on DataLearnerAI.
Related resources
Latest MMLU Pro model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for MMLU Pro.
Source: DataLearnerAI
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
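The source ordering described above can be sketched as a simple priority lookup. This is only an illustration under stated assumptions: the source labels, the tuple layout, and the sample scores are hypothetical and do not reflect DataLearnerAI's actual schema or data.

```python
# Lower number = preferred source, following the order stated above.
# The labels are hypothetical names chosen for this sketch.
SOURCE_PRIORITY = {"official": 0, "leaderboard": 1, "third_party": 2}


def pick_score(reports):
    """Choose the report that comes from the most preferred source type.

    `reports` is a list of (source_type, score) tuples.
    Returns None when no report is available.
    """
    if not reports:
        return None
    return min(reports, key=lambda r: SOURCE_PRIORITY[r[0]])


# Hypothetical reports for one model: the official figure wins.
reports = [("third_party", 84.9), ("official", 85.0)]
print(pick_score(reports))  # ('official', 85.0)
```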
MMLU Pro Rank
| Rank | Model | Mode | Score | Release date | Parameters | License |
|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | Standard Mode | 91.04 | 2024-12-05 | Unknown | Closed |
| 2 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | 90.00 | 2025-11-18 | Unknown | Closed |
| 3 | Opus 4.5 | Extended Thinking | 90.00 | 2025-11-25 | Unknown | Closed |
| 4 | Qwen 3.6 Plus Preview | Thinking Enabled | 88.50 | 2026-03-31 | Unknown | Closed |
| 5 | Opus 4.1 | Extended Thinking | 88.00 | 2025-08-06 | Unknown | Closed |
| 6 | Claude Sonnet 4.5 | Thinking Enabled | 88.00 | 2025-09-30 | Unknown | Closed |
| 7 | M2.1 | Thinking Enabled | 88.00 | 2025-12-23 | 230B | Free Commercial |
| 8 | Qwen3.5-397B-A17B | Thinking Enabled | 87.80 | 2026-02-16 | 397B | Free Commercial |
| 9 | DeepSeek-V4-Pro | Thinking Level · High | 87.50 | 2026-04-24 | 1600B | Free Commercial |
| 10 | Hunyuan-T1 | Standard Mode | 87.20 | 2025-03-21 | Unknown | Closed |
| 11 | DeepSeek-V4-Pro | Thinking Level · High | 87.10 | 2026-04-24 | 1600B | Free Commercial |
| 12 | Grok 4 | Thinking Enabled | 87.00 | 2025-07-10 | Unknown | Closed |
| 13 | DeepSeek-V4-Flash | Thinking Level · High | 86.40 | 2026-04-24 | 284B | Free Commercial |
| 14 | Qwen3.6-27B | Thinking Enabled | 86.20 | 2026-04-22 | 27B | Free Commercial |
| 15 | DeepSeek-V4-Flash | Thinking Level · High | 86.20 | 2026-04-24 | 284B | Free Commercial |
| 16 | GPT-4.5 | Standard Mode | 86.10 | 2025-02-28 | Unknown | Closed |
| 17 | Qwen3.5-27B | Thinking Enabled | 86.10 | 2026-02-25 | 27B | Free Commercial |
| 18 | Gemini 2.5-Pro | Standard Mode | 86.00 | 2025-06-05 | Unknown | Closed |
| 19 | Qwen3-Max-Thinking | Thinking Enabled | 85.70 | 2026-01-26 | 1000B | Closed |
| 20 | OpenAI o3 | Standard Mode | 85.60 | 2025-04-16 | Unknown | Closed |
| 21 | Gemma 4 31B | Thinking Enabled | 85.20 | 2026-04-02 | 31B | Free Commercial |
| 22 | Qwen3.6-35B-A3B | Thinking Enabled | 85.20 | 2026-04-16 | 35B | Free Commercial |
| 23 | Claude Opus 4 | Standard Mode | 85.00 | 2025-05-23 | Unknown | Closed |
| 24 | DeepSeek-R1-0528 | Thinking Enabled | 85.00 | 2025-05-28 | 671B | Free Commercial |
| 25 | DeepSeek-V3.1 | Thinking Enabled | 85.00 | 2025-08-20 | 671B | Free Commercial |
| 26 | DeepSeek-V3.1 Terminus | Thinking Enabled | 85.00 | 2025-09-22 | 671B | Free Commercial |
| 27 | DeepSeek-V3.1 Terminus | Standard Mode | 85.00 | 2025-09-22 | 671B | Free Commercial |
| 28 | DeepSeek V3.2-Exp | Thinking Enabled | 85.00 | 2025-09-29 | 671B | Free Commercial |
| 29 | Grok 4.1 Fast | Thinking Enabled | 85.00 | 2025-11-19 | Unknown | Closed |
| 30 | GLM-4.5 | Thinking Enabled | 84.60 | 2025-07-28 | 355B | Free Commercial |