MATH-500
Updated May 2, 2026
- Problem Count: 500
- Institution: OpenAI
- Category: Math and Reasoning
- Metrics: Accuracy
- Language: English
- Difficulty: Mixed
Overview
MATH-500 is a 500-problem benchmark of competition mathematics used to evaluate the mathematical reasoning of language models, scored by answer accuracy. This page collects its overview, metrics, official resources, and model leaderboard results on DataLearnerAI.
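The accuracy metric reported here is typically exact-match against each problem's reference final answer. A minimal sketch of that scoring loop (the normalization and function names are illustrative assumptions, not DataLearnerAI's or OpenAI's actual grader, which usually also handles LaTeX equivalence):

```python
def normalize(ans: str) -> str:
    """Crude normalization for exact-match comparison (illustrative only)."""
    return ans.strip().replace(" ", "").lower()

def math500_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of problems whose predicted final answer matches the reference."""
    assert len(predictions) == len(references)
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical answers: 2 of 3 match after normalization
preds = ["\\frac{1}{2}", "42", "x+1"]
refs = ["\\frac{1}{2}", "41", "x + 1"]
print(math500_accuracy(preds, refs))  # → 0.666…
```

Real MATH-500 graders are stricter and smarter than string comparison (e.g. checking symbolic equivalence of LaTeX expressions), but the leaderboard numbers are all "correct answers / 500" in this sense.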
Related resources
Latest MATH-500 model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for MATH-500.
Source: DataLearnerAI
Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators. Learn about our data methodology.
MATH-500 Rank
| Rank | Model | Mode | Score (Accuracy) | Release Date | Parameters | License |
|---|---|---|---|---|---|---|
| 1 | Gemini-2.5-Pro-Preview-05-06 | Standard Mode | 98.80 | 2025-05-06 | Unknown | Closed |
| 2 | Gemini 2.5-Pro | Standard Mode | 98.80 | 2025-06-05 | Unknown | Closed |
| 3 | Claude Opus 4 | Standard Mode | 98.20 | 2025-05-23 | Unknown | Closed |
| 4 | GLM-4.5 | Thinking Enabled | 98.20 | 2025-07-28 | 355B | Free Commercial |
| 5 | OpenAI o3 | Standard Mode | 98.10 | 2025-04-16 | Unknown | Closed |
| 6 | GLM-4.5-Air | Thinking Enabled | 98.10 | 2025-07-28 | 106B | Free Commercial |
| 7 | Qwen3-235B-A22B | Thinking Enabled | 98.00 | 2025-04-28 | 235B | Free Commercial |
| 8 | DeepSeek-R1-0528 | Thinking Enabled | 98.00 | 2025-05-28 | 671B | Free Commercial |
| 9 | OpenAI o3-mini (high) | Standard Mode | 97.90 | 2025-01-31 | Unknown | Closed |
| 10 | Claude Opus 4.6 | Extended Thinking | 97.60 | 2026-02-05 | Unknown | Closed |
| 11 | Qwen3-8B | Thinking Enabled | 97.40 | 2025-04-28 | 8B | Free Commercial |
| 12 | Kimi K2 | Standard Mode | 97.40 | 2025-07-11 | 1000B | Free Commercial |
| 13 | DeepSeek-R1 | Standard Mode | 97.30 | 2025-01-20 | 671B | Free Commercial |
| 14 | Qwen3-32B | Thinking Enabled | 97.20 | 2025-04-28 | 32B | Free Commercial |
| 15 | MiniMax-M1-80k | Standard Mode | 96.80 | 2025-06-16 | 456B | Free Commercial |
| 16 | Pangu Pro MoE | Standard Mode | 96.80 | 2025-06-30 | 71.9B | Free Commercial |
| 17 | OpenAI o1 | Standard Mode | 96.40 | 2024-12-05 | Unknown | Closed |
| 18 | ERNIE-4.5-300B-A47B | Standard Mode | 96.40 | 2025-06-30 | 300B | Free Commercial |
| 19 | Kimi k1.5 (Long-CoT) | Standard Mode | 96.20 | 2025-01-22 | Unknown | Closed |
| 20 | Claude Sonnet 3.7-64K Extended Thinking | Standard Mode | 96.20 | 2025-02-25 | Unknown | Closed |
| 21 | Hunyuan-T1 | Standard Mode | 96.20 | 2025-03-21 | Unknown | Closed |
| 22 | Qwen3-235B-A22B | Standard Mode | 96.20 | 2025-04-28 | 235B | Free Commercial |
| 23 | MiniMax-M1-40k | Standard Mode | 96.00 | 2025-06-16 | 456B | Free Commercial |
| 24 | OpenAI o3-mini | Thinking Enabled | 95.80 | 2025-01-31 | Unknown | Closed |
| 25 | Llama 4 Behemoth Instruct | Standard Mode | 95.00 | 2025-04-05 | 2000B | Free Commercial |
| 26 | Kimi k1.5 (Short-CoT) | Standard Mode | 94.60 | 2025-01-22 | Unknown | Closed |
| 27 | DeepSeek-R1-Distill-Llama-70B | Standard Mode | 94.50 | 2025-01-20 | 70B | Free Commercial |
| 28 | DeepSeek-V3-0324 | Standard Mode | 94.00 | 2025-03-24 | 671B | Free Commercial |
| 29 | Hunyuan-7B | Standard Mode | 93.70 | 2025-08-04 | 7B | Free Commercial |
| 30 | GPT-4.1 | Standard Mode | 92.80 | 2025-04-14 | Unknown | Closed |