LLM Math Reasoning Benchmark Leaderboard

This page ranks LLMs on math reasoning benchmarks. It covers models including GPT, Claude, Qwen, and DeepSeek, scored on AIME 2025, FrontierMath - Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, this leaderboard covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, making it straightforward to compare models within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Organization | AIME2025 | FrontierMath - Tier 4 | MATH-500 | GSM8K | License |
|---|---|---|---|---|---|---|---|
| 1 | Gemini-2.5-Pro-Preview-05-06 | Google DeepMind | 83.00 | 2.10 | 98.80 | — | Proprietary |
| 2 | Claude Opus 4 | Anthropic | 75.50 | 4.20 | 98.20 | — | Proprietary |
| 3 | GLM-4.5 | Zhipu AI | — | — | 98.20 | — | Free commercial |
| 4 | GLM-4.5-Air | Zhipu AI | — | — | 98.10 | — | Free commercial |
| 5 | DeepSeek-R1-0528 | DeepSeek-AI | 87.50 | — | 98.00 | — | Free commercial |
| 6 | Qwen3-235B-A22B | Alibaba | 81.50 | — | 98.00 | 96.40 | Free commercial |
| 7 | OpenAI o3-mini (high) | OpenAI | — | 4.20 | 97.90 | — | Proprietary |
| 8 | Kimi K2 | Moonshot AI | 54.00 | 0.01 | 97.40 | — | Free commercial |
| 9 | DeepSeek-R1 | DeepSeek-AI | 70.00 | — | 97.30 | — | Free commercial |
| 10 | MiniMax-M1-80k | MiniMaxAI | 76.90 | — | 96.80 | — | Free commercial |
| 11 | ERNIE-4.5-300B-A47B | Baidu | 35.10 | — | 96.40 | 96.60 | Free commercial |
| 12 | OpenAI o1 | OpenAI | — | — | 96.40 | — | Proprietary |
| 13 | Kimi k1.5 (Long-CoT) | Princeton University | — | — | 96.20 | — | Proprietary |
| 14 | Claude Sonnet 3.7-64K Extended Thinking | Anthropic | — | — | 96.20 | — | Proprietary |
| 15 | MiniMax-M1-40k | MiniMaxAI | 74.60 | — | 96.00 | — | Free commercial |
| 16 | Llama 4 Behemoth Instruct | Facebook AI Research Lab | — | — | 95.00 | — | Free commercial |
| 17 | Kimi k1.5 (Short-CoT) | Moonshot AI | — | — | 94.60 | — | Proprietary |
| 18 | DeepSeek-V3-0324 | DeepSeek-AI | 47.70 | — | 94.00 | 96.30 | Free commercial |
| 19 | GPT-4.1 | OpenAI | 36.70 | — | 92.80 | 95.90 | Proprietary |
| 20 | GPT-4.5 | OpenAI | — | — | 90.70 | — | Proprietary |
| 21 | OpenAI o1-mini | OpenAI | — | — | 90.00 | — | Proprietary |
| 22 | DeepSeek-V3 | DeepSeek-AI | — | — | 87.80 | — | Free commercial |
| 23 | Claude Sonnet 3.7 | Anthropic | 54.80 | — | 82.20 | — | Proprietary |
| 24 | Step 3.5 Flash | StepFunAI | 99.80 | — | — | — | Free commercial |
| 25 | OpenAI o4-mini | OpenAI | 99.50 | 6.30 | — | — | Proprietary |
| 26 | GLM-4.6 | Zhipu AI | 98.60 | 2.10 | — | — | Free commercial |
| 27 | Kimi K2.5 | Moonshot AI | 96.10 | 4.20 | — | — | Free commercial |
| 28 | GLM-4.7 | Zhipu AI | 95.70 | 2.10 | — | — | Free commercial |
| 29 | DeepSeek V3.2 | DeepSeek-AI | 93.10 | 2.10 | — | — | Free commercial |
| 30 | o3-pro | OpenAI | 93.00 | — | — | — | Proprietary |
| 31 | Qwen3-235B-A22B-Thinking-2507 | Alibaba | 92.30 | — | — | — | Free commercial |
| 32 | DeepSeek-V3.1 Terminus | DeepSeek-AI | 90.00 | — | — | — | Free commercial |
| 33 | DeepSeek V3.2-Exp | DeepSeek-AI | 89.30 | — | — | — | Free commercial |
| 34 | DeepSeek-V3.1 | DeepSeek-AI | 88.40 | — | — | — | Free commercial |
| 35 | MiniMax M2.5 | MiniMaxAI | 86.30 | — | — | — | Free commercial |
| 36 | Intern-S1 | Shanghai AI Laboratory | 86.00 | — | — | — | Free commercial |
| 37 | Step3 | StepFunAI | 82.90 | — | — | — | Free commercial |
| 38 | M2.1 | MiniMaxAI | 81.00 | — | — | — | Free commercial |
| 39 | MiniMax M2 | MiniMaxAI | 78.00 | — | — | — | Free commercial |
| 40 | Grok 3 | xAI | 77.10 | — | — | — | Proprietary |
| 41 | Kimi K2 0905 | Moonshot AI | 75.20 | — | — | — | Free commercial |
| 42 | Gemini 2.5 Flash | Google DeepMind | 72.00 | 4.20 | — | — | Proprietary |
| 43 | Qwen3-235B-A22B-2507 | Alibaba | 70.30 | — | — | — | Free commercial |
| 44 | Magistral-Medium-2506 | MistralAI | 64.95 | — | — | — | Proprietary |
| 45 | Gemini 2.5 Flash-Lite | Google DeepMind | 63.10 | — | — | — | Proprietary |
| 46 | ERNIE-4.5-VL-424B-A47B-Base | Baidu | 35.10 | — | — | — | Free commercial |
| 47 | Gemini 2.0 Flash Experimental | DeepMind | 29.70 | — | — | — | Proprietary |
| 48 | Kimi K2 Thinking | Moonshot AI | 100.00 | — | — | — | Free commercial |
| 49 | Llama3.1-405B Instruct | Facebook AI Research Lab | — | — | — | — | Free commercial |
| 50 | Grok 3.5 | xAI | — | — | — | — | Proprietary |
Showing 50 of 55 models.
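
Because many models report only a subset of the four benchmarks, comparing them on one benchmark means deciding where the missing scores go. The following is a minimal sketch of one way to re-rank rows by a chosen benchmark, highest first, with missing scores placed last. The `ROWS` sample and the `rank_by` helper are illustrative assumptions, not part of DataLearnerAI's tooling or its documented ranking method; the scores are copied from the results above.

```python
# A few rows copied from the results above; None marks a missing score.
ROWS = [
    {"model": "Gemini-2.5-Pro-Preview-05-06", "AIME2025": 83.00, "MATH-500": 98.80},
    {"model": "DeepSeek-R1-0528", "AIME2025": 87.50, "MATH-500": 98.00},
    {"model": "GLM-4.5", "AIME2025": None, "MATH-500": 98.20},
    {"model": "Kimi K2 Thinking", "AIME2025": 100.00, "MATH-500": None},
]


def rank_by(rows, benchmark):
    """Return model names sorted by `benchmark`, highest first, missing last.

    Hypothetical helper for illustration only.
    """
    def key(row):
        score = row.get(benchmark)
        # Tuples compare element by element: (False, ...) sorts before
        # (True, ...), so rows with a score come first. Negating the score
        # gives descending order among the rows that have one.
        return (score is None, -(score or 0.0))

    return [row["model"] for row in sorted(rows, key=key)]


print(rank_by(ROWS, "MATH-500"))
print(rank_by(ROWS, "AIME2025"))
```

Sorting on a `(is_missing, -score)` tuple keeps the comparison in a single pass and avoids treating a missing score as zero, which would otherwise mix unreported models in with genuinely low scorers.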