A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

Products

  • Leaderboards
  • Model comparison
  • Datasets

Resources

  • Tutorials
  • Editorial
  • Tool directory

Company

  • About
  • Privacy policy
  • Data methodology
  • Contact

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

Privacy policyTerms of service
Back to Main Leaderboard

LLM Math Reasoning Benchmark Leaderboard

This page ranks LLMs on math reasoning benchmarks. Models from the GPT, Claude, Qwen, and DeepSeek families, among others, are compared on AIME 2025, FrontierMath Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, this leaderboard covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared directly within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Benchmarks: AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Mode | Organization | AIME2025 | FrontierMath - Tier 4 | MATH-500 | GSM8K | License |
|---|---|---|---|---|---|---|---|---|
| 1 | Gemini-2.5-Pro-Preview-05-06 | - | Google DeepMind | 83.00 | 2.10 | 98.80 | - | Proprietary |
| 2 | Gemini 2.5-Pro | - | Google DeepMind | - | - | 98.80 | - | Proprietary |
| 3 | Claude Opus 4 | - | Anthropic | 75.50 | - | 98.20 | - | Proprietary |
| 4 | GLM-4.5 | Thinking Enabled | Zhipu AI | - | - | 98.20 | - | Free commercial |
| 5 | OpenAI o3 | - | OpenAI | - | - | 98.10 | - | Proprietary |
| 6 | GLM-4.5-Air | Thinking Enabled | Zhipu AI | - | - | 98.10 | - | Free commercial |
| 7 | DeepSeek-R1-0528 | Thinking Enabled | DeepSeek-AI | 87.50 | - | 98.00 | - | Free commercial |
| 8 | Qwen3-235B-A22B | Thinking Enabled | Alibaba | 81.50 | - | 98.00 | - | Free commercial |
| 9 | OpenAI o3-mini (high) | - | OpenAI | - | - | 97.90 | - | Proprietary |
| 10 | Claude Opus 4.6 | Extended Thinking | Anthropic | 99.79 | - | 97.60 | - | Proprietary |
| 11 | Qwen3-8B | Thinking Enabled | Alibaba | 67.30 | - | 97.40 | - | Free commercial |
| 12 | Kimi K2 | - | Moonshot AI | 54.00 | 0.01 | 97.40 | - | Free commercial |
| 13 | DeepSeek-R1 | - | DeepSeek-AI | 70.00 | - | 97.30 | - | Free commercial |
| 14 | Qwen3-32B | Thinking Enabled | Alibaba | 72.90 | - | 97.20 | - | Free commercial |
| 15 | MiniMax-M1-80k | - | MiniMaxAI | 76.90 | - | 96.80 | - | Free commercial |
| 16 | Pangu Pro MoE | - | Huawei | 68.10 | - | 96.80 | - | Free commercial |
| 17 | ERNIE-4.5-300B-A47B | - | Baidu | 35.10 | - | 96.40 | 96.60 | Free commercial |
| 18 | OpenAI o1 | - | OpenAI | - | - | 96.40 | - | Proprietary |
| 19 | Qwen3-235B-A22B | - | Alibaba | 24.70 | - | 96.20 | 96.40 | Free commercial |
| 20 | Claude Sonnet 3.7-64K Extended Thinking | - | Anthropic | - | - | 96.20 | - | Proprietary |
| 21 | Kimi k1.5 (Long-CoT) | - | Moonshot AI | - | - | 96.20 | - | Proprietary |
| 22 | Hunyuan-T1 | - | Tencent AI Lab | - | - | 96.20 | - | Proprietary |
| 23 | MiniMax-M1-40k | - | MiniMaxAI | 74.60 | - | 96.00 | - | Free commercial |
| 24 | OpenAI o3-mini | Thinking Enabled | OpenAI | 86.50 | - | 95.80 | - | Proprietary |
| 25 | Llama 4 Behemoth Instruct | - | Facebook AI Research Lab | - | - | 95.00 | - | Free commercial |
| 26 | Kimi k1.5 (Short-CoT) | - | Moonshot AI | - | - | 94.60 | - | Proprietary |
| 27 | DeepSeek-R1-Distill-Llama-70B | - | DeepSeek-AI | - | - | 94.50 | - | Free commercial |
| 28 | DeepSeek-V3-0324 | - | DeepSeek-AI | 47.70 | - | 94.00 | 96.30 | Free commercial |
| 29 | Hunyuan-7B | - | Tencent ARC | 75.30 | - | 93.70 | - | Free commercial |
| 30 | GPT-4.1 | - | OpenAI | 36.70 | - | 92.80 | 95.90 | Proprietary |
| 31 | Pangu Embedded | - | Huawei | - | - | 92.40 | 95.98 | Free commercial |
| 32 | DeepSeek-R1-Distill-Qwen-7B | - | DeepSeek-AI | - | - | 91.40 | - | Free commercial |
| 33 | QwQ-32B | - | Alibaba | - | - | 91.00 | - | Free commercial |
| 34 | GPT-4.5 | - | OpenAI | - | - | 90.70 | - | Proprietary |
| 35 | QwQ-32B-Preview | - | Alibaba | - | - | 90.60 | - | Free commercial |
| 36 | Phi-4-instruct (reasoning-trained) | - | Microsoft Azure | - | - | 90.40 | - | Proprietary |
| 37 | OpenAI o1-mini | - | OpenAI | - | - | 90.00 | - | Proprietary |
| 38 | Qwen3-32B | - | Alibaba | 20.20 | - | 88.60 | - | Free commercial |
| 39 | DeepSeek-V3 | - | DeepSeek-AI | - | - | 87.80 | - | Free commercial |
| 40 | Qwen3-8B | - | Alibaba | 20.90 | - | 87.40 | - | Free commercial |
| 41 | Claude Sonnet 3.7 | - | Anthropic | 54.80 | - | 82.20 | - | Proprietary |
| 42 | Claude 3.5 Sonnet New | - | Anthropic | - | - | 78.00 | - | Proprietary |
| 43 | GPT-4o | - | OpenAI | - | - | 75.90 | - | Proprietary |
| 44 | Phi-4-mini-instruct (3.8B) | - | Microsoft Azure | - | - | 71.80 | 88.60 | Free commercial |
| 45 | Step 3.5 Flash | Thinking Enabled, Tools | StepFunAI | 99.80 | - | - | - | Free commercial |
| 46 | Gemini 3.0 Flash | Thinking Enabled, Tools | Google DeepMind | 99.70 | - | - | - | Proprietary |
| 47 | GPT-5 | Thinking Enabled, Tools | OpenAI | 99.60 | - | - | - | Proprietary |
| 48 | OpenAI o4-mini | Thinking Enabled, Tools | OpenAI | 99.50 | - | - | - | Proprietary |
| 49 | Gemini 2.5 Deep Think | Deep Thinking Mode | Google DeepMind | 99.20 | - | - | - | Proprietary |
| 50 | Kimi K2 Thinking | Thinking Enabled, Tools | Moonshot AI | 99.10 | - | - | - | Free commercial |
Showing 50 of 221 models.
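
The row order in the results appears consistent with sorting by MATH-500 score in descending order, breaking ties by AIME2025, and placing models with no reported score last. The page does not document its sorting logic, so this is an inference from the listed ranks, not a description of the site's implementation. A minimal Python sketch of that ordering follows; the names `ROWS`, `sort_key`, and `rank` are hypothetical, and the five rows are copied from the leaderboard data.

```python
# Sample rows from the leaderboard: (model, AIME2025, MATH-500).
# None marks a benchmark with no reported score.
ROWS = [
    ("Step 3.5 Flash", 99.80, None),
    ("Claude Opus 4.6", 99.79, 97.60),
    ("Gemini 2.5-Pro", None, 98.80),
    ("Gemini-2.5-Pro-Preview-05-06", 83.00, 98.80),
    ("GPT-5", 99.60, None),
]


def _desc_missing_last(score):
    """Key fragment: higher scores sort first, missing scores sort last."""
    return (score is None, -(score or 0.0))


def sort_key(row):
    """Assumed ordering: MATH-500 first, then AIME2025 as a tie-breaker."""
    _, aime, math500 = row
    return (_desc_missing_last(math500), _desc_missing_last(aime))


def rank(rows):
    """Return model names in the assumed leaderboard order."""
    return [name for name, _, _ in sorted(rows, key=sort_key)]


if __name__ == "__main__":
    for position, name in enumerate(rank(ROWS), start=1):
        print(position, name)
```

Under this assumed key, a model with a very high AIME2025 score but no MATH-500 result (such as Step 3.5 Flash) still lands below every model that reports MATH-500, which matches its listed rank of 45.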