LLM Math Reasoning Benchmark Leaderboard

This page is a leaderboard for LLM math reasoning. We evaluate models including GPT, Claude, Qwen, and DeepSeek on math benchmarks such as AIME2025, FrontierMath - Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, this page covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related math reasoning benchmarks, so models can be compared directly within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Developer | Mode | AIME2025 | FrontierMath - Tier 4 | MATH-500 | GSM8K | License |
|---|---|---|---|---|---|---|---|---|
| 1 | Kimi K2 Thinking | Moonshot AI | Parallel · Thinking Enabled, Tools | 100.00 | - | - | - | Free commercial |
| 2 | Step 3.5 Flash | StepFunAI | Thinking Enabled, Tools | 99.80 | - | - | - | Free commercial |
| 3 | Kimi K2 Thinking | Moonshot AI | Thinking Enabled, Tools | 99.10 | - | - | - | Free commercial |
| 4 | GLM-4.6 | Zhipu AI | Thinking Enabled | 98.60 | - | - | - | Free commercial |
| 5 | GLM-4.6 | Zhipu AI | Thinking Enabled, Tools | 98.60 | - | - | - | Free commercial |
| 6 | Step 3.5 Flash | StepFunAI | Thinking Enabled | 97.30 | - | - | - | Free commercial |
| 7 | Kimi K2.5 | Moonshot AI | Thinking Enabled | 96.10 | - | - | - | Free commercial |
| 8 | DeepSeek V3.2 Speciale | DeepSeek-AI | Thinking Enabled | 96.00 | - | - | - | Free commercial |
| 9 | GLM-4.7 | Zhipu AI | Thinking Enabled | 95.70 | - | - | - | Free commercial |
| 10 | Kimi K2 Thinking | Moonshot AI | Thinking Enabled | 94.50 | - | - | - | Free commercial |
| 11 | DeepSeek V3.2 | DeepSeek-AI | Thinking Enabled | 93.10 | 2.10 | - | - | Free commercial |
| 12 | Qwen3-235B-A22B-Thinking | Alibaba | Thinking Enabled | 92.30 | - | - | - | Free commercial |
| 13 | Qwen3-235B-A22B-Thinking-2507 | Alibaba | Thinking Enabled | 92.30 | - | - | - | Free commercial |
| 14 | GLM-4.7-Flash | Zhipu AI | Thinking Enabled | 91.60 | - | - | - | Free commercial |
| 15 | DeepSeek-V3.1 Terminus | DeepSeek-AI | Thinking Enabled | 90.00 | - | - | - | Free commercial |
| 16 | DeepSeek V3.2-Exp | DeepSeek-AI | Thinking Enabled | 89.30 | - | - | - | Free commercial |
| 17 | DeepSeek-V3.1 | DeepSeek-AI | Thinking Enabled | 88.40 | - | - | - | Free commercial |
| 18 | DeepSeek-R1-0528 | DeepSeek-AI | Thinking Enabled | 87.50 | - | 98.00 | - | Free commercial |
| 19 | MiniMax M2.5 | MiniMaxAI | Thinking Enabled | 86.30 | - | - | - | Free commercial |
| 20 | Intern-S1 | Shanghai AI Laboratory | - | 86.00 | - | - | - | Free commercial |
| 21 | Step3 | StepFunAI | - | 82.90 | - | - | - | Free commercial |
| 22 | Qwen3-235B-A22B | Alibaba | Thinking Enabled | 81.50 | - | 98.00 | - | Free commercial |
| 23 | Qwen3-4B-Thinking-2507 | Alibaba | Thinking Enabled | 81.30 | - | - | - | Free commercial |
| 24 | M2.1 | MiniMaxAI | Thinking Enabled | 81.00 | - | - | - | Free commercial |
| 25 | Qwen3 Max (Preview) | Alibaba | - | 80.60 | - | - | - | Proprietary |
| 26 | MiniMax M2 | MiniMaxAI | Thinking Enabled | 78.00 | - | - | - | Free commercial |
| 27 | MiniMax-M1-80k | MiniMaxAI | - | 76.90 | - | 96.80 | - | Free commercial |
| 28 | Hunyuan-A13B-Instruct | Tencent AI Lab | - | 76.80 | - | - | 91.83 | Free commercial |
| 29 | Hunyuan-7B | Tencent ARC | - | 75.30 | - | 93.70 | - | Free commercial |
| 30 | Kimi K2 0905 | Moonshot AI | Thinking Enabled, Tools | 75.20 | - | - | - | Free commercial |
| 31 | MiniMax-M1-40k | MiniMaxAI | - | 74.60 | - | 96.00 | - | Free commercial |
| 32 | Qwen3-32B | Alibaba | Thinking Enabled | 72.90 | - | 97.20 | - | Free commercial |
| 33 | Qwen3-235B-A22B-2507 | Alibaba | - | 70.30 | - | - | - | Free commercial |
| 34 | DeepSeek-R1 | DeepSeek-AI | - | 70.00 | - | 97.30 | - | Free commercial |
| 35 | Qwen3-Next | Alibaba | - | 69.50 | - | - | 90.30 | Free commercial |
| 36 | Pangu Pro MoE | Huawei | - | 68.10 | - | 96.80 | - | Free commercial |
| 37 | Qwen3-8B | Alibaba | Thinking Enabled | 67.30 | - | 97.40 | - | Free commercial |
| 38 | Qwen3-30B-A3B-2507 | Alibaba | - | 61.30 | - | - | - | Free commercial |
| 39 | DeepSeek V3.2-Exp | DeepSeek-AI | - | 58.00 | - | - | - | Free commercial |
| 40 | DeepSeek-V3.1 Terminus | DeepSeek-AI | - | 54.00 | - | - | - | Free commercial |
| 41 | Kimi K2 | Moonshot AI | - | 54.00 | 0.01 | 97.40 | - | Free commercial |
| 42 | DeepSeek-V3.1 | DeepSeek-AI | - | 49.80 | - | - | - | Free commercial |
| 43 | DeepSeek-V3-0324 | DeepSeek-AI | - | 47.70 | - | 94.00 | 96.30 | Free commercial |
| 44 | Qwen3-4B-2507 | Alibaba | - | 47.40 | - | - | - | Free commercial |
| 45 | GLM-4.6 | Zhipu AI | - | 44.00 | - | - | - | Free commercial |
| 46 | ERNIE-4.5-300B-A47B | Baidu | - | 35.10 | - | 96.40 | 96.60 | Free commercial |
| 47 | ERNIE-4.5-VL-424B-A47B-Base | Baidu | Thinking Enabled | 35.10 | - | - | - | Free commercial |
| 48 | Qwen3-235B-A22B | Alibaba | - | 24.70 | - | 96.20 | 96.40 | Free commercial |
| 49 | Qwen3-30B-A3B | Alibaba | - | 21.60 | - | - | - | Free commercial |
| 50 | Qwen3-8B | Alibaba | - | 20.90 | - | 87.40 | - | Free commercial |
Showing 50 of 73 models.