
LLM Math Reasoning Benchmark Leaderboard

This page ranks large language models on math reasoning. It compares models from the GPT, Claude, Qwen, and DeepSeek families, among others, on established math benchmarks such as AIME 2025, FrontierMath Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, the leaderboard covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared directly within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.


LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Mode | Organization | AIME2025 | FrontierMath - Tier 4 | MATH-500 | GSM8K | License |
|---|---|---|---|---|---|---|---|---|
| 1 | Kimi K2 Thinking | Parallel · Thinking Enabled, Tools | Moonshot AI | 100.00 | - | - | - | Free commercial |
| 2 | Claude Sonnet 4.5 | Thinking Enabled, Tools | Anthropic | 100.00 | - | - | - | Proprietary |
| 3 | GPT-5-Pro | Thinking Enabled, Tools | OpenAI | 100.00 | - | - | - | Proprietary |
| 4 | Grok 4 Heavy | Parallel · Thinking Enabled | xAI | 100.00 | - | - | - | Proprietary |
| 5 | GPT-5.2 | Thinking Level · Extra High | OpenAI | 100.00 | - | - | - | Proprietary |
| 6 | Step 3.5 Flash | Thinking Enabled, Tools | StepFunAI | 99.80 | - | - | - | Free commercial |
| 7 | Claude Opus 4.6 | Extended Thinking | Anthropic | 99.79 | - | 97.60 | - | Proprietary |
| 8 | Gemini 3.0 Flash | Thinking Enabled, Tools | Google DeepMind | 99.70 | - | - | - | Proprietary |
| 9 | GPT-5 | Thinking Enabled, Tools | OpenAI | 99.60 | - | - | - | Proprietary |
| 10 | OpenAI o4-mini | Thinking Enabled, Tools | OpenAI | 99.50 | - | - | - | Proprietary |
| 11 | Gemini 2.5 Deep Think | Deep Thinking Mode | Google DeepMind | 99.20 | - | - | - | Proprietary |
| 12 | Kimi K2 Thinking | Thinking Enabled, Tools | Moonshot AI | 99.10 | - | - | - | Free commercial |
| 13 | Grok 4 | Thinking Enabled, Tools | xAI | 98.80 | - | - | - | Proprietary |
| 14 | GPT OSS 20B | Thinking Enabled, Tools | OpenAI | 98.70 | - | - | - | Free commercial |
| 15 | GLM-4.6 | Thinking Enabled | Zhipu AI | 98.60 | - | - | - | Free commercial |
| 16 | GLM-4.6 | Thinking Enabled, Tools | Zhipu AI | 98.60 | - | - | - | Free commercial |
| 17 | GPT OSS 120B | Thinking Enabled, Tools | OpenAI | 97.90 | - | - | - | Free commercial |
| 18 | Step 3.5 Flash | Thinking Enabled | StepFunAI | 97.30 | - | - | - | Free commercial |
| 19 | GPT-5-Pro | Thinking Enabled | OpenAI | 96.70 | 14.60 | - | - | Proprietary |
| 20 | Haiku 4.5 | Thinking Enabled, Tools | Anthropic | 96.30 | - | - | - | Proprietary |
| 21 | Kimi K2.5 | Thinking Enabled | Moonshot AI | 96.10 | - | - | - | Free commercial |
| 22 | DeepSeek V3.2 Speciale | Thinking Enabled | DeepSeek-AI | 96.00 | - | - | - | Free commercial |
| 23 | GLM-4.7 | Thinking Enabled | Zhipu AI | 95.70 | - | - | - | Free commercial |
| 24 | Gemini 3.0 Flash | Thinking Enabled | Google DeepMind | 95.20 | - | - | - | Proprietary |
| 25 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | Google DeepMind | 95.00 | 18.80 | - | - | Proprietary |
| 26 | GPT-5 | Thinking Enabled | OpenAI | 94.60 | - | - | - | Proprietary |
| 27 | Kimi K2 Thinking | Thinking Enabled | Moonshot AI | 94.50 | - | - | - | Free commercial |
| 28 | GPT-5.1 | Thinking Level · High | OpenAI | 94.00 | - | - | - | Proprietary |
| 29 | GPT-5.1 | Thinking Level · High | OpenAI | 94.00 | - | - | - | Proprietary |
| 30 | DeepSeek V3.2 | Thinking Enabled | DeepSeek-AI | 93.10 | 2.10 | - | - | Free commercial |
| 31 | o3-pro | | OpenAI | 93.00 | - | - | - | Proprietary |
| 32 | OpenAI o4-mini | Thinking Enabled | OpenAI | 92.70 | - | - | - | Proprietary |
| 33 | Qwen3-235B-A22B-Thinking | Thinking Enabled | Alibaba | 92.30 | - | - | - | Free commercial |
| 34 | Qwen3-235B-A22B-Thinking-2507 | Thinking Enabled | Alibaba | 92.30 | - | - | - | Free commercial |
| 35 | Grok 4 Fast | Thinking Enabled | xAI | 92.00 | - | - | - | Proprietary |
| 36 | Grok 4 | Thinking Enabled | xAI | 91.70 | - | - | - | Proprietary |
| 37 | GLM-4.7-Flash | Thinking Enabled | Zhipu AI | 91.60 | - | - | - | Free commercial |
| 38 | DeepSeek-V3.1 Terminus | Thinking Enabled | DeepSeek-AI | 90.00 | - | - | - | Free commercial |
| 39 | DeepSeek V3.2-Exp | Thinking Enabled | DeepSeek-AI | 89.30 | - | - | - | Free commercial |
| 40 | Grok 4.1 Fast | Thinking Enabled | xAI | 89.00 | - | - | - | Proprietary |
| 41 | OpenAI o3 | Thinking Enabled | OpenAI | 88.90 | - | - | - | Proprietary |
| 42 | DeepSeek-V3.1 | Thinking Enabled | DeepSeek-AI | 88.40 | - | - | - | Free commercial |
| 43 | Gemini 2.5-Pro | Thinking Enabled | Google DeepMind | 88.00 | - | - | - | Proprietary |
| 44 | DeepSeek-R1-0528 | Thinking Enabled | DeepSeek-AI | 87.50 | - | 98.00 | - | Free commercial |
| 45 | Claude Sonnet 4.5 | Thinking Enabled | Anthropic | 87.00 | - | - | - | Proprietary |
| 46 | Gemini 2.5 Pro Experimental 03-25 | | Google DeepMind | 86.90 | - | - | - | Proprietary |
| 47 | OpenAI o3-mini | Thinking Enabled | OpenAI | 86.50 | - | 95.80 | - | Proprietary |
| 48 | MiniMax M2.5 | Thinking Enabled | MiniMaxAI | 86.30 | - | - | - | Free commercial |
| 49 | Intern-S1 | | Shanghai AI Laboratory | 86.00 | - | - | - | Free commercial |
| 50 | Claude Sonnet 4 | Deep Thinking Mode, Tools | Anthropic | 85.00 | - | - | - | Proprietary |
Showing 50 of 221 models.
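
The results above appear to be ordered by AIME2025 score, and most models have no MATH-500 score listed, so the order can change when a different benchmark is used. The sketch below re-ranks a handful of rows copied from the results by a chosen benchmark and skips models with no reported score. The `ROWS` sample, the column constants, and the `rank_by` helper are illustrative names, not part of the site, and skipping missing scores is an assumption rather than the site's documented method.

```python
# Hypothetical sample: (model, AIME2025, MATH-500) copied from the results above.
# None marks a score that the leaderboard does not report.
ROWS = [
    ("Claude Opus 4.6", 99.79, 97.60),
    ("DeepSeek-R1-0528", 87.50, 98.00),
    ("OpenAI o3-mini", 86.50, 95.80),
    ("GPT-5.2", 100.00, None),
]

AIME2025 = 1  # tuple index of the AIME2025 score
MATH500 = 2   # tuple index of the MATH-500 score


def rank_by(rows, column):
    """Return (rank, model, score) tuples sorted by score, highest first.

    Rows with no score in the chosen column are left out (an assumption).
    """
    scored = [(row[0], row[column]) for row in rows if row[column] is not None]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(scored)]


if __name__ == "__main__":
    for label, column in (("AIME2025", AIME2025), ("MATH-500", MATH500)):
        print(label)
        for rank, name, score in rank_by(ROWS, column):
            print(f"  {rank}. {name}: {score:.2f}")
```

In this sample, GPT-5.2 leads on AIME2025 but drops out of the MATH-500 ranking because it has no score there, and DeepSeek-R1-0528 moves to first place on MATH-500.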