LLM Math Reasoning Benchmark Leaderboard

This page ranks large language models on math reasoning. It compares models including GPT, Claude, Qwen, and DeepSeek on four benchmarks: AIME 2025, FrontierMath Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, the leaderboard covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared directly within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Mode | Developer | AIME2025 | FrontierMath - Tier 4 | MATH-500 | GSM8K | License |
|---|---|---|---|---|---|---|---|---|
| 1 | ERNIE-4.5-300B-A47B | | Baidu | 35.10 | — | 96.40 | 96.60 | Free commercial |
| 2 | Qwen3-235B-A22B | | Alibaba | 24.70 | — | 96.20 | 96.40 | Free commercial |
| 3 | DeepSeek-V3-0324 | | DeepSeek-AI | 47.70 | — | 94.00 | 96.30 | Free commercial |
| 4 | Pangu Embedded | | Huawei | — | — | 92.40 | 95.98 | Free commercial |
| 5 | GPT-4.1 | | OpenAI | 36.70 | — | 92.80 | 95.90 | Proprietary |
| 6 | Qwen2.5-32B | | Alibaba | — | — | — | 95.90 | Free commercial |
| 7 | Gemma 3 - 27B (IT) | | Google DeepMind | — | — | — | 95.90 | Free commercial |
| 8 | Claude3-Opus | | Anthropic | — | — | — | 95.00 | Proprietary |
| 9 | Qwen2.5-Max | | Alibaba | — | — | — | 94.50 | Proprietary |
| 10 | Hunyuan-A13B-Instruct | | Tencent AI Lab | 76.80 | — | — | 91.83 | Free commercial |
| 11 | Qwen2.5-72B | | Alibaba | — | — | — | 91.50 | Free commercial |
| 12 | GPT-4o mini | | OpenAI | — | — | — | 91.30 | Proprietary |
| 13 | Qwen3-Next | | Alibaba | 69.50 | — | — | 90.30 | Free commercial |
| 14 | Phi-4-mini-instruct (3.8B) | | Microsoft Azure | — | — | 71.80 | 88.60 | Free commercial |
| 15 | Qwen2.5-7B | | Alibaba | — | — | — | 85.40 | Free commercial |
| 16 | Llama3.1-8B-Instruct | | Facebook AI Research | — | — | — | 82.40 | Free commercial |
| 17 | Qwen2.5-3B | | Alibaba | — | — | — | 79.10 | Free commercial |
| 18 | Moonlight-16B-A3B-Instruct | | Moonshot AI | — | — | — | 77.40 | Free commercial |
| 19 | Gemma2-27B | | Google DeepMind | — | — | — | 74.00 | Free commercial |
| 20 | Gemma 2 - 9B | | Google Research | — | — | — | 70.70 | Free commercial |
| 21 | Llama3.1-8B | | Facebook AI Research | — | — | — | 55.30 | Free commercial |
| 22 | Mistral-7B-Instruct-v0.3 | | MistralAI | — | — | — | 36.20 | Free commercial |
| 23 | Llama-3.2-3B | | Facebook AI Research | — | — | — | 34.00 | Free commercial |
| 24 | Step 3.5 Flash | Thinking Enabled, Tools | StepFunAI | 99.80 | — | — | — | Free commercial |
| 25 | Claude Opus 4.6 | Extended Thinking | Anthropic | 99.79 | — | 97.60 | — | Proprietary |
| 26 | Gemini 3.0 Flash | Thinking Enabled, Tools | Google DeepMind | 99.70 | — | — | — | Proprietary |
| 27 | GPT-5 | Thinking Enabled, Tools | OpenAI | 99.60 | — | — | — | Proprietary |
| 28 | OpenAI o4 - mini | Thinking Enabled, Tools | OpenAI | 99.50 | — | — | — | Proprietary |
| 29 | Gemini 2.5 Deep Think | Deep Thinking Mode | Google DeepMind | 99.20 | — | — | — | Proprietary |
| 30 | Kimi K2 Thinking | Thinking Enabled, Tools | Moonshot AI | 99.10 | — | — | — | Free commercial |
| 31 | Grok 4 | Thinking Enabled, Tools | xAI | 98.80 | — | — | — | Proprietary |
| 32 | GPT OSS 20B | Thinking Enabled, Tools | OpenAI | 98.70 | — | — | — | Free commercial |
| 33 | GLM-4.6 | Thinking Enabled | Zhipu AI | 98.60 | — | — | — | Free commercial |
| 34 | GLM-4.6 | Thinking Enabled, Tools | Zhipu AI | 98.60 | — | — | — | Free commercial |
| 35 | GPT OSS 120B | Thinking Enabled, Tools | OpenAI | 97.90 | — | — | — | Free commercial |
| 36 | Step 3.5 Flash | Thinking Enabled | StepFunAI | 97.30 | — | — | — | Free commercial |
| 37 | GPT-5-Pro | Thinking Enabled | OpenAI | 96.70 | 14.60 | — | — | Proprietary |
| 38 | Haiku 4.5 | Thinking Enabled, Tools | Anthropic | 96.30 | — | — | — | Proprietary |
| 39 | Kimi K2.5 | Thinking Enabled | Moonshot AI | 96.10 | — | — | — | Free commercial |
| 40 | DeepSeek V3.2 Speciale | Thinking Enabled | DeepSeek-AI | 96.00 | — | — | — | Free commercial |
| 41 | GLM-4.7 | Thinking Enabled | Zhipu AI | 95.70 | — | — | — | Free commercial |
| 42 | Gemini 3.0 Flash | Thinking Enabled | Google DeepMind | 95.20 | — | — | — | Proprietary |
| 43 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | Google DeepMind | 95.00 | 18.80 | — | — | Proprietary |
| 44 | GPT-5 | Thinking Enabled | OpenAI | 94.60 | — | — | — | Proprietary |
| 45 | Kimi K2 Thinking | Thinking Enabled | Moonshot AI | 94.50 | — | — | — | Free commercial |
| 46 | GPT-5.1 | Thinking Level · High | OpenAI | 94.00 | — | — | — | Proprietary |
| 47 | GPT-5.1 | Thinking Level · High | OpenAI | 94.00 | — | — | — | Proprietary |
| 48 | DeepSeek V3.2 | Thinking Enabled | DeepSeek-AI | 93.10 | 2.10 | — | — | Free commercial |
| 49 | o3-pro | | OpenAI | 93.00 | — | — | — | Proprietary |
| 50 | OpenAI o4 - mini | Thinking Enabled | OpenAI | 92.70 | — | — | — | Proprietary |
Showing 50 of 221 models. In this view, rows are ordered by GSM8K score; models without a GSM8K score follow, ordered by AIME2025 score.
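
The row order in the results above (GSM8K first, then AIME2025 for models that have no GSM8K score) can be reproduced with a two-part sort key. The snippet below is a minimal sketch, not the site's actual ranking code: the tuple layout, the `sort_key` helper, and the use of `None` for a missing score are all assumptions made for illustration, and only a handful of rows are copied from the results.

```python
# Hypothetical row layout for this sketch: (model, aime2025, gsm8k).
# None stands for a missing score in the leaderboard.
rows = [
    ("Claude Opus 4.6", 99.79, None),
    ("GPT-4.1", 36.70, 95.90),
    ("Step 3.5 Flash (Thinking Enabled, Tools)", 99.80, None),
    ("ERNIE-4.5-300B-A47B", 35.10, 96.60),
    ("Llama-3.2-3B", None, 34.00),
]


def sort_key(row):
    """Order rows with a GSM8K score first (highest first), then the rest by AIME2025."""
    _, aime, gsm8k = row
    if gsm8k is not None:
        # Group 0: has a GSM8K score; negate so larger scores sort earlier.
        return (0, -gsm8k)
    # Group 1: no GSM8K score; fall back to AIME2025, with missing values last.
    fallback = aime if aime is not None else float("-inf")
    return (1, -fallback)


ranked = sorted(rows, key=sort_key)
names = [model for model, _, _ in ranked]

for rank, name in enumerate(names, start=1):
    print(rank, name)
```

Sorting on a `(group, negated score)` tuple keeps the fallback rule in one place; a model missing both scores would land at the very end of the list.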