LLM Math Reasoning Benchmark Leaderboard

This page ranks large language models on math reasoning. It covers models including GPT, Claude, Qwen, and DeepSeek, evaluated on established math benchmarks such as AIME 2025, FrontierMath Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, the leaderboard covers AIME 2025, FrontierMath Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.


LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Mode | Developer | AIME 2025 | FrontierMath Tier 4 | MATH-500 | GSM8K | License |
|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Pro | Thinking Level · Extra High, Tools | OpenAI | — | 39.60 | — | — | Proprietary |
| 2 | GPT-5.5 Pro | Thinking Level · High | OpenAI | — | 39.60 | — | — | Proprietary |
| 3 | GPT-5.5 Pro | Thinking Level · Extra High | OpenAI | — | 39.60 | — | — | Proprietary |
| 4 | GPT-5.4 Pro | Thinking Level · High | OpenAI | — | 38.00 | — | — | Proprietary |
| 5 | GPT-5.4 Pro | Standard Mode, Tools, Internet | OpenAI | — | 37.50 | — | — | Proprietary |
| 6 | GPT-5.4 Pro | Thinking Level · Extra High | OpenAI | — | 37.50 | — | — | Proprietary |
| 7 | GPT-5.5 | Thinking Level · Extra High | OpenAI | — | 35.40 | — | — | Proprietary |
| 8 | GPT-5.5 | Thinking Level · High, Tools | OpenAI | — | 35.40 | — | — | Proprietary |
| 9 | GPT-5.2 Pro | Standard Mode, Tools, Internet | OpenAI | — | 31.30 | — | — | Proprietary |
| 10 | GPT-5.2 Pro | Thinking Enabled | OpenAI | — | 31.30 | — | — | Proprietary |
| 11 | GPT-5.4 | Thinking Level · Extra High | OpenAI | — | 27.10 | — | — | Proprietary |
| 12 | Opus 4.7 | Thinking Level · Extra High | Anthropic | — | 22.90 | — | — | Proprietary |
| 13 | Claude Opus 4.6 | Thinking Level · High | Anthropic | — | 22.90 | — | — | Proprietary |
| 14 | Claude Opus 4.6 | Thinking Level · Medium | Anthropic | — | 20.80 | — | — | Proprietary |
| 15 | Claude Opus 4.6 | Thinking Enabled | Anthropic | — | 20.80 | — | — | Proprietary |
| 16 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | Google DeepMind | 95.00 | 18.80 | — | — | Proprietary |
| 17 | GPT-5.2 | Thinking Level · Extra High | OpenAI | — | 18.80 | — | — | Proprietary |
| 18 | GPT-5.2 | Thinking Level · High | OpenAI | — | 18.80 | — | — | Proprietary |
| 19 | Gemini 3.0 Pro (Preview 11-2025) | | Google DeepMind | — | 18.80 | — | — | Proprietary |
| 20 | Gemini 3.1 Pro Preview | Thinking Level · High | Google DeepMind | — | 16.70 | — | — | Proprietary |
| 21 | Gemini 3.1 Pro Preview | | Google DeepMind | — | 16.70 | — | — | Proprietary |
| 22 | GPT-5.2 | Thinking Level · Medium | OpenAI | — | 16.70 | — | — | Proprietary |
| 23 | GPT-5-Pro | Thinking Enabled | OpenAI | 96.70 | 14.60 | — | — | Proprietary |
| 24 | Muse Spark | | Facebook AI Research Lab | — | 14.60 | — | — | Proprietary |
| 25 | Muse Spark | Thinking Enabled | Facebook AI Research Lab | — | 14.60 | — | — | Proprietary |
| 26 | Claude Opus 4.6 | Thinking Level · High | Anthropic | — | 14.60 | — | — | Proprietary |
| 27 | GPT-5.2 | Thinking Level · Extra High, Tools | OpenAI | — | 14.60 | — | — | Proprietary |
| 28 | GPT-5-Pro | | OpenAI | — | 14.60 | — | — | Proprietary |
| 29 | GPT-5.1 | Thinking Level · High | OpenAI | — | 12.50 | — | — | Proprietary |
| 30 | GPT-5.1 | Thinking Level · High, Tools | OpenAI | — | 12.50 | — | — | Proprietary |
| 31 | GPT-5 | Thinking Level · High | OpenAI | — | 12.50 | — | — | Proprietary |
| 32 | Gemini 2.5 Deep Think | | Google DeepMind | — | 10.40 | — | — | Proprietary |
| 33 | Gemini 2.5 Pro Deep Think | Deep Thinking Mode | Google DeepMind | — | 10.40 | — | — | Proprietary |
| 34 | Claude Sonnet 4.6 | Thinking Level · Low | Anthropic | — | 8.30 | — | — | Proprietary |
| 35 | OpenAI o4-mini | Thinking Level · High | OpenAI | — | 6.30 | — | — | Proprietary |
| 36 | GPT-5 | Thinking Level · Medium | OpenAI | — | 6.30 | — | — | Proprietary |
| 37 | GPT-5-mini | Thinking Level · High | OpenAI | — | 6.30 | — | — | Proprietary |
| 38 | GPT-5.2 | Thinking Level · Low | OpenAI | — | 6.30 | — | — | Proprietary |
| 39 | GPT-5.4 nano | Thinking Level · High | OpenAI | — | 6.30 | — | — | Proprietary |
| 40 | Opus 4.1 | Extended Thinking | Anthropic | 78.00 | 4.20 | — | — | Proprietary |
| 41 | OpenAI o3-mini | Thinking Level · High | OpenAI | — | 4.20 | — | — | Proprietary |
| 42 | OpenAI o3-mini (high) | Thinking Level · High | OpenAI | — | 4.20 | — | — | Proprietary |
| 43 | Gemini 2.5 Pro Experimental 03-25 | | Google DeepMind | — | 4.20 | — | — | Proprietary |
| 44 | Gemini 2.5 Flash | | Google DeepMind | — | 4.20 | — | — | Proprietary |
| 45 | Claude Opus 4 | Thinking Enabled | Anthropic | — | 4.20 | — | — | Proprietary |
| 46 | Claude Opus 4 | Thinking Level · Medium | Anthropic | — | 4.20 | — | — | Proprietary |
| 47 | GPT-5-mini | Thinking Level · Medium | OpenAI | — | 4.20 | — | — | Proprietary |
| 48 | Opus 4.1 | Thinking Level · Medium | Anthropic | — | 4.20 | — | — | Proprietary |
| 49 | Kimi K2.5 | | Moonshot AI | — | 4.20 | — | — | Free commercial |
| 50 | Opus 4.5 | Thinking Level · Medium | Anthropic | — | 4.20 | — | — | Proprietary |
Showing 50 of 221 models.