LLM Math Reasoning Benchmark Leaderboard

Name: LLM Math Reasoning Benchmark Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page ranks large language models on math reasoning benchmarks. It covers models including GPT, Claude, Qwen, and DeepSeek, scored on AIME 2025, FrontierMath Tier 4, MATH-500, and GSM8K.

Updated on 2026-07-18 08:01:51

As of 2026-07, this page covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Top picks (ranked by MATH-500)

Current SOTA: Gemini-2.5-Pro-Preview-05-06 (Google DeepMind), MATH-500 98.80
Best Open-Source: GLM-4.5 (Zhipu AI), MATH-500 98.20 (−0.60 relative to the current SOTA)
Best China-Made: GLM-4.5 (Zhipu AI), MATH-500 98.20 (−0.60 relative to the current SOTA)
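
The −0.60 shown next to GLM-4.5 matches the difference between its MATH-500 score and the top score. The page does not say how the figure is computed, so the following is only a minimal sketch under that assumption; the function name `gap_to_sota` is hypothetical.

```python
# Scores copied from the "Top picks" panel (MATH-500, in percent).
sota_score = 98.80  # Gemini-2.5-Pro-Preview-05-06
glm_score = 98.20   # GLM-4.5


def gap_to_sota(score: float, sota: float) -> float:
    """Return score minus the top score, rounded to two decimals.

    Assumption: the panel's delta is a plain difference to the best
    MATH-500 score; the page does not document this.
    """
    return round(score - sota, 2)


gap = gap_to_sota(glm_score, sota_score)
print(f"{gap:+.2f}")  # prints -0.60
```

Rounding is needed because the raw floating-point difference is not exactly −0.6.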

LLM Performance Results

Data source: DataLearnerAI

Rank	Model	Organization	AIME2025	FrontierMath - Tier 4	MATH-500	GSM8K	License
1	Gemini-2.5-Pro-Preview-05-06	Google DeepMind	83.00	2.10	98.80	—	Proprietary
2	Gemini 2.5-Pro	Google DeepMind	—	—	98.80	—	Proprietary
3	Claude Opus 4	Anthropic	75.50	—	98.20	—	Proprietary
4	GLM-4.5 [Thinking Enabled]	Zhipu AI	—	—	98.20	—	Free commercial
5	OpenAI o3	OpenAI	—	—	98.10	—	Proprietary
6	GLM-4.5-Air [Thinking Enabled]	Zhipu AI	—	—	98.10	—	Free commercial
7	DeepSeek-R1-0528 [Thinking Enabled]	DeepSeek-AI	87.50	—	98.00	—	Free commercial
8	Qwen3-235B-A22B [Thinking Enabled]	Alibaba	81.50	—	98.00	—	Free commercial
9	OpenAI o3-mini (high)	OpenAI	—	—	97.90	—	Proprietary
10	Claude Opus 4.6 [Extended Thinking]	Anthropic	99.79	—	97.60	—	Proprietary
11	Qwen3-8B [Thinking Enabled]	Alibaba	67.30	—	97.40	—	Free commercial
12	Kimi K2	Moonshot AI	54.00	0.01	97.40	—	Free commercial
13	DeepSeek-R1	DeepSeek-AI	70.00	—	97.30	—	Free commercial
14	Qwen3-32B [Thinking Enabled]	Alibaba	72.90	—	97.20	—	Free commercial
15	MiniMax-M1-80k	MiniMaxAI	76.90	—	96.80	—	Free commercial
16	Pangu Pro MoE	Huawei	68.10	—	96.80	—	Free commercial
17	ERNIE-4.5-300B-A47B	Baidu	35.10	—	96.40	96.60	Free commercial
18	OpenAI o1	OpenAI	—	—	96.40	—	Proprietary
19	Qwen3-235B-A22B	Alibaba	24.70	—	96.20	96.40	Free commercial
20	Claude Sonnet 3.7-64K Extended Thinking	Anthropic	—	—	96.20	—	Proprietary
21	Kimi k1.5 (Long-CoT)	Moonshot AI	—	—	96.20	—	Proprietary
22	Hunyuan-T1	Tencent AI Lab	—	—	96.20	—	Proprietary
23	MiniMax-M1-40k	MiniMaxAI	74.60	—	96.00	—	Free commercial
24	OpenAI o3-mini [Thinking Enabled]	OpenAI	86.50	—	95.80	—	Proprietary
25	Llama 4 Behemoth Instruct	Facebook AI Research	—	—	95.00	—	Free commercial
26	Kimi k1.5 (Short-CoT)	Moonshot AI	—	—	94.60	—	Proprietary
27	DeepSeek-R1-Distill-Llama-70B	DeepSeek-AI	—	—	94.50	—	Free commercial
28	DeepSeek-V3-0324	DeepSeek-AI	47.70	—	94.00	96.30	Free commercial
29	Hunyuan-7B	Tencent ARC	75.30	—	93.70	—	Free commercial
30	GPT-4.1	OpenAI	36.70	—	92.80	95.90	Proprietary
31	Pangu Embedded	Huawei	—	—	92.40	95.98	Free commercial
32	DeepSeek-R1-Distill-Qwen-7B	DeepSeek-AI	—	—	91.40	—	Free commercial
33	QwQ-32B	Alibaba	—	—	91.00	—	Free commercial
34	GPT-4.5	OpenAI	—	—	90.70	—	Proprietary
35	QwQ-32B-Preview	Alibaba	—	—	90.60	—	Free commercial
36	Phi-4-instruct (reasoning-trained)	Microsoft Azure	—	—	90.40	—	Proprietary
37	OpenAI o1-mini	OpenAI	—	—	90.00	—	Proprietary
38	Qwen3-32B	Alibaba	20.20	—	88.60	—	Free commercial
39	DeepSeek-V3	DeepSeek-AI	—	—	87.80	—	Free commercial
40	Qwen3-8B	Alibaba	20.90	—	87.40	—	Free commercial
41	Claude Sonnet 3.7	Anthropic	54.80	—	82.20	—	Proprietary
42	Claude 3.5 Sonnet New	Anthropic	—	—	78.00	—	Proprietary
43	GPT-4o	OpenAI	—	—	75.90	—	Proprietary
44	Phi-4-mini-instruct (3.8B)	Microsoft Azure	—	—	71.80	88.60	Free commercial
45	Step 3.5 Flash [Thinking Enabled, Tools]	StepFunAI	99.80	—	—	—	Free commercial
46	Gemini 3.0 Flash [Thinking Enabled, Tools]	Google DeepMind	99.70	—	—	—	Proprietary
47	GPT-5 [Thinking Enabled, Tools]	OpenAI	99.60	—	—	—	Proprietary
48	OpenAI o4-mini [Thinking Enabled, Tools]	OpenAI	99.50	—	—	—	Proprietary
49	Gemini 2.5 Deep Think [Deep Thinking Mode]	Google DeepMind	99.20	—	—	—	Proprietary
50	Kimi K2 Thinking [Thinking Enabled, Tools]	Moonshot AI	99.10	—	—	—	Free commercial
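
The row order above is consistent with sorting by MATH-500 in descending order, placing models without a MATH-500 score last, and breaking ties by AIME2025 in the same way. The page does not document its sorting rule, so the sketch below is an assumption that merely reproduces the visible order for a handful of rows; `Row`, `desc_missing_last`, and `sort_key` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Row(NamedTuple):
    model: str
    aime2025: Optional[float]  # None stands for a missing score
    math500: Optional[float]


# A few rows copied from the table, deliberately shuffled.
rows = [
    Row("Step 3.5 Flash", 99.80, None),
    Row("Claude Opus 4", 75.50, 98.20),
    Row("Gemini 2.5-Pro", None, 98.80),
    Row("GLM-4.5", None, 98.20),
    Row("Gemini-2.5-Pro-Preview-05-06", 83.00, 98.80),
    Row("GPT-5", 99.60, None),
]


def desc_missing_last(score: Optional[float]) -> Tuple[int, float]:
    """Key that sorts present scores first (highest first), missing ones last."""
    return (1, 0.0) if score is None else (0, -score)


def sort_key(row: Row):
    # Assumed rule: primary key MATH-500, tie-breaker AIME2025.
    return (desc_missing_last(row.math500), desc_missing_last(row.aime2025))


ranked = sorted(rows, key=sort_key)
for rank, row in enumerate(ranked, start=1):
    print(rank, row.model)
```

Under this rule a model with a very high AIME2025 score but no MATH-500 result, such as Step 3.5 Flash, still lands below every model that has a MATH-500 score, which is what the table shows for ranks 45 to 50.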

Showing 50 of 222 models.