LLM Math Reasoning Benchmark Leaderboard

Name: LLM Math Reasoning Benchmark Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page ranks large language models on math reasoning. We evaluate models including GPT, Claude, Qwen, and DeepSeek on established math benchmarks: AIME2025, FrontierMath - Tier 4, MATH-500, and GSM8K.

Updated on 2026-07-18 08:01:51

As of 2026-07, this leaderboard covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Benchmark

AIME2025 FrontierMath - Tier 4 MATH-500 GSM8K

Top picks

Ranked by AIME2025

Current SOTA: GPT OSS 20B (OpenAI), AIME2025 98.70

Best Open-Source: GPT OSS 20B (OpenAI), AIME2025 98.70

Best China-Made: Qwen3-235B-A22B-Thinking (Alibaba), AIME2025 92.30, which is 6.40 points below the current SOTA

LLM Performance Results

Data source: DataLearnerAI


Rank	Model	Developer	AIME2025	FrontierMath - Tier 4	MATH-500	GSM8K	License
1	GPT OSS 20B	OpenAI	98.70	-	-	-	Free commercial
2	Qwen3-235B-A22B-Thinking	Alibaba	92.30	-	-	-	Free commercial
3	GLM-4.7-Flash	Zhipu AI	91.60	-	-	-	Free commercial
4	Qwen3-32B	Alibaba	72.90	-	97.20	-	Free commercial
5	Magistral-Small-2506	Mistral AI	62.76	-	-	-	Free commercial
6	Qwen3-30B-A3B-2507	Alibaba	61.30	-	-	-	Free commercial
7	Qwen3-30B-A3B	Alibaba	21.60	-	-	-	Free commercial
8	Qwen2.5-32B	Alibaba	-	-	-	95.90	Free commercial
9	Gemma 3 - 27B (IT)	Google DeepMind	-	-	-	95.90	Free commercial
10	Gemma2-27B	Google DeepMind	-	-	-	74.00	Free commercial
11	QwQ-32B	Alibaba	-	-	91.00	-	Free commercial
12	QwQ-32B-Preview	Alibaba	-	-	90.60	-	Free commercial
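
The ranking and the gap to the top score shown in the Top picks section (6.40 points for Qwen3-235B-A22B-Thinking) can be reproduced from the AIME2025 column. The following is a minimal sketch, not the site's actual scoring code: the names `aime2025_scores`, `rank_by_score`, and `gap_to_top` are hypothetical, and the values are copied from the table above.

```python
# Hypothetical helper names; scores are the AIME2025 values from the table above.
aime2025_scores = {
    "GPT OSS 20B": 98.70,
    "Qwen3-235B-A22B-Thinking": 92.30,
    "GLM-4.7-Flash": 91.60,
    "Qwen3-32B": 72.90,
    "Magistral-Small-2506": 62.76,
    "Qwen3-30B-A3B-2507": 61.30,
    "Qwen3-30B-A3B": 21.60,
}


def rank_by_score(scores):
    """Return (rank, model, score) tuples, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, model, score)
            for rank, (model, score) in enumerate(ordered, start=1)]


def gap_to_top(scores, model):
    """Return the model's score minus the best score, rounded to 2 decimals."""
    best = max(scores.values())
    # Rounding hides binary floating-point noise in the subtraction.
    return round(scores[model] - best, 2)


ranking = rank_by_score(aime2025_scores)
for rank, model, score in ranking:
    gap = gap_to_top(aime2025_scores, model)
    print(f"{rank}\t{model}\t{score:.2f}\t{gap:+.2f}")
```

Models without an AIME2025 score are left out of the dictionary here; how the site orders those rows is not stated, so the sketch makes no assumption about it.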