LLM Math Reasoning Benchmark Leaderboard

This page ranks LLMs by math reasoning performance. We evaluate models including GPT, Claude, Qwen, and DeepSeek on established math benchmarks such as AIME2025, FrontierMath - Tier 4, MATH-500, and GSM8K.

Updated on 2026-05-02 07:14:49

As of 2026-05, this leaderboard covers AIME2025, FrontierMath - Tier 4, MATH-500, GSM8K, and related benchmarks, so models can be compared directly within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.


LLM Performance Results

Data source: DataLearnerAI
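| Rank | Model | Developer | AIME2025 | FrontierMath - Tier 4 | MATH-500 | GSM8K | License |
|------|-------|-----------|----------|-----------------------|----------|-------|---------|
| 1 | GPT OSS 20B | OpenAI | 98.70 | - | - | - | Free commercial |
| 2 | Qwen3-235B-A22B-Thinking | Alibaba | 92.30 | - | - | - | Free commercial |
| 3 | GLM-4.7-Flash | Zhipu AI | 91.60 | - | - | - | Free commercial |
| 4 | Qwen3-32B | Alibaba | 72.90 | - | 97.20 | - | Free commercial |
| 5 | Magistral-Small-2506 | MistralAI | 62.76 | - | - | - | Free commercial |
| 6 | Qwen3-30B-A3B-2507 | Alibaba | 61.30 | - | - | - | Free commercial |
| 7 | Qwen3-30B-A3B | Alibaba | 21.60 | - | - | - | Free commercial |
| 8 | Qwen2.5-32B | Alibaba | - | - | - | 95.90 | Free commercial |
| 9 | Gemma 3 - 27B (IT) | Google DeepMind | - | - | - | 95.90 | Free commercial |
| 10 | Gemma2-27B | Google DeepMind | - | - | - | 74.00 | Free commercial |
| 11 | QwQ-32B | Alibaba | - | - | 91.00 | - | Free commercial |
| 12 | QwQ-32B-Preview | Alibaba | - | - | 90.60 | - | Free commercial |

A "-" means no score is listed for that benchmark.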
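
Because most models above have a score on only one or two benchmarks, a comparison is only meaningful among models that share a benchmark. The following is a minimal sketch of that per-benchmark ranking, not the site's own sorting logic. The scores are copied from the results above; the names `ROWS`, `COLUMNS`, and `rank_by` are hypothetical, and the FrontierMath - Tier 4 column is omitted because no model listed here has a score for it.

```python
# Scores copied from the leaderboard results; None marks a missing score.
# Tuple layout (an assumption for this sketch): (model, AIME2025, MATH-500, GSM8K)
ROWS = [
    ("GPT OSS 20B", 98.70, None, None),
    ("Qwen3-235B-A22B-Thinking", 92.30, None, None),
    ("GLM-4.7-Flash", 91.60, None, None),
    ("Qwen3-32B", 72.90, 97.20, None),
    ("Magistral-Small-2506", 62.76, None, None),
    ("Qwen3-30B-A3B-2507", 61.30, None, None),
    ("Qwen3-30B-A3B", 21.60, None, None),
    ("Qwen2.5-32B", None, None, 95.90),
    ("Gemma 3 - 27B (IT)", None, None, 95.90),
    ("Gemma2-27B", None, None, 74.00),
    ("QwQ-32B", None, 91.00, None),
    ("QwQ-32B-Preview", None, 90.60, None),
]

# Maps a benchmark name to its position in each row tuple.
COLUMNS = {"AIME2025": 1, "MATH-500": 2, "GSM8K": 3}


def rank_by(benchmark: str) -> list[tuple[str, float]]:
    """Return (model, score) pairs for one benchmark, highest score first.

    Models without a score on that benchmark are left out rather than
    treated as zero, so they are never ranked below models that were
    actually evaluated.
    """
    idx = COLUMNS[benchmark]
    scored = [(row[0], row[idx]) for row in ROWS if row[idx] is not None]
    # sorted() is stable, so tied models keep their original relative order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name in COLUMNS:
        print(name)
        for model, score in rank_by(name):
            print(f"  {score:6.2f}  {model}")
```

Dropping unscored models instead of sorting them last keeps each list limited to models that were actually evaluated on that benchmark.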