LLM Math Reasoning Leaderboard

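This page provides an up-to-date leaderboard of mathematical reasoning performance for large language models. Models including OpenAI's GPT-4o, Anthropic's Claude, Alibaba's Qwen, and DeepSeek-R1 are evaluated on several established math benchmarks, among them GSM8K, MATH, and AIME 2025.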

Updated on: 2025-07-20 20:56:42

LLM Performance Results

Data source: DataLearnerAI

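| Rank | Model | AIME 2025 | AIME 2024 | MATH-500 | GSM8K | Params (B) | License |
|---|---|---|---|---|---|---|---|
| 1 | StepFun Flash 3.5 | 99.80 | 0.00 | 0.00 | 0.00 | 196 | Free for commercial use |
| 2 | OpenAI o4-mini | 99.50 | 98.70 | 0.00 | 0.00 | n/a | Closed source |
| 3 | GLM-4.6 | 98.60 | 0.00 | 0.00 | 0.00 | 355 | Free for commercial use |
| 4 | Kimi K2.5 | 96.10 | 0.00 | 0.00 | 0.00 | 1000 | Free for commercial use |
| 5 | GLM-4.7 | 95.70 | 0.00 | 0.00 | 0.00 | 358 | Free for commercial use |
| 6 | DeepSeek V3.2 | 93.10 | 0.00 | 0.00 | 0.00 | 671 | Free for commercial use |
| 7 | o3-pro | 93.00 | 93.00 | 0.00 | 0.00 | n/a | Closed source |
| 8 | Qwen3-235B-A22B-Thinking-2507 | 92.30 | 0.00 | 0.00 | 0.00 | 235 | Free for commercial use |
| 9 | DeepSeek-V3.1 Terminus | 90.00 | 0.00 | 0.00 | 0.00 | 671 | Free for commercial use |
| 10 | DeepSeek V3.2-Exp | 89.30 | 0.00 | 0.00 | 0.00 | 671 | Free for commercial use |
| 11 | DeepSeek-V3.1 | 88.40 | 93.10 | 0.00 | 0.00 | 671 | Free for commercial use |
| 12 | DeepSeek-R1-0528 | 87.50 | 91.40 | 98.00 | 0.00 | 671 | Free for commercial use |
| 13 | Intern-S1 | 86.00 | 0.00 | 0.00 | 0.00 | 241 | Free for commercial use |
| 14 | Gemini-2.5-Pro-Preview-05-06 | 83.00 | 92.00 | 98.80 | 0.00 | n/a | Closed source |
| 15 | Step3 | 82.90 | 0.00 | 0.00 | 0.00 | 321 | Free for commercial use |
| 16 | Qwen3-235B-A22B | 81.50 | 85.70 | 98.00 | 96.40 | 235 | Free for commercial use |
| 17 | M2.1 | 81.00 | 0.00 | 0.00 | 0.00 | 230 | Free for commercial use |
| 18 | MiniMax M2 | 78.00 | 0.00 | 0.00 | 0.00 | 230 | Free for commercial use |
| 19 | Grok 3 | 77.10 | 84.20 | 0.00 | 0.00 | n/a | Closed source |
| 20 | MiniMax-M1-80k | 76.90 | 86.00 | 96.80 | 0.00 | 456 | Free for commercial use |
| 21 | Claude Opus 4 | 75.50 | 76.00 | 98.20 | 0.00 | n/a | Closed source |
| 22 | Kimi K2 0905 | 75.20 | 0.00 | 0.00 | 0.00 | 1000 | Free for commercial use |
| 23 | MiniMax-M1-40k | 74.60 | 83.30 | 96.00 | 0.00 | 456 | Free for commercial use |
| 24 | Gemini 2.5 Flash | 72.00 | 88.00 | 0.00 | 0.00 | n/a | Closed source |
| 25 | Qwen3-235B-A22B-2507 | 70.30 | 0.00 | 0.00 | 0.00 | 235 | Free for commercial use |
| 26 | DeepSeek-R1 | 70.00 | 79.80 | 97.30 | 0.00 | 671 | Free for commercial use |
| 27 | Magistral-Medium-2506 | 64.95 | 73.59 | 0.00 | 0.00 | n/a | Closed source |
| 28 | Gemini 2.5 Flash-Lite | 63.10 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 29 | Claude Sonnet 3.7 | 54.80 | 23.30 | 82.20 | 0.00 | n/a | Closed source |
| 30 | Kimi K2 | 54.00 | 69.60 | 97.40 | 0.00 | 1000 | Free for commercial use |
| 31 | DeepSeek-V3-0324 | 47.70 | 59.40 | 94.00 | 96.30 | 671 | Free for commercial use |
| 32 | GPT-4.1 | 36.70 | 48.10 | 92.80 | 95.90 | n/a | Closed source |
| 33 | ERNIE-4.5-VL-424B-A47B-Base | 35.10 | 0.00 | 0.00 | 0.00 | 424 | Free for commercial use |
| 34 | ERNIE-4.5-300B-A47B | 35.10 | 54.80 | 96.40 | 96.60 | 300 | Free for commercial use |
| 35 | Gemini 2.0 Flash Experimental | 29.70 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 36 | Kimi K2 Thinking | 100.00 | 0.00 | 0.00 | 0.00 | 1040 | Free for commercial use |
| 37 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 94.60 | 0.00 | n/a | Closed source |
| 38 | Qwen2.5-Max | 0.00 | 0.00 | 0.00 | 94.50 | n/a | Closed source |
| 39 | Llama3.1-405B Instruct | 0.00 | 0.00 | 0.00 | 0.00 | 405 | Free for commercial use |
| 40 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 41 | GLM-4.5 | 0.00 | 91.00 | 98.20 | 0.00 | 355 | Free for commercial use |
| 42 | GLM-4.5-Air | 0.00 | 89.40 | 98.10 | 0.00 | 106 | Free for commercial use |
| 43 | OpenAI o3-mini (high) | 0.00 | 87.00 | 97.90 | 0.00 | n/a | Closed source |
| 44 | OpenAI o1 | 0.00 | 79.20 | 96.40 | 0.00 | n/a | Closed source |
| 45 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 96.20 | 0.00 | n/a | Closed source |
| 46 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 80.00 | 96.20 | 0.00 | n/a | Closed source |
| 47 | Llama 4 Behemoth Instruct | 0.00 | 0.00 | 95.00 | 0.00 | 2000 | Free for commercial use |
| 48 | Grok 3 mini | 0.00 | 40.00 | 0.00 | 0.00 | n/a | Closed source |
| 49 | GPT-4.5 | 0.00 | 36.70 | 90.70 | 0.00 | n/a | Closed source |
| 50 | OpenAI o1-mini | 0.00 | 63.60 | 90.00 | 0.00 | n/a | Closed source |
| 51 | DeepSeek-V3 | 0.00 | 39.00 | 87.80 | 0.00 | 681 | Free for commercial use |
| 52 | Grok-3 mini - Reasoning | 0.00 | 96.00 | 0.00 | 0.00 | n/a | Closed source |
| 53 | Grok-3 - Reasoning Beta | 0.00 | 93.30 | 0.00 | 0.00 | n/a | Closed source |
| 54 | GPT-4.1 mini | 0.00 | 49.60 | 0.00 | 0.00 | n/a | Closed source |
| 55 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 56 | GPT-4.1 nano | 0.00 | 29.40 | 0.00 | 0.00 | n/a | Closed source |
| 57 | Gemini 2.0 Pro Experimental | 0.00 | 36.00 | 0.00 | 0.00 | n/a | Closed source |

Scores are percentages. A score of 0.00 appears to mean that no result is recorded for that benchmark, not a measured score of zero. Params (B) is the parameter count in billions; n/a marks models whose size is not disclosed. Rows keep the published order, sorted by AIME 2025; note that Kimi K2 Thinking (100.00) is listed at rank 36.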
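
One possible explanation for a 100.00 score landing at rank 36, directly below 29.70 and above the 0.00 rows, is that the scores were compared as text and not as numbers. This is an assumption; the page does not say how it sorts. The Python sketch below uses a few rows from the table to contrast the two orderings. The names `Row`, `rank_by_score`, and `rank_as_text` are hypothetical, and treating 0.00 as "not reported" is likewise an assumption.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    model: str
    aime2025: float  # published score; 0.00 is ASSUMED to mean "not reported"


# A small subset of the leaderboard rows, in their published order.
ROWS = [
    Row("StepFun Flash 3.5", 99.80),
    Row("OpenAI o4-mini", 99.50),
    Row("Gemini 2.0 Flash Experimental", 29.70),
    Row("Kimi K2 Thinking", 100.00),
    Row("OpenAI o1", 0.00),
]


def rank_by_score(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Rank by numeric score, highest first; unreported (0.00) rows go last."""
    reported = [r for r in rows if r.aime2025 > 0.0]
    missing = [r for r in rows if r.aime2025 == 0.0]
    ordered = sorted(reported, key=lambda r: r.aime2025, reverse=True) + missing
    return list(enumerate(ordered, start=1))


def rank_as_text(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Rank by the score formatted as a string, to illustrate the suspected bug."""
    ordered = sorted(rows, key=lambda r: f"{r.aime2025:.2f}", reverse=True)
    return list(enumerate(ordered, start=1))


if __name__ == "__main__":
    for label, ranking in (("numeric", rank_by_score(ROWS)),
                           ("text", rank_as_text(ROWS))):
        print(label)
        for rank, row in ranking:
            print(f"  {rank}. {row.model} ({row.aime2025:.2f})")
```

With numeric comparison, Kimi K2 Thinking ranks first. With text comparison, "100.00" sorts below "29.70" because the strings are compared character by character, which reproduces the order seen in the table.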