
LLM Math Reasoning Leaderboard

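This page provides the latest and most comprehensive leaderboard of LLM mathematical reasoning performance. Models including OpenAI's GPT-4o, Anthropic's Claude, Alibaba's Qwen, and DeepSeek-R1 are evaluated on several authoritative math benchmarks, among them GSM8K, MATH, and AIME 2025.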

Updated on: 2025-07-20 20:56:42

LLM Performance Results

Data source: DataLearnerAI
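| Rank | Model | AIME 2025 | AIME 2024 | MATH-500 | GSM8K | Params (B) | License |
|---|---|---|---|---|---|---|---|
| 1 | Step 3.5 Flash | 99.80 | 0.00 | 0.00 | 0.00 | 196 | Free for commercial use |
| 2 | OpenAI o4-mini | 99.50 | 98.70 | 0.00 | 0.00 | N/A | Closed source |
| 3 | GLM-4.6 | 98.60 | 0.00 | 0.00 | 0.00 | 355 | Free for commercial use |
| 4 | Kimi K2.5 | 96.10 | 0.00 | 0.00 | 0.00 | 1000 | Free for commercial use |
| 5 | GLM-4.7 | 95.70 | 0.00 | 0.00 | 0.00 | 358 | Free for commercial use |
| 6 | DeepSeek V3.2 | 93.10 | 0.00 | 0.00 | 0.00 | 671 | Free for commercial use |
| 7 | o3-pro | 93.00 | 93.00 | 0.00 | 0.00 | N/A | Closed source |
| 8 | Qwen3-235B-A22B-Thinking-2507 | 92.30 | 0.00 | 0.00 | 0.00 | 235 | Free for commercial use |
| 9 | DeepSeek-V3.1 Terminus | 90.00 | 0.00 | 0.00 | 0.00 | 671 | Free for commercial use |
| 10 | DeepSeek V3.2-Exp | 89.30 | 0.00 | 0.00 | 0.00 | 671 | Free for commercial use |
| 11 | DeepSeek-V3.1 | 88.40 | 93.10 | 0.00 | 0.00 | 671 | Free for commercial use |
| 12 | DeepSeek-R1-0528 | 87.50 | 91.40 | 98.00 | 0.00 | 671 | Free for commercial use |
| 13 | MiniMax M2.5 | 86.30 | 0.00 | 0.00 | 0.00 | 229 | Free for commercial use |
| 14 | Intern-S1 | 86.00 | 0.00 | 0.00 | 0.00 | 241 | Free for commercial use |
| 15 | Gemini-2.5-Pro-Preview-05-06 | 83.00 | 92.00 | 98.80 | 0.00 | N/A | Closed source |
| 16 | Step3 | 82.90 | 0.00 | 0.00 | 0.00 | 321 | Free for commercial use |
| 17 | Qwen3-235B-A22B | 81.50 | 85.70 | 98.00 | 96.40 | 235 | Free for commercial use |
| 18 | M2.1 | 81.00 | 0.00 | 0.00 | 0.00 | 230 | Free for commercial use |
| 19 | MiniMax M2 | 78.00 | 0.00 | 0.00 | 0.00 | 230 | Free for commercial use |
| 20 | Grok 3 | 77.10 | 84.20 | 0.00 | 0.00 | N/A | Closed source |
| 21 | MiniMax-M1-80k | 76.90 | 86.00 | 96.80 | 0.00 | 456 | Free for commercial use |
| 22 | Claude Opus 4 | 75.50 | 76.00 | 98.20 | 0.00 | N/A | Closed source |
| 23 | Kimi K2 0905 | 75.20 | 0.00 | 0.00 | 0.00 | 1000 | Free for commercial use |
| 24 | MiniMax-M1-40k | 74.60 | 83.30 | 96.00 | 0.00 | 456 | Free for commercial use |
| 25 | Gemini 2.5 Flash | 72.00 | 88.00 | 0.00 | 0.00 | N/A | Closed source |
| 26 | Qwen3-235B-A22B-2507 | 70.30 | 0.00 | 0.00 | 0.00 | 235 | Free for commercial use |
| 27 | DeepSeek-R1 | 70.00 | 79.80 | 97.30 | 0.00 | 671 | Free for commercial use |
| 28 | Magistral-Medium-2506 | 64.95 | 73.59 | 0.00 | 0.00 | N/A | Closed source |
| 29 | Gemini 2.5 Flash-Lite | 63.10 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 30 | Claude Sonnet 3.7 | 54.80 | 23.30 | 82.20 | 0.00 | N/A | Closed source |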
Showing top 30 of 58 models