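LLM Math Reasoning Benchmark Leaderboard

This page provides an up-to-date leaderboard of large language model performance on math reasoning. Models including OpenAI's GPT-4o, Anthropic's Claude, Alibaba's Qwen, and DeepSeek-R1 are evaluated on several established math benchmark datasets, among them GSM8K, MATH, and AIME 2025.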

Updated on: 2025-07-20 20:56:42

LLM Performance Results

Data source: DataLearnerAI

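Ranking benchmark: AIME2025

| Rank | Model | AIME2025 | AIME 2024 | MATH-500 | GSM8K | Params (B) | License |
|---|---|---|---|---|---|---|---|
| 1 | Qwen3-4B-Thinking-2507 | 81.30 | 0.00 | 0.00 | 0.00 | 4 | Free commercial |
| 2 | Hunyuan-7B | 75.30 | 81.10 | 93.70 | 0.00 | 7 | Free commercial |
| 3 | Qwen3-8B | 67.30 | 79.40 | 97.40 | 0.00 | 8 | Free commercial |
| 4 | Qwen3-4B-2507 | 47.40 | 0.00 | 0.00 | 0.00 | 4 | Free commercial |
| 5 | Pangu Embedded | 0.00 | 81.90 | 92.40 | 95.98 | 7 | Free commercial |
| 6 | Qwen2.5-7B | 0.00 | 0.00 | 0.00 | 85.40 | 7 | Free commercial |
| 7 | Llama3.1-8B-Instruct | 0.00 | 0.00 | 0.00 | 82.40 | 8 | Free commercial |
| 8 | Gemma 2 - 9B | 0.00 | 0.00 | 0.00 | 70.70 | 9 | Free commercial |
| 9 | Llama3.1-8B | 0.00 | 0.00 | 0.00 | 55.30 | 8 | Free commercial |
| 10 | Mistral-7B-Instruct-v0.3 | 0.00 | 0.00 | 0.00 | 36.20 | 7 | Free commercial |
| 11 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 53.30 | 91.40 | 0.00 | 7 | Free commercial |
| 12 | GLM-4-9B-Chat | 0.00 | 76.40 | 0.00 | 0.00 | 9 | Free commercial |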
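The table above is ordered by AIME2025, so models with a 0.00 in that column land at the bottom no matter how they score elsewhere. The following is a minimal sketch of how the same rows could be re-ranked by any of the four benchmarks. It assumes that a score of 0.00 means "not reported" rather than a measured zero; the page does not state this. The names `Row`, `ROWS`, and `rank_by` are hypothetical helpers for illustration, not part of any DataLearnerAI API.

```python
from typing import List, NamedTuple, Tuple

# Column order matches the leaderboard table.
BENCHMARKS = ("AIME2025", "AIME 2024", "MATH-500", "GSM8K")


class Row(NamedTuple):
    model: str
    scores: Tuple[float, float, float, float]  # one score per entry in BENCHMARKS


# Scores copied from the leaderboard table.
ROWS = [
    Row("Qwen3-4B-Thinking-2507", (81.30, 0.00, 0.00, 0.00)),
    Row("Hunyuan-7B", (75.30, 81.10, 93.70, 0.00)),
    Row("Qwen3-8B", (67.30, 79.40, 97.40, 0.00)),
    Row("Qwen3-4B-2507", (47.40, 0.00, 0.00, 0.00)),
    Row("Pangu Embedded", (0.00, 81.90, 92.40, 95.98)),
    Row("Qwen2.5-7B", (0.00, 0.00, 0.00, 85.40)),
    Row("Llama3.1-8B-Instruct", (0.00, 0.00, 0.00, 82.40)),
    Row("Gemma 2 - 9B", (0.00, 0.00, 0.00, 70.70)),
    Row("Llama3.1-8B", (0.00, 0.00, 0.00, 55.30)),
    Row("Mistral-7B-Instruct-v0.3", (0.00, 0.00, 0.00, 36.20)),
    Row("DeepSeek-R1-Distill-Qwen-7B", (0.00, 53.30, 91.40, 0.00)),
    Row("GLM-4-9B-Chat", (0.00, 76.40, 0.00, 0.00)),
]


def rank_by(benchmark: str, rows: List[Row] = ROWS) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted by descending score on one benchmark.

    Assumption: 0.00 marks a missing result, so those rows are skipped.
    """
    idx = BENCHMARKS.index(benchmark)
    reported = [(r.model, r.scores[idx]) for r in rows if r.scores[idx] > 0.0]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(name)
        for position, (model, score) in enumerate(rank_by(name), start=1):
            print(f"  {position}. {model}: {score:.2f}")
```

Skipping unreported scores keeps each per-benchmark ranking limited to models that have a result on that benchmark. If 0.00 is in fact a real score for some rows, the filter in `rank_by` should be removed.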