GSM8K

Updated Apr 3, 2026 · 3,251 views
Current SOTA: ERNIE-4.5-300B-A47B (Baidu), score 96.60
Problem count: 8,500
Institution: OpenAI
Category: Mathematical reasoning
Metric: Accuracy
Language: English
Difficulty: Intermediate

Overview

A benchmark of 8,500 grade-school math word problems, used to evaluate a model's mathematical reasoning ability.
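The accuracy metric can be illustrated with a minimal sketch. It assumes reference solutions follow the public GSM8K dataset convention of ending with `#### <number>`, and that the model's answer is the last number in its output. The function names here are hypothetical, and the page does not state which extraction rule the reported scores used, so read this as one plausible scoring scheme rather than the method behind the leaderboard.

```python
import re
from decimal import Decimal

# Matches integers and decimals, with optional sign and thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number found in text as a Decimal, or None if absent."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", ""))


def reference_answer(solution):
    """Extract the final answer that follows the '####' marker (assumed format)."""
    return last_number(solution.split("####")[-1])


def accuracy(predictions, solutions):
    """Percentage of predictions whose final number equals the reference answer."""
    if not solutions or len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must be non-empty and aligned")
    correct = 0
    for prediction, solution in zip(predictions, solutions):
        predicted = last_number(prediction)
        if predicted is not None and predicted == reference_answer(solution):
            correct += 1
    return 100.0 * correct / len(solutions)
```

Comparing values as `Decimal` rather than as strings means `18`, `18.00`, and `1,000` versus `1000` are treated as equal, which avoids penalizing formatting differences.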

Latest GSM8K model rankings and full benchmark leaderboard

Browse the latest scores, model modes, release dates, and parameter sizes for GSM8K.

Source: DataLearnerAI

Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
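The source ordering described above can be sketched as a simple priority rule: when several sources report a score for the same model, keep the one from the highest-priority source. The labels `official`, `leaderboard`, and `third_party`, the `ReportedScore` type, and the `pick_preferred` function are all hypothetical names chosen for illustration; the site's actual pipeline is not described on this page.

```python
from dataclasses import dataclass

# Assumed labels, ordered from most to least preferred source.
SOURCE_PRIORITY = ("official", "leaderboard", "third_party")


@dataclass(frozen=True)
class ReportedScore:
    model: str
    score: float
    source: str  # one of SOURCE_PRIORITY


def pick_preferred(reports):
    """Return {model: ReportedScore}, keeping the highest-priority source per model."""
    rank = {name: position for position, name in enumerate(SOURCE_PRIORITY)}
    best = {}
    for report in reports:
        if report.source not in rank:
            raise ValueError(f"unknown source: {report.source!r}")
        current = best.get(report.model)
        # A lower rank value means a more trusted source.
        if current is None or rank[report.source] < rank[current.source]:
            best[report.model] = report
    return best
```

With this rule, a third-party number is only used when neither an official release nor a benchmark leaderboard reports a score for that model.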

GSM8K Rank

| Rank | Organization | Model | Mode | Score | Release date | Parameters | License |
|---|---|---|---|---|---|---|---|
| 1 | Baidu | ERNIE-4.5-300B-A47B | Standard Mode | 96.60 | 2025-06-30 | 300B | Free Commercial |
| 2 | Alibaba | Qwen3-235B-A22B | Standard Mode | 96.40 | 2025-04-28 | 235B | Free Commercial |
| 3 | DeepSeek-AI | DeepSeek-V3-0324 | Standard Mode | 96.30 | 2025-03-24 | 671B | Free Commercial |
| 4 | Huawei | Pangu Embedded | Standard Mode | 95.98 | 2025-06-30 | 7B | Free Commercial |
| 5 | Alibaba | Qwen2.5-32B | Standard Mode | 95.90 | 2024-09-18 | 32B | Free Commercial |
| 6 | Google DeepMind | Gemma 3 - 27B (IT) | Standard Mode | 95.90 | 2025-03-12 | 27B | Free Commercial |
| 7 | OpenAI | GPT-4.1 | Standard Mode | 95.90 | 2025-04-14 | Unknown | Closed |
| 8 | Anthropic | Claude3-Opus | Standard Mode | 95.00 | 2024-03-04 | Unknown | Closed |
| 9 | Alibaba | Qwen2.5-Max | Standard Mode | 94.50 | 2025-01-28 | Unknown | Closed |
| 10 | Tencent AI Lab | Hunyuan-A13B-Instruct | Standard Mode | 91.83 | 2025-06-27 | 80B | Free Commercial |
| 11 | Alibaba | Qwen2.5-72B | Standard Mode | 91.50 | 2024-09-18 | 72.7B | Free Commercial |
| 12 | OpenAI | GPT-4o mini | Standard Mode | 91.30 | 2024-07-18 | Unknown | Closed |
| 13 | Alibaba | Qwen3-Next | Standard Mode | 90.30 | 2025-09-11 | 80B | Free Commercial |
| 14 | Microsoft | Phi-4-mini-instruct (3.8B) | Standard Mode | 88.60 | 2025-02-27 | 3.8B | Free Commercial |
| 15 | Alibaba | Qwen2.5-7B | Standard Mode | 85.40 | 2024-09-18 | 7B | Free Commercial |
| 16 | Facebook AI Research | Llama3.1-8B-Instruct | Standard Mode | 82.40 | 2024-07-23 | 8B | Free Commercial |
| 17 | Alibaba | Qwen2.5-3B | Standard Mode | 79.10 | 2024-09-18 | 3B | Free Commercial |
| 18 | Moonshot AI | Moonlight-16B-A3B-Instruct | Standard Mode | 77.40 | 2025-02-23 | 16B | Free Commercial |
| 19 | Google DeepMind | Gemma2-27B | Standard Mode | 74.00 | 2024-05-14 | 27B | Free Commercial |
| 20 | Google Research | Gemma 2 - 9B | Standard Mode | 70.70 | 2024-06-27 | 9B | Free Commercial |
| 21 | Facebook AI Research | Llama3.1-8B | Standard Mode | 55.30 | 2024-07-23 | 8B | Free Commercial |
| 22 | MistralAI | Mistral-7B-Instruct-v0.3 | Standard Mode | 36.20 | 2024-05-22 | 7B | Free Commercial |
| 23 | Facebook AI Research | Llama-3.2-3B | Standard Mode | 34.00 | 2024-09-18 | 3.2B | Free Commercial |
| 24 | Google DeepMind | Gemini 1.5 Pro | Standard Mode | 0.00 | 2024-02-15 | Unknown | Closed |
| 25 | Facebook AI Research | Llama3.1-405B Instruct | Standard Mode | 0.00 | 2024-07-23 | 405B | Free Commercial |
| 26 | Amazon | Amazon Nova Pro | Standard Mode | 0.00 | 2024-12-03 | Unknown | Closed |