GSM8K
Updated Apr 3, 2026·3,251 views
- Problem Count
- 8500
- Institution
- Category
- Mathematical reasoning
- Metrics
- Accuracy
- Language
- English
- Difficulty
- Intermediate
Overview
A benchmark of 8,500 grade-school math problems, used to evaluate a model's mathematical reasoning ability.
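The page lists accuracy as the metric but does not say how each entry was scored. As a rough illustration only, the sketch below assumes the common exact-match convention: take the last number in the model's response and compare it with the number after the `####` marker that ends each GSM8K reference solution. The function names (`to_number`, `reference_answer`, `accuracy`) and the sample strings are hypothetical, and individual leaderboard entries may have used different prompts or answer-extraction rules.

```python
import re

# Signed integer or decimal, with optional thousands separators (e.g. "1,250").
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def to_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def reference_answer(solution):
    """Extract the final answer that follows '####' in a GSM8K reference solution."""
    return to_number(solution.split("####")[-1])


def accuracy(predictions, solutions):
    """Percentage of responses whose last number equals the reference answer.

    Assumption: exact match on the final numeric answer; this is one common
    scoring convention, not necessarily the one behind any score on this page.
    """
    if len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must have the same length")
    if not solutions:
        raise ValueError("at least one problem is required")
    correct = 0
    for prediction, solution in zip(predictions, solutions):
        predicted = to_number(prediction)
        expected = reference_answer(solution)
        if predicted is not None and expected is not None:
            if abs(predicted - expected) < 1e-6:
                correct += 1
    return 100.0 * correct / len(solutions)


# Hypothetical model responses and reference solutions, for illustration only.
sample_predictions = [
    "She sells 9 eggs at $2 each, so she makes 18",
    "The answer is 1,250.",
    "I am not sure",
]
sample_solutions = [
    "16 - 3 - 4 = 9 eggs. 9 * 2 = 18 dollars.\n#### 18",
    "5 * 250 = 1250\n#### 1250",
    "3 + 4 = 7\n#### 7",
]
sample_score = accuracy(sample_predictions, sample_solutions)
```

Taking the last number in the response is a deliberately simple heuristic; it miscounts responses that keep writing after stating the answer, which is one reason scores for the same model can differ between evaluators.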
Related resources
Latest GSM8K model rankings and full benchmark leaderboard
Browse the latest scores, model modes, release dates, and parameter sizes for GSM8K.
Source: DataLearnerAI
Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.
GSM8K Rank
| Rank | Model | Score | Release date | Parameters | License |
|---|---|---|---|---|---|
| 1 | ERNIE-4.5-300B-A47B (Standard Mode) | 96.60 | 2025-06-30 | 300B | Free Commercial |
| 2 | Qwen3-235B-A22B (Standard Mode) | 96.40 | 2025-04-28 | 235B | Free Commercial |
| 3 | DeepSeek-V3-0324 (Standard Mode) | 96.30 | 2025-03-24 | 671B | Free Commercial |
| 4 | Pangu Embedded (Standard Mode) | 95.98 | 2025-06-30 | 7B | Free Commercial |
| 5 | Qwen2.5-32B (Standard Mode) | 95.90 | 2024-09-18 | 32B | Free Commercial |
| 6 | Gemma 3 - 27B (IT) (Standard Mode) | 95.90 | 2025-03-12 | 27B | Free Commercial |
| 7 | GPT-4.1 (Standard Mode) | 95.90 | 2025-04-14 | Unknown | Closed |
| 8 | Claude3-Opus (Standard Mode) | 95.00 | 2024-03-04 | Unknown | Closed |
| 9 | Qwen2.5-Max (Standard Mode) | 94.50 | 2025-01-28 | Unknown | Closed |
| 10 | Hunyuan-A13B-Instruct (Standard Mode) | 91.83 | 2025-06-27 | 80B | Free Commercial |
| 11 | Qwen2.5-72B (Standard Mode) | 91.50 | 2024-09-18 | 72.7B | Free Commercial |
| 12 | GPT-4o mini (Standard Mode) | 91.30 | 2024-07-18 | Unknown | Closed |
| 13 | Qwen3-Next (Standard Mode) | 90.30 | 2025-09-11 | 80B | Free Commercial |
| 14 | Phi-4-mini-instruct (3.8B) (Standard Mode) | 88.60 | 2025-02-27 | 3.8B | Free Commercial |
| 15 | Qwen2.5-7B (Standard Mode) | 85.40 | 2024-09-18 | 7B | Free Commercial |
| 16 | Llama3.1-8B-Instruct (Standard Mode) | 82.40 | 2024-07-23 | 8B | Free Commercial |
| 17 | Qwen2.5-3B (Standard Mode) | 79.10 | 2024-09-18 | 3B | Free Commercial |
| 18 | Moonlight-16B-A3B-Instruct (Standard Mode) | 77.40 | 2025-02-23 | 16B | Free Commercial |
| 19 | Gemma2-27B (Standard Mode) | 74.00 | 2024-05-14 | 27B | Free Commercial |
| 20 | Gemma 2 - 9B (Standard Mode) | 70.70 | 2024-06-27 | 9B | Free Commercial |
| 21 | Llama3.1-8B (Standard Mode) | 55.30 | 2024-07-23 | 8B | Free Commercial |
| 22 | Mistral-7B-Instruct-v0.3 (Standard Mode) | 36.20 | 2024-05-22 | 7B | Free Commercial |
| 23 | Llama-3.2-3B (Standard Mode) | 34.00 | 2024-09-18 | 3.2B | Free Commercial |
| 24 | Gemini 1.5 Pro (Standard Mode) | 0.00 | 2024-02-15 | Unknown | Closed |
| 25 | Llama3.1-405B Instruct (Standard Mode) | 0.00 | 2024-07-23 | 405B | Free Commercial |
| 26 | Amazon Nova Pro (Standard Mode) | 0.00 | 2024-12-03 | Unknown | Closed |













