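This page provides an up-to-date, comprehensive leaderboard of the mathematical reasoning ability of large language models. We evaluate models including GPT-4o, Claude, Qwen, and DeepSeek-R1 on several established math benchmarks: GSM8K, MATH (reported through its 500-problem MATH-500 subset), AIME 2024, and AIME 2025.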
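All scores below are on a 0 to 100 scale. The page does not say how they are computed. A common convention for answer-matching benchmarks such as AIME, whose answers are integers from 0 to 999, and GSM8K is exact-match accuracy on the final answer, often averaged over several sampled runs. The Python sketch below assumes that convention. `run_accuracy` and `averaged_accuracy` are hypothetical helpers, not DataLearnerAI's evaluation code, and the reference answers are placeholders.

```python
from __future__ import annotations

from statistics import mean


def run_accuracy(predictions: list[int | None], answers: list[int]) -> float:
    """Percentage of problems whose extracted final answer equals the reference answer."""
    if len(predictions) != len(answers):
        raise ValueError("expected exactly one prediction per problem")
    # None stands for "no answer could be extracted" and never counts as correct.
    correct = sum(pred == ref for pred, ref in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def averaged_accuracy(runs: list[list[int | None]], answers: list[int]) -> float:
    """Mean of per-run accuracies over several independently sampled runs (assumed convention)."""
    return mean(run_accuracy(run, answers) for run in runs)


# Toy example with 30 placeholder reference answers (not real AIME answers).
answers = list(range(30))
perfect_run = list(answers)
one_miss = answers[:-1] + [None]  # last problem left unanswered

print(run_accuracy(one_miss, answers))                      # about 96.67
print(averaged_accuracy([perfect_run, one_miss], answers))  # about 98.33
```

If a benchmark uses the 30 problems of AIME 2025 I and II, a single run can only score multiples of 10/3 (about 3.33) points, so a reported value such as 99.70 suggests some form of averaging over multiple samples.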
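Data source: DataLearnerAI. A score of 0.00 appears to indicate that no result is recorded for that model on that benchmark, not a measured score of zero.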
| Rank | Model | Mode | AIME 2025 | AIME 2024 | MATH-500 | GSM8K |
|---|---|---|---|---|---|---|
| 1 | Gemini 3.0 Flash | thinking + tool use | 99.70 | 0.00 | 0.00 | 0.00 |
| 2 | GPT-5 | thinking + tool use | 99.60 | 0.00 | 0.00 | 0.00 |
| 3 | OpenAI o4-mini | thinking + tool use | 99.50 | 98.70 | 0.00 | 0.00 |
| 4 | Gemini 2.5 Deep Think | deeper_thinking | 99.20 | 0.00 | 0.00 | 0.00 |
| 5 | Kimi K2 Thinking | thinking + tool use | 99.10 | 0.00 | 0.00 | 0.00 |
| 6 | Grok 4 | thinking + tool use | 98.80 | 0.00 | 0.00 | 0.00 |
| 7 | GPT OSS 20B | thinking + tool use | 98.70 | 96.00 | 0.00 | 0.00 |
| 8 | GLM-4.6 | thinking | 98.60 | 0.00 | 0.00 | 0.00 |
| 9 | GLM-4.6 | thinking + tool use | 98.60 | 0.00 | 0.00 | 0.00 |
| 10 | GPT OSS 120B | thinking + tool use | 97.90 | 96.60 | 0.00 | 0.00 |
| 11 | GPT-5-Pro | thinking | 96.70 | 0.00 | 0.00 | 0.00 |
| 12 | Haiku 4.5 | thinking + tool use | 96.30 | 0.00 | 0.00 | 0.00 |
| 13 | DeepSeek V3.2 Speciale | thinking | 96.00 | 0.00 | 0.00 | 0.00 |
| 14 | Gemini 3.0 Flash | thinking | 95.20 | 0.00 | 0.00 | 0.00 |
| 15 | Gemini 3.0 Pro (Preview 11-2025) | thinking | 95.00 | 0.00 | 0.00 | 0.00 |
| 16 | GPT-5 | thinking | 94.60 | 0.00 | 0.00 | 0.00 |
| 17 | Kimi K2 Thinking | thinking | 94.50 | 0.00 | 0.00 | 0.00 |
| 18 | GPT-5.1 | high | 94.00 | 0.00 | 0.00 | 0.00 |
| 19 | DeepSeek V3.2 | thinking | 93.10 | 0.00 | 0.00 | 0.00 |
| 20 |  |  | 93.00 | 93.00 | 0.00 | 0.00 |
| 21 | OpenAI o4-mini | thinking | 92.70 | 93.40 | 0.00 | 0.00 |
| 22 | Qwen3-235B-A22B-Thinking-2507 | thinking | 92.30 | 0.00 | 0.00 | 0.00 |
| 23 | Qwen3-235B-A22B-Thinking | thinking | 92.30 | 0.00 | 0.00 | 0.00 |
| 24 | Grok 4 Fast | thinking | 92.00 | 0.00 | 0.00 | 0.00 |
| 25 | Grok 4 | thinking | 91.70 | 0.00 | 0.00 | 0.00 |
| 26 | DeepSeek-V3.1 Terminus | thinking | 90.00 | 0.00 | 0.00 | 0.00 |
| 27 | DeepSeek V3.2-Exp | thinking | 89.30 | 0.00 | 0.00 | 0.00 |
| 28 | Grok 4.1 Fast | thinking | 89.00 | 0.00 | 0.00 | 0.00 |
| 29 | OpenAI o3 | thinking | 88.90 | 0.00 | 0.00 | 0.00 |
| 30 | DeepSeek-V3.1 | thinking | 88.40 | 93.10 | 0.00 | 0.00 |
| 31 | Gemini 2.5-Pro | thinking | 88.00 | 0.00 | 0.00 | 0.00 |
| 32 | DeepSeek-R1-0528 | thinking | 87.50 | 91.40 | 98.00 | 0.00 |
| 33 | Claude Sonnet 4.5 | thinking | 87.00 | 0.00 | 0.00 | 0.00 |
| 34 |  |  | 86.90 | 92.00 | 0.00 | 0.00 |
| 35 | OpenAI o3-mini | thinking | 86.50 | 60.00 | 95.80 | 0.00 |
| 36 |  |  | 86.00 | 0.00 | 0.00 | 0.00 |
| 37 | Claude Sonnet 4 | deeper_thinking + tool use | 85.00 | 0.00 | 0.00 | 0.00 |
| 38 |  |  | 83.00 | 92.00 | 98.80 | 0.00 |
| 39 | GPT OSS 120B | thinking | 83.00 | 0.00 | 0.00 | 0.00 |
| 40 |  |  | 82.90 | 0.00 | 0.00 | 0.00 |
| 41 | Qwen3-235B-A22B | thinking | 81.50 | 85.70 | 98.00 | 0.00 |
| 42 | Qwen3-4B-Thinking-2507 | thinking | 81.30 | 0.00 | 0.00 | 0.00 |
| 43 | Haiku 4.5 | thinking | 80.70 | 0.00 | 0.00 | 0.00 |
| 44 |  |  | 80.60 | 0.00 | 0.00 | 0.00 |
| 45 | GPT OSS 20B | thinking | 79.00 | 0.00 | 0.00 | 0.00 |
| 46 | Claude Opus 4.1 | thinking + tool use | 78.00 | 0.00 | 0.00 | 0.00 |
| 47 | MiniMax M2 | thinking | 78.00 | 0.00 | 0.00 | 0.00 |
| 48 | Claude Opus 4.1 | thinking | 78.00 | 0.00 | 0.00 | 0.00 |
| 49 |  |  | 77.10 | 84.20 | 0.00 | 0.00 |
| 50 |  |  | 76.90 | 86.00 | 96.80 | 0.00 |
| 51 |  |  | 76.80 | 87.30 | 0.00 | 91.83 |
| 52 |  |  | 75.50 | 76.00 | 98.20 | 0.00 |
| 53 |  |  | 75.30 | 81.10 | 93.70 | 0.00 |
| 54 | Kimi K2 0905 | thinking + tool use | 75.20 | 0.00 | 0.00 | 0.00 |
| 55 |  |  | 74.60 | 83.30 | 96.00 | 0.00 |
| 56 | Qwen3-32B | thinking | 72.90 | 81.40 | 97.20 | 0.00 |
| 57 |  |  | 72.90 | 81.40 | 0.00 | 0.00 |
| 58 | Gemini 2.5 Flash | thinking | 72.00 | 0.00 | 0.00 | 0.00 |
| 59 | Claude Sonnet 4 | thinking | 70.50 | 0.00 | 0.00 | 0.00 |
| 60 |  |  | 70.30 | 0.00 | 0.00 | 0.00 |
| 61 |  |  | 70.00 | 79.80 | 97.30 | 0.00 |
| 62 |  |  | 69.50 | 0.00 | 0.00 | 90.30 |
| 63 |  |  | 68.10 | 79.20 | 96.80 | 0.00 |
| 64 | Qwen3-8B | thinking | 67.30 | 76.00 | 97.40 | 0.00 |
| 65 |  |  | 64.95 | 73.59 | 0.00 | 0.00 |
| 66 |  |  | 63.10 | 0.00 | 0.00 | 0.00 |
| 67 |  |  | 62.76 | 70.68 | 0.00 | 0.00 |
| 68 |  |  | 61.90 | 0.00 | 0.00 | 0.00 |
| 69 |  |  | 61.60 | 88.00 | 0.00 | 0.00 |
| 70 |  |  | 61.30 | 0.00 | 0.00 | 0.00 |
| 71 |  |  | 58.00 | 0.00 | 0.00 | 0.00 |
| 72 |  |  | 54.80 | 23.30 | 82.20 | 0.00 |
| 73 |  |  | 54.00 | 0.00 | 0.00 | 0.00 |
| 74 |  |  | 54.00 | 69.60 | 97.40 | 0.00 |
| 75 |  |  | 49.80 | 66.30 | 0.00 | 0.00 |
| 76 |  |  | 47.70 | 59.40 | 94.00 | 96.30 |
| 77 |  |  | 47.40 | 0.00 | 0.00 | 0.00 |
| 78 |  |  | 47.00 | 0.00 | 0.00 | 0.00 |
| 79 | GPT-5-mini | thinking | 47.00 | 0.00 | 0.00 | 0.00 |
| 80 |  |  | 44.00 | 0.00 | 0.00 | 0.00 |
| 81 | GPT-4o | normal + tool use | 42.10 | 0.00 | 0.00 | 0.00 |
| 82 |  |  | 39.00 | 0.00 | 0.00 | 0.00 |
| 83 |  |  | 38.00 | 43.40 | 0.00 | 0.00 |
| 84 |  |  | 37.00 | 0.00 | 0.00 | 0.00 |
| 85 |  |  | 36.70 | 48.10 | 92.80 | 95.90 |
| 86 | ERNIE-4.5-VL-424B-A47B-Base | thinking | 35.10 | 0.00 | 0.00 | 0.00 |
| 87 |  |  | 35.10 | 54.80 | 96.40 | 96.60 |
| 88 |  |  | 29.70 | 0.00 | 0.00 | 0.00 |
| 89 |  |  | 26.70 | 0.00 | 0.00 | 0.00 |
| 90 |  |  | 24.70 | 85.70 | 96.20 | 96.40 |
| 91 |  |  | 21.60 | 0.00 | 0.00 | 0.00 |
| 92 |  |  | 20.90 | 79.40 | 87.40 | 0.00 |
| 93 | Kimi K2 Thinking | parallel_thinking + tool use | 100.00 | 0.00 | 0.00 | 0.00 |
| 94 | Claude Sonnet 4.5 | thinking + tool use | 100.00 | 0.00 | 0.00 | 0.00 |
| 95 | GPT-5-Pro | thinking + tool use | 100.00 | 0.00 | 0.00 | 0.00 |
| 96 | Grok 4 Heavy | parallel_thinking | 100.00 | 0.00 | 0.00 | 0.00 |
| 97 | GPT-5.2 | thinking | 100.00 | 0.00 | 0.00 | 0.00 |
| 98 |  |  | 0.00 | 0.00 | 0.00 | 36.20 |
| 99 |  |  | 0.00 | 79.20 | 96.40 | 0.00 |
| 100 |  |  | 0.00 | 0.00 | 0.00 | 34.00 |
| 101 |  |  | 0.00 | 0.00 | 0.00 | 0.00 |
| 102 |  |  | 0.00 | 0.00 | 0.00 | 0.00 |
| 103 |  |  | 0.00 | 0.00 | 0.00 | 0.00 |
| 104 |  |  | 0.00 | 92.00 | 98.80 | 0.00 |
| 105 | GLM-4.5 | thinking | 0.00 | 91.00 | 98.20 | 0.00 |
| 106 |  |  | 0.00 | 91.60 | 98.10 | 0.00 |
| 107 | GLM-4.5-Air | thinking | 0.00 | 89.40 | 98.10 | 0.00 |
| 108 |  |  | 0.00 | 87.00 | 97.90 | 0.00 |
| 109 |  |  | 0.00 | 81.90 | 92.40 | 95.98 |
| 110 |  |  | 0.00 | 0.00 | 0.00 | 55.30 |
| 111 |  |  | 0.00 | 0.00 | 0.00 | 70.70 |
| 112 |  |  | 0.00 | 0.00 | 0.00 | 77.40 |
| 113 |  |  | 0.00 | 0.00 | 0.00 | 79.10 |
| 114 |  |  | 0.00 | 0.00 | 0.00 | 82.40 |
| 115 |  |  | 0.00 | 0.00 | 0.00 | 85.40 |
| 116 |  |  | 0.00 | 10.00 | 71.80 | 88.60 |
| 117 |  |  | 0.00 | 0.00 | 0.00 | 91.30 |
| 118 |  |  | 0.00 | 0.00 | 0.00 | 91.50 |
| 119 |  |  | 0.00 | 0.00 | 0.00 | 94.50 |
| 120 |  |  | 0.00 | 0.00 | 0.00 | 95.00 |
| 121 |  |  | 0.00 | 0.00 | 0.00 | 95.90 |
| 122 |  |  | 0.00 | 39.00 | 87.80 | 0.00 |
| 123 |  |  | 0.00 | 0.00 | 0.00 | 0.00 |
| 124 |  |  | 0.00 | 25.30 | 0.00 | 0.00 |
| 125 |  |  | 0.00 | 29.40 | 0.00 | 0.00 |
| 126 |  |  | 0.00 | 36.00 | 0.00 | 0.00 |
| 127 |  |  | 0.00 | 40.00 | 0.00 | 0.00 |
| 128 |  |  | 0.00 | 49.60 | 0.00 | 0.00 |
| 129 |  |  | 0.00 | 76.40 | 0.00 | 0.00 |
| 130 |  |  | 0.00 | 93.30 | 0.00 | 0.00 |
| 131 |  |  | 0.00 | 96.00 | 0.00 | 0.00 |
| 132 |  |  | 0.00 | 9.30 | 75.90 | 0.00 |
| 133 |  |  | 0.00 | 16.00 | 78.00 | 0.00 |
| 134 |  |  | 0.00 | 0.00 | 96.20 | 0.00 |
| 135 |  |  | 0.00 | 63.60 | 90.00 | 0.00 |
| 136 |  |  | 0.00 | 50.00 | 90.40 | 0.00 |
| 137 |  |  | 0.00 | 50.00 | 90.60 | 0.00 |
| 138 |  |  | 0.00 | 36.70 | 90.70 | 0.00 |
| 139 |  |  | 0.00 | 79.50 | 91.00 | 0.00 |
| 140 |  |  | 0.00 | 53.30 | 91.40 | 0.00 |
| 141 |  |  | 0.00 | 0.00 | 94.50 | 0.00 |
| 142 |  |  | 0.00 | 0.00 | 94.60 | 0.00 |
| 143 |  |  | 0.00 | 0.00 | 95.00 | 0.00 |
| 144 |  |  | 0.00 | 78.20 | 96.20 | 0.00 |
| 145 |  |  | 0.00 | 80.00 | 96.20 | 0.00 |
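To rank models on any one benchmark column, it helps to skip cells with no result and to compare scores as numbers rather than as text. The sketch below assumes, as the values suggest, that 0.00 marks a missing result; `parse_score` and `rank_models` are hypothetical helpers, and the sample rows are a handful of entries copied from the AIME 2025 column. Comparing scores as text places "100.00" below "20.90", because the strings are compared character by character; this is consistent with the rows scoring 100.00 on AIME 2025 appearing at ranks 93 to 97.

```python
from __future__ import annotations


def parse_score(cell: str) -> float | None:
    """Parse a score cell; treat the 0.00 placeholder as a missing result (assumption)."""
    value = float(cell)
    return None if value == 0.0 else value


def rank_models(rows: list[tuple[str, str]]) -> list[tuple[int, str, float]]:
    """Rank (model, score-cell) pairs by numeric score, highest first, skipping missing cells."""
    scored = [(name, parse_score(cell)) for name, cell in rows]
    present = [(name, score) for name, score in scored if score is not None]
    # sorted() is stable even with reverse=True, so tied scores keep their input order.
    present = sorted(present, key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(present, start=1)]


# A few AIME 2025 cells taken from the table above.
rows = [
    ("GLM-4.5 (thinking)", "0.00"),
    ("Qwen3-8B (thinking)", "67.30"),
    ("GPT-5.2 (thinking)", "100.00"),
    ("Gemini 3.0 Flash (thinking + tool use)", "99.70"),
]
for rank, name, score in rank_models(rows):
    print(rank, name, score)

# Text comparison misorders the scores: '9' > '6' > '1' as characters.
print(sorted(["99.70", "100.00", "67.30"], reverse=True))  # ['99.70', '67.30', '100.00']
```

Tied scores receive consecutive ranks in input order, which matches how the table numbers its ties (for example, the two GLM-4.6 rows at ranks 8 and 9).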