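Quickly view how large language models perform on a range of evaluation benchmarks, including standard datasets such as MMLU Pro, HLE, and SWE-Bench, to help developers and users understand how different models compare in general knowledge, coding, and reasoning. You can select custom models and benchmarks to compare and quickly see each model's strengths and weaknesses in practical use.
Detailed introductions to each benchmark are available at: LLM Benchmark List and Introduction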
Data source: DataLearnerAI
| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench |
|---|---|---|---|---|---|---|---|
| 1 | | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 |
| 2 | Gemini 3.0 Pro (Preview 11-2025) (thinking) | 90.00 | 91.90 | 76.20 | 0.00 | 0.00 | 92.00 |
| 3 | Claude Opus 4.5 (thinking) | 90.00 | 87.00 | 80.90 | 0.00 | 0.00 | 0.00 |
| 4 | Claude Opus 4.1 (thinking) | 88.00 | 81.00 | 74.50 | 0.00 | 0.00 | 0.00 |
| 5 | Claude Sonnet 4.5 (thinking) | 88.00 | 83.40 | 0.00 | 0.00 | 0.00 | 71.00 |
| 6 | | 87.20 | 69.30 | 0.00 | 96.20 | 78.20 | 64.90 |
| 7 | Grok 4 (thinking) | 87.00 | 87.00 | 58.60 | 0.00 | 0.00 | 82.00 |
| 8 | | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 |
| 9 | | 86.00 | 0.00 | 0.00 | 98.80 | 92.00 | 77.10 |
| 10 | | 85.60 | 0.00 | 0.00 | 98.10 | 91.60 | 75.80 |
| 11 | | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 |
| 12 | DeepSeek-R1-0528 (thinking) | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 |
| 13 | DeepSeek-V3.1 (thinking) | 85.00 | 80.10 | 0.00 | 0.00 | 93.10 | 74.80 |
| 14 | DeepSeek-V3.1 Terminus (thinking) | 85.00 | 79.00 | 0.00 | 0.00 | 0.00 | 80.00 |
| 15 | | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 74.90 |
| 16 | DeepSeek V3.2-Exp (thinking) | 85.00 | 79.90 | 0.00 | 0.00 | 0.00 | 74.10 |
| 17 | Grok 4.1 Fast (thinking) | 85.00 | 85.00 | 0.00 | 0.00 | 0.00 | 82.00 |
| 18 | GLM-4.5 (thinking) | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 |
| 19 | Kimi K2 Thinking (thinking) | 84.60 | 84.50 | 0.00 | 0.00 | 0.00 | 83.10 |
| 20 | Qwen3-235B-A22B-Thinking-2507 (thinking) | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 |
| 21 | Qwen3-235B-A22B-Thinking (thinking) | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 |
| 22 | | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 |
| 23 | Claude Sonnet 4 (thinking) | 84.00 | 75.40 | 0.00 | 0.00 | 0.00 | 66.00 |
| 24 | | 84.00 | 76.00 | 69.60 | 0.00 | 0.00 | 57.50 |
| 25 | | 84.00 | 74.00 | 0.00 | 0.00 | 0.00 | 55.00 |
| 26 | | 83.70 | 74.90 | 66.00 | 0.00 | 66.30 | 56.40 |
| 27 | | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 28 | | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 |
| 29 | GLM-4.6 (thinking) | 83.00 | 81.00 | 0.00 | 0.00 | 0.00 | 82.80 |
| 30 | | 82.60 | 73.70 | 0.00 | 96.80 | 79.20 | 59.60 |
| 31 | | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 |
| 32 | MiniMax M2 (thinking) | 82.00 | 78.00 | 0.00 | 0.00 | 0.00 | 83.00 |
| 33 | GLM-4.5-Air (thinking) | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 |
| 34 | | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 |
| 35 | | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 |
| 36 | | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 |
| 37 | OpenAI o4 - mini (thinking) | 80.60 | 81.40 | 68.10 | 0.00 | 93.40 | 0.00 |
| 38 | | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 |
| 39 | | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 |
| 40 | | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 |
| 41 | | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 |
| 42 | | 80.00 | 60.50 | 60.60 | 0.00 | 0.00 | 51.00 |
| 43 | | 79.80 | 66.90 | 0.00 | 0.00 | 0.00 | 35.80 |
| 44 | | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 |
| 45 | | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 |
| 46 | | 79.00 | 0.00 | 0.00 | 92.40 | 81.90 | 67.10 |
| 47 | GPT OSS 120B (thinking) | 79.00 | 80.10 | 60.10 | 0.00 | 0.00 | 0.00 |
| 48 | | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 |
| 49 | | 78.40 | 70.40 | 0.00 | 0.00 | 0.00 | 43.20 |
| 50 | | 78.00 | 63.00 | 68.00 | 0.00 | 0.00 | 56.00 |
| 51 | | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 |
| 52 | GPT-5-mini (thinking) | 78.00 | 69.00 | 0.00 | 0.00 | 0.00 | 55.00 |
| 53 | | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 54 | | 77.90 | 70.10 | 31.00 | 75.90 | 9.30 | 35.10 |
| 55 | | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 56 | | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 |
| 57 | | 76.10 | 53.50 | 0.00 | 0.00 | 0.00 | 0.00 |
| 58 | | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 59 | | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 |
| 60 | Haiku 4.5 (thinking) | 76.00 | 73.30 | 0.00 | 0.00 | 0.00 | 62.00 |
| 61 | | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 |
| 62 | | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 63 | | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 |
| 64 | GPT OSS 20B (thinking) | 74.00 | 71.50 | 0.00 | 0.00 | 0.00 | 0.00 |
| 65 | | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 |
| 66 | | 72.90 | 71.10 | 34.40 | 96.20 | 85.70 | 70.70 |
| 67 | | 72.50 | 39.30 | 0.00 | 87.40 | 79.40 | 61.80 |
| 68 | | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 |
| 69 | | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 |
| 70 | | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 |
| 71 | | 70.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 72 | | 69.23 | 0.00 | 0.00 | 0.00 | 0.00 | 51.20 |
| 73 | | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 |
| 74 | | 69.06 | 46.13 | 0.00 | 0.00 | 0.00 | 0.00 |
| 75 | | 68.90 | 50.50 | 0.00 | 0.00 | 0.00 | 33.30 |
| 76 | | 68.45 | 50.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 77 | | 67.50 | 42.40 | 0.00 | 0.00 | 25.30 | 29.70 |
| 78 | | 67.23 | 71.20 | 0.00 | 0.00 | 87.30 | 63.90 |
| 79 | | 66.76 | 45.96 | 0.00 | 0.00 | 0.00 | 0.00 |
| 80 | | 66.40 | 48.00 | 0.00 | 0.00 | 0.00 | 33.30 |
| 81 | | 66.05 | 0.00 | 0.00 | 0.00 | 0.00 | 56.60 |
| 82 | | 65.00 | 41.60 | 0.00 | 0.00 | 0.00 | 0.00 |
| 83 | | 63.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 84 | | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 85 | | 61.70 | 41.10 | 0.00 | 0.00 | 0.00 | 0.00 |
| 86 | | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 87 | | 60.60 | 40.90 | 0.00 | 0.00 | 0.00 | 24.60 |
| 88 | | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 89 | | 58.10 | 45.90 | 0.00 | 0.00 | 0.00 | 0.00 |
| 90 | | 56.80 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 91 | | 56.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 92 | | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 93 | | 56.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 94 | | 52.80 | 36.00 | 0.00 | 71.80 | 10.00 | 0.00 |
| 95 | | 52.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 96 | | 52.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 97 | | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 |
| 98 | | 47.16 | 33.84 | 0.00 | 0.00 | 0.00 | 0.00 |
| 99 | | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 100 | | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 101 | | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 102 | | 42.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 103 | | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 104 | | 34.60 | 24.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 105 | | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 |
| 106 | | 25.00 | 26.60 | 0.00 | 0.00 | 0.00 | 0.00 |
| 107 | GPT-5.1-Codex-Max (high + tool use) | 0.00 | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 |
| 108 | GPT-5.1 Codex (high + tool use) | 0.00 | 0.00 | 70.40 | 0.00 | 0.00 | 85.50 |
| 109 | o3-pro (high) | 0.00 | 0.00 | 75.00 | 0.00 | 0.00 | 0.00 |
| 110 | GPT-5 Codex (high) | 0.00 | 0.00 | 74.50 | 0.00 | 0.00 | 0.00 |
| 111 | Grok 4 Heavy (parallel_thinking + tool use) | 0.00 | 0.00 | 73.50 | 0.00 | 0.00 | 0.00 |
| 112 | Haiku 4.5 (thinking + tool use) | 0.00 | 0.00 | 73.30 | 0.00 | 0.00 | 0.00 |
| 113 | DeepSeek V3.2 (thinking + tool use) | 0.00 | 0.00 | 73.10 | 0.00 | 0.00 | 0.00 |
| 114 | Claude Sonnet 4 (thinking + tool use) | 0.00 | 0.00 | 72.70 | 0.00 | 0.00 | 0.00 |
| 115 | | 0.00 | 0.00 | 72.00 | 0.00 | 0.00 | 0.00 |
| 116 | Kimi K2 Thinking (thinking + tool use) | 0.00 | 0.00 | 71.30 | 0.00 | 0.00 | 0.00 |
| 117 | Grok Code Fast 1 (thinking) | 0.00 | 0.00 | 70.80 | 0.00 | 0.00 | 0.00 |
| 118 | | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 |
| 119 | Claude Sonnet 4.5 (thinking + tool use) | 0.00 | 0.00 | 77.20 | 0.00 | 0.00 | 0.00 |
| 120 | Claude Opus 4.1 (parallel_thinking + tool use) | 0.00 | 0.00 | 79.40 | 0.00 | 0.00 | 0.00 |
| 121 | Claude Sonnet 4 (parallel_thinking + tool use) | 0.00 | 0.00 | 80.20 | 0.00 | 0.00 | 0.00 |
| 122 | Claude Sonnet 4.5 (parallel_thinking + tool use) | 0.00 | 0.00 | 82.00 | 0.00 | 0.00 | 0.00 |
| 123 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 124 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 125 | | 0.00 | 49.00 | 0.00 | 90.40 | 50.00 | 0.00 |
| 126 | | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 |
| 127 | | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 |
| 128 | | 0.00 | 53.30 | 0.00 | 0.00 | 81.40 | 65.70 |
| 129 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 31.50 |
| 130 | | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 |
| 131 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 |
| 132 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 |
| 133 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 |
| 134 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 |
| 135 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 |
| 136 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 |
| 137 | Claude Opus 4.5 (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.00 |
| 138 | Gemini 2.5 Deep Think (deeper_thinking) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.60 |
| 139 | GPT OSS 20B (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 96.00 | 0.00 |
| 140 | GPT OSS 120B (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 96.60 | 0.00 |
| 141 | OpenAI o4 - mini (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 98.70 | 0.00 |
| 142 | MiniMax M2 (thinking + tool use) | 0.00 | 0.00 | 69.40 | 0.00 | 0.00 | 0.00 |
| 143 | | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 |
| 144 | | 0.00 | 0.00 | 46.80 | 0.00 | 0.00 | 0.00 |
| 145 | | 0.00 | 0.00 | 51.60 | 0.00 | 0.00 | 0.00 |
| 146 | | 0.00 | 0.00 | 53.60 | 0.00 | 0.00 | 0.00 |
| 147 | Gemini 2.5 Flash-Preview-09-2025 (thinking) | 0.00 | 0.00 | 54.00 | 0.00 | 0.00 | 0.00 |
| 148 | | 0.00 | 0.00 | 61.60 | 0.00 | 0.00 | 0.00 |
| 149 | | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 |
| 150 | DeepSeek V3.2-Exp (thinking + tool use) | 0.00 | 0.00 | 67.80 | 0.00 | 0.00 | 0.00 |
| 151 | Kimi K2 0905 (thinking + tool use) | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 |
| 152 | | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 |
| 153 | Gemini 2.5-Pro (thinking) | 0.00 | 86.40 | 67.20 | 0.00 | 0.00 | 0.00 |
| 154 | GLM-4.6 (thinking + tool use) | 0.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 |
| 155 | | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 |
| 156 | OpenAI o3 (thinking) | 0.00 | 83.30 | 69.10 | 0.00 | 0.00 | 0.00 |
| 157 | Claude Sonnet 4 (deeper_thinking + tool use) | 0.00 | 83.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 158 | | 0.00 | 84.00 | 0.00 | 0.00 | 93.00 | 0.00 |
| 159 | | 0.00 | 84.00 | 63.80 | 0.00 | 92.00 | 70.40 |
| 160 | | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 |
| 161 | | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 |
| 162 | | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 |
| 163 | Grok 4 Fast (thinking) | 0.00 | 85.70 | 0.00 | 0.00 | 0.00 | 80.00 |
| 164 | GPT-5 (high) | 0.00 | 85.70 | 72.80 | 0.00 | 0.00 | 0.00 |
| 165 | Gemini 2.5 Flash (thinking) | 0.00 | 82.80 | 48.90 | 0.00 | 0.00 | 55.40 |
| 166 | GPT-5 (thinking + tool use) | 0.00 | 87.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 167 | GPT-5.1 (high) | 0.00 | 88.10 | 76.30 | 0.00 | 0.00 | 0.00 |
| 168 | GPT-5.1 (thinking) | 0.00 | 88.10 | 0.00 | 0.00 | 0.00 | 0.00 |
| 169 | GPT-5-Pro (thinking) | 0.00 | 88.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 170 | Grok 4 Heavy (parallel_thinking) | 0.00 | 88.90 | 0.00 | 0.00 | 0.00 | 0.00 |
| 171 | GPT-5-Pro (thinking + tool use) | 0.00 | 89.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 172 | Gemini 3.0 Flash (thinking) | 0.00 | 90.40 | 68.70 | 0.00 | 0.00 | 0.00 |
| 173 | GPT-5.2 (thinking) | 0.00 | 92.40 | 80.00 | 0.00 | 0.00 | 0.00 |
| 174 | GPT-5.2 Pro (thinking) | 0.00 | 93.20 | 0.00 | 0.00 | 0.00 | 0.00 |
| 175 | Gemini 3.0 Pro (Preview 11-2025) (parallel_thinking) | 0.00 | 93.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 176 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 177 | | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 |
| 178 | Qwen3-8B (thinking) | 0.00 | 62.00 | 0.00 | 97.40 | 76.00 | 57.50 |
| 179 | | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 |
| 180 | | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 |
| 181 | | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 |
| 182 | Qwen3-4B-Thinking-2507 (thinking) | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 |
| 183 | | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 |
| 184 | | 0.00 | 68.00 | 0.00 | 0.00 | 43.40 | 48.50 |
| 185 | | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 |
| 186 | | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 |
| 187 | Qwen3-32B (thinking) | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 0.00 |
| 188 | OpenAI o3-mini (thinking) | 0.00 | 70.60 | 40.80 | 95.80 | 60.00 | 0.00 |
| 189 | | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 |
| 190 | Qwen3-235B-A22B (thinking) | 0.00 | 71.10 | 0.00 | 98.00 | 85.70 | 70.70 |
| 191 | | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 |
| 192 | | 0.00 | 73.70 | 64.80 | 0.00 | 0.00 | 59.00 |
| 193 | ERNIE-4.5-VL-424B-A47B-Base (thinking) | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 |
| 194 | | 0.00 | 77.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 195 | | 0.00 | 78.30 | 50.00 | 0.00 | 88.00 | 41.10 |
| 196 | | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 |
| 197 | | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 |
| 198 | Claude Opus 4.1 (thinking + tool use) | 0.00 | 80.90 | 74.50 | 0.00 | 0.00 | 65.00 |
| 199 | DeepSeek V3.2 (thinking) | 0.00 | 82.40 | 0.00 | 0.00 | 0.00 | 83.30 |
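
The table is ordered by MMLU Pro, so comparing models on another benchmark means re-sorting by that column. The following is a minimal sketch of that re-ranking on four rows copied from the table (thinking-mode entries). It assumes that a score of 0.00 means "not reported" rather than a real result, which the source does not state explicitly. The names `ROWS`, `COLUMNS`, `score`, and `rank_by` are hypothetical and not part of any DataLearnerAI API.

```python
from typing import List, Optional, Tuple

# Four rows copied from the table (thinking-mode entries):
# (model, MMLU Pro, GPQA Diamond, SWE-bench Verified)
ROWS = [
    ("Claude Opus 4.5", 90.00, 87.00, 80.90),
    ("Claude Sonnet 4.5", 88.00, 83.40, 0.00),
    ("Grok 4", 87.00, 87.00, 58.60),
    ("DeepSeek-R1-0528", 85.00, 81.00, 57.60),
]

# Hypothetical mapping from benchmark name to tuple index.
COLUMNS = {"MMLU Pro": 1, "GPQA Diamond": 2, "SWE-bench Verified": 3}


def score(row: tuple, benchmark: str) -> Optional[float]:
    """Return the score, or None when the cell is 0.00.

    ASSUMPTION: 0.00 is a placeholder for a missing result.
    """
    value = row[COLUMNS[benchmark]]
    return None if value == 0.0 else value


def rank_by(benchmark: str) -> List[Tuple[str, float]]:
    """Rank models by one benchmark, highest first, skipping missing scores."""
    reported = [(row[0], score(row, benchmark)) for row in ROWS]
    reported = [(name, s) for name, s in reported if s is not None]
    return sorted(reported, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, (name, value) in enumerate(rank_by("SWE-bench Verified"), start=1):
        print(f"{position}. {name}: {value:.2f}")
```

Dropping the placeholder scores before sorting keeps a model with no reported result from being ranked as if it had scored zero; under a different reading of 0.00 the filter in `score` would need to change.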