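This page provides a leaderboard of large language models' coding ability. It covers datasets such as SWE-bench, LiveCodeBench, and HumanEval, and compares models such as GPT, Claude, Qwen, and DeepSeek.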
| Rank | Model | SWE-bench Verified | LiveCodeBench | HumanEval |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.5 (parallel_thinking + tool use) | 82.00 | 0.00 | 0.00 |
| 2 | Claude Opus 4.5 (thinking) | 80.90 | 0.00 | 0.00 |
| 3 | Claude Sonnet 4 (parallel_thinking + tool use) | 80.20 | 0.00 | 0.00 |
| 4 | GPT-5.2 (thinking) | 80.00 | 0.00 | 0.00 |
| 5 | Claude Opus 4.1 (parallel_thinking + tool use) | 79.40 | 0.00 | 0.00 |
| 6 | Claude Sonnet 4.5 (thinking + tool use) | 77.20 | 0.00 | 0.00 |
| 7 | GPT-5.1-Codex-Max (high + tool use) | 76.80 | 0.00 | 0.00 |
| 8 | Kimi K2.5 (thinking) | 76.80 | 85.00 | 0.00 |
| 9 | GPT-5.1 (high) | 76.30 | 0.00 | 0.00 |
| 10 | Gemini 3.0 Pro (Preview 11-2025) (thinking) | 76.20 | 92.00 | 0.00 |
| 11 | Qwen3-Max-Thinking (thinking) | 75.30 | 85.90 | 0.00 |
| 12 | o3-pro (high) | 75.00 | 0.00 | 0.00 |
| 13 | GPT-5 Codex (high) | 74.50 | 0.00 | 0.00 |
| 14 | Claude Opus 4.1 (thinking + tool use) | 74.50 | 65.00 | 0.00 |
| 15 | Claude Opus 4.1 (thinking) | 74.50 | 0.00 | 0.00 |
| 16 | M2.1 (thinking) | 74.00 | 0.00 | 0.00 |
| 17 | GLM-4.7 (thinking + tool use) | 73.80 | 0.00 | 0.00 |
| 18 | Grok 4 Heavy (parallel_thinking + tool use) | 73.50 | 0.00 | 0.00 |
| 19 | Haiku 4.5 (thinking + tool use) | 73.30 | 0.00 | 0.00 |
| 20 | DeepSeek V3.2 (thinking + tool use) | 73.10 | 0.00 | 0.00 |
| 21 | GPT-5 (high) | 72.80 | 0.00 | 0.00 |
| 22 | Claude Sonnet 4 (thinking + tool use) | 72.70 | 0.00 | 0.00 |
| 23 | | 72.50 | 56.60 | 0.00 |
| 24 | | 72.00 | 0.00 | 0.00 |
| 25 | Kimi K2 Thinking (thinking + tool use) | 71.30 | 0.00 | 0.00 |
| 26 | Grok Code Fast 1 (thinking) | 70.80 | 0.00 | 0.00 |
| 27 | GPT-5.1 Codex (high + tool use) | 70.40 | 85.50 | 0.00 |
| 28 | | 70.30 | 0.00 | 0.00 |
| 29 | | 69.60 | 57.50 | 0.00 |
| 30 | MiniMax M2 (thinking + tool use) | 69.40 | 0.00 | 0.00 |
| 31 | Kimi K2 0905 (thinking + tool use) | 69.20 | 0.00 | 0.00 |
| 32 | | 69.20 | 0.00 | 0.00 |
| 33 | OpenAI o3 (thinking) | 69.10 | 0.00 | 0.00 |
| 34 | Gemini 3.0 Flash (thinking) | 68.70 | 0.00 | 0.00 |
| 35 | | 68.40 | 74.90 | 0.00 |
| 36 | OpenAI o4-mini (thinking) | 68.10 | 0.00 | 0.00 |
| 37 | | 68.00 | 56.00 | 0.00 |
| 38 | GLM-4.6 (thinking + tool use) | 68.00 | 84.50 | 0.00 |
| 39 | DeepSeek V3.2-Exp (thinking + tool use) | 67.80 | 0.00 | 0.00 |
| 40 | Gemini 2.5-Pro (thinking) | 67.20 | 0.00 | 0.00 |
| 41 | | 67.00 | 0.00 | 0.00 |
| 42 | | 66.00 | 56.40 | 0.00 |
| 43 | | 64.80 | 59.00 | 0.00 |
| 44 | GLM-4.5 (thinking) | 64.20 | 72.90 | 0.00 |
| 45 | | 63.80 | 70.40 | 0.00 |
| 46 | | 63.20 | 77.10 | 0.00 |
| 47 | | 61.60 | 0.00 | 0.00 |
| 48 | | 60.60 | 51.00 | 0.00 |
| 49 | GPT OSS 120B (thinking) | 60.10 | 0.00 | 0.00 |
| 50 | GLM-4.7-Flash (thinking) | 59.20 | 0.00 | 0.00 |
| 51 | Grok 4 (thinking) | 58.60 | 82.00 | 0.00 |
| 52 | DeepSeek-R1-0528 (thinking) | 57.60 | 73.30 | 0.00 |
| 53 | GLM-4.5-Air (thinking) | 57.60 | 70.70 | 0.00 |
| 54 | | 56.00 | 65.00 | 0.00 |
| 55 | | 55.60 | 62.30 | 0.00 |
| 56 | | 54.60 | 40.50 | 0.00 |
| 57 | Gemini 2.5 Flash-Preview-09-2025 (thinking) | 54.00 | 0.00 | 0.00 |
| 58 | | 53.60 | 0.00 | 0.00 |
| 59 | | 51.80 | 53.70 | 0.00 |
| 60 | | 51.60 | 0.00 | 0.00 |
| 61 | | 50.00 | 41.10 | 0.00 |
| 62 | | 49.30 | 69.50 | 97.60 |
| 63 | | 49.20 | 65.90 | 0.00 |
| 64 | | 49.00 | 38.70 | 93.70 |
| 65 | | 48.90 | 71.00 | 0.00 |
| 66 | Gemini 2.5 Flash (thinking) | 48.90 | 55.40 | 0.00 |
| 67 | | 46.80 | 0.00 | 0.00 |
| 68 | OpenAI o3-mini (thinking) | 40.80 | 0.00 | 0.00 |
| 69 | | 38.80 | 49.20 | 0.00 |
| 70 | | 38.00 | 46.40 | 0.00 |
| 71 | | 34.40 | 70.70 | 0.00 |
| 72 | GPT OSS 20B (thinking) | 34.00 | 0.00 | 0.00 |
| 73 | | 31.00 | 35.10 | 90.00 |
| 74 | | 27.60 | 34.30 | 0.00 |
| 75 | | 23.60 | 0.00 | 0.00 |
| 76 | Qwen3-30B-A3B-2507 (thinking) | 22.00 | 0.00 | 0.00 |
| 77 | | 21.40 | 29.10 | 0.00 |
| 78 | DeepSeek V3.2-Exp (thinking) | 0.00 | 74.10 | 0.00 |
| 79 | MiniMax M2 (thinking) | 0.00 | 83.00 | 0.00 |
| 80 | | 0.00 | 65.70 | 0.00 |
| 81 | | 0.00 | 65.90 | 0.00 |
| 82 | Claude Sonnet 4 (thinking) | 0.00 | 66.00 | 0.00 |
| 83 | | 0.00 | 67.10 | 0.00 |
| 84 | | 0.00 | 75.80 | 0.00 |
| 85 | DeepSeek-V3.1 (thinking) | 0.00 | 74.80 | 0.00 |
| 86 | Qwen3-235B-A22B-Thinking-2507 (thinking) | 0.00 | 74.10 | 0.00 |
| 87 | Qwen3-235B-A22B-Thinking (thinking) | 0.00 | 74.10 | 0.00 |
| 88 | | 0.00 | 67.40 | 0.00 |
| 89 | | 0.00 | 73.80 | 0.00 |
| 90 | Claude Sonnet 4.5 (thinking) | 0.00 | 71.00 | 0.00 |
| 91 | Qwen3-235B-A22B (thinking) | 0.00 | 70.70 | 0.00 |
| 92 | | 0.00 | 70.60 | 0.00 |
| 93 | | 0.00 | 77.10 | 0.00 |
| 94 | | 0.00 | 67.10 | 0.00 |
| 95 | | 0.00 | 51.80 | 0.00 |
| 96 | | 0.00 | 24.60 | 0.00 |
| 97 | | 0.00 | 28.90 | 0.00 |
| 98 | | 0.00 | 29.00 | 0.00 |
| 99 | | 0.00 | 32.80 | 0.00 |
| 100 | | 0.00 | 35.10 | 0.00 |
| 101 | | 0.00 | 35.80 | 0.00 |
| 102 | | 0.00 | 38.80 | 0.00 |
| 103 | ERNIE-4.5-VL-424B-A47B-Base (thinking) | 0.00 | 38.80 | 0.00 |
| 104 | | 0.00 | 43.20 | 0.00 |
| 105 | | 0.00 | 43.40 | 0.00 |
| 106 | | 0.00 | 48.50 | 0.00 |
| 107 | | 0.00 | 49.40 | 0.00 |
| 108 | | 0.00 | 51.80 | 0.00 |
| 109 | | 0.00 | 65.60 | 0.00 |
| 110 | | 0.00 | 55.00 | 0.00 |
| 111 | GPT-5-mini (thinking) | 0.00 | 55.00 | 0.00 |
| 112 | Qwen3-4B-Thinking-2507 (thinking) | 0.00 | 55.20 | 0.00 |
| 113 | | 0.00 | 55.84 | 0.00 |
| 114 | | 0.00 | 56.60 | 0.00 |
| 115 | | 0.00 | 57.00 | 0.00 |
| 116 | Qwen3-8B (thinking) | 0.00 | 57.50 | 0.00 |
| 117 | | 0.00 | 59.36 | 0.00 |
| 118 | | 0.00 | 59.60 | 0.00 |
| 119 | | 0.00 | 61.80 | 0.00 |
| 120 | Haiku 4.5 (thinking) | 0.00 | 62.00 | 0.00 |
| 121 | | 0.00 | 63.90 | 0.00 |
| 122 | | 0.00 | 64.90 | 0.00 |
| 123 | | 0.00 | 0.00 | 73.20 |
| 124 | | 0.00 | 0.00 | 88.40 |
| 125 | | 0.00 | 0.00 | 88.10 |
| 126 | | 0.00 | 29.70 | 87.80 |
| 127 | | 0.00 | 0.00 | 87.20 |
| 128 | | 0.00 | 37.90 | 86.60 |
| 129 | | 0.00 | 0.00 | 84.90 |
| 130 | | 0.00 | 31.50 | 81.10 |
| 131 | | 0.00 | 33.30 | 80.50 |
| 132 | | 0.00 | 0.00 | 74.40 |
| 133 | | 0.00 | 0.00 | 74.10 |
| 134 | | 0.00 | 33.30 | 88.40 |
| 135 | | 0.00 | 0.00 | 66.50 |
| 136 | | 0.00 | 0.00 | 62.20 |
| 137 | | 0.00 | 0.00 | 59.10 |
| 138 | | 0.00 | 0.00 | 57.90 |
| 139 | | 0.00 | 0.00 | 48.10 |
| 140 | | 0.00 | 0.00 | 42.10 |
| 141 | | 0.00 | 0.00 | 37.80 |
| 142 | | 0.00 | 0.00 | 33.50 |
| 143 | | 0.00 | 0.00 | 29.30 |
| 144 | | 0.00 | 0.00 | 28.00 |
| 145 | Gemini 2.5 Deep Think (deeper_thinking) | 0.00 | 87.60 | 0.00 |
| 146 | DeepSeek-V3.1 Terminus (thinking) | 0.00 | 80.00 | 0.00 |
| 147 | Grok 4 Fast (thinking) | 0.00 | 80.00 | 0.00 |
| 148 | | 0.00 | 80.40 | 0.00 |
| 149 | Grok 4.1 Fast (thinking) | 0.00 | 82.00 | 0.00 |
| 150 | GLM-4.6 (thinking) | 0.00 | 82.80 | 0.00 |
| 151 | | 0.00 | 0.00 | 19.00 |
| 152 | Kimi K2 Thinking (thinking) | 0.00 | 83.10 | 0.00 |
| 153 | DeepSeek V3.2 (thinking) | 0.00 | 83.30 | 0.00 |
| 154 | GLM-4.7 (thinking) | 0.00 | 84.90 | 0.00 |
| 155 | Claude Opus 4.5 (thinking + tool use) | 0.00 | 87.00 | 0.00 |
| 156 | | 0.00 | 79.40 | 0.00 |
| 157 | | 0.00 | 52.00 | 92.40 |
| 158 | | 0.00 | 0.00 | 92.00 |
| 159 | | 0.00 | 32.00 | 91.00 |
| 160 | | 0.00 | 0.00 | 90.20 |
| 161 | | 0.00 | 0.00 | 89.00 |
| 162 | | 0.00 | 30.20 | 89.00 |
| 163 | | 0.00 | 0.00 | 89.00 |
| 164 | | 0.00 | 34.60 | 89.00 |
| 165 | | 0.00 | 0.00 | 88.41 |
| 166 | | 0.00 | 51.20 | 88.40 |
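
The leading rows of the table are ordered by SWE-bench Verified, so a ranking on LiveCodeBench or HumanEval has to be derived from the same rows. The following is a minimal sketch of one way to do that, not the page's own logic. It assumes that a score of 0.00 means "no result reported" rather than a true zero, which the page does not state. `parse_rows` and `rank_by` are hypothetical helper names, and the sample rows are a small excerpt of the table with the mode tag written in parentheses.

```python
BENCHMARKS = ("SWE-bench Verified", "LiveCodeBench", "HumanEval")

# A few rows excerpted from the table: rank, model, then the three scores.
SAMPLE_TABLE = """
| Rank | Model | SWE-bench Verified | LiveCodeBench | HumanEval |
|---|---|---|---|---|
| 4 | GPT-5.2 (thinking) | 80.00 | 0.00 | 0.00 |
| 8 | Kimi K2.5 (thinking) | 76.80 | 85.00 | 0.00 |
| 10 | Gemini 3.0 Pro (Preview 11-2025) (thinking) | 76.20 | 92.00 | 0.00 |
| 11 | Qwen3-Max-Thinking (thinking) | 75.30 | 85.90 | 0.00 |
"""


def parse_rows(markdown: str) -> list[dict]:
    """Parse Markdown table rows into dicts, skipping header and separator lines."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 5:
            continue
        _rank, model, *scores = cells
        try:
            values = [float(score) for score in scores]
        except ValueError:
            continue  # header, separator, or malformed row
        rows.append({"model": model, **dict(zip(BENCHMARKS, values))})
    return rows


def rank_by(rows: list[dict], benchmark: str) -> list[dict]:
    """Sort rows by one benchmark, best first.

    Assumption: 0.00 marks a missing score, so those rows are dropped.
    """
    scored = [row for row in rows if row[benchmark] > 0.0]
    return sorted(scored, key=lambda row: row[benchmark], reverse=True)


if __name__ == "__main__":
    for position, row in enumerate(rank_by(parse_rows(SAMPLE_TABLE), "LiveCodeBench"), 1):
        print(position, row["model"], row["LiveCodeBench"])
```

Dropping the 0.00 entries keeps models without a reported score from being ranked last on a benchmark they may never have been run on. If 0.00 is in fact a real score on this page, the filter in `rank_by` should be removed.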