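This page provides a leaderboard of large language models' coding ability. It covers benchmarks such as SWE-Bench, LiveCodeBench, and HumanEval, and compares models such as GPT, Claude, Qwen, and DeepSeek.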
Data source: DataLearnerAI
| Rank | Model | SWE-bench Verified | LiveCodeBench | HumanEval |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.5 (parallel_thinking + tool use) | 82.00 | 0.00 | 0.00 |
| 2 | Claude Opus 4.5 (thinking) | 80.90 | 0.00 | 0.00 |
| 3 | Claude Sonnet 4 (parallel_thinking + tool use) | 80.20 | 0.00 | 0.00 |
| 4 | GPT-5.2 (thinking) | 80.00 | 0.00 | 0.00 |
| 5 | Claude Opus 4.1 (parallel_thinking + tool use) | 79.40 | 0.00 | 0.00 |
| 6 | Claude Sonnet 4.5 (thinking + tool use) | 77.20 | 0.00 | 0.00 |
| 7 | GPT-5.1-Codex-Max (high + tool use) | 76.80 | 0.00 | 0.00 |
| 8 | GPT-5.1 (high) | 76.30 | 0.00 | 0.00 |
| 9 | Gemini 3.0 Pro (Preview 11-2025) (thinking) | 76.20 | 92.00 | 0.00 |
| 10 | o3-pro (high) | 75.00 | 0.00 | 0.00 |
| 11 | GPT-5 Codex (high) | 74.50 | 0.00 | 0.00 |
| 12 | Claude Opus 4.1 (thinking + tool use) | 74.50 | 65.00 | 0.00 |
| 13 | Claude Opus 4.1 (thinking) | 74.50 | 0.00 | 0.00 |
| 14 | Grok 4 Heavy (parallel_thinking + tool use) | 73.50 | 0.00 | 0.00 |
| 15 | Haiku 4.5 (thinking + tool use) | 73.30 | 0.00 | 0.00 |
| 16 | DeepSeek V3.2 (thinking + tool use) | 73.10 | 0.00 | 0.00 |
| 17 | GPT-5 (high) | 72.80 | 0.00 | 0.00 |
| 18 | Claude Sonnet 4 (thinking + tool use) | 72.70 | 0.00 | 0.00 |
| 19 | | 72.50 | 56.60 | 0.00 |
| 20 | | 72.00 | 0.00 | 0.00 |
| 21 | Kimi K2 Thinking (thinking + tool use) | 71.30 | 0.00 | 0.00 |
| 22 | Grok Code Fast 1 (thinking) | 70.80 | 0.00 | 0.00 |
| 23 | GPT-5.1 Codex (high + tool use) | 70.40 | 85.50 | 0.00 |
| 24 | | 70.30 | 0.00 | 0.00 |
| 25 | | 69.60 | 57.50 | 0.00 |
| 26 | MiniMax M2 (thinking + tool use) | 69.40 | 0.00 | 0.00 |
| 27 | | 69.20 | 0.00 | 0.00 |
| 28 | Kimi K2 0905 (thinking + tool use) | 69.20 | 0.00 | 0.00 |
| 29 | OpenAI o3 (thinking) | 69.10 | 0.00 | 0.00 |
| 30 | Gemini 3.0 Flash (thinking) | 68.70 | 0.00 | 0.00 |
| 31 | | 68.40 | 74.90 | 0.00 |
| 32 | OpenAI o4-mini (thinking) | 68.10 | 0.00 | 0.00 |
| 33 | | 68.00 | 56.00 | 0.00 |
| 34 | GLM-4.6 (thinking + tool use) | 68.00 | 84.50 | 0.00 |
| 35 | DeepSeek V3.2-Exp (thinking + tool use) | 67.80 | 0.00 | 0.00 |
| 36 | Gemini 2.5-Pro (thinking) | 67.20 | 0.00 | 0.00 |
| 37 | | 67.00 | 0.00 | 0.00 |
| 38 | | 66.00 | 56.40 | 0.00 |
| 39 | | 64.80 | 59.00 | 0.00 |
| 40 | GLM-4.5 (thinking) | 64.20 | 72.90 | 0.00 |
| 41 | | 63.80 | 70.40 | 0.00 |
| 42 | | 63.20 | 77.10 | 0.00 |
| 43 | | 61.60 | 0.00 | 0.00 |
| 44 | | 60.60 | 51.00 | 0.00 |
| 45 | GPT OSS 120B (thinking) | 60.10 | 0.00 | 0.00 |
| 46 | Grok 4 (thinking) | 58.60 | 82.00 | 0.00 |
| 47 | DeepSeek-R1-0528 (thinking) | 57.60 | 73.30 | 0.00 |
| 48 | GLM-4.5-Air (thinking) | 57.60 | 70.70 | 0.00 |
| 49 | | 56.00 | 65.00 | 0.00 |
| 50 | | 55.60 | 62.30 | 0.00 |
| 51 | | 54.60 | 40.50 | 0.00 |
| 52 | Gemini 2.5 Flash-Preview-09-2025 (thinking) | 54.00 | 0.00 | 0.00 |
| 53 | | 53.60 | 0.00 | 0.00 |
| 54 | | 51.80 | 53.70 | 0.00 |
| 55 | | 51.60 | 0.00 | 0.00 |
| 56 | | 50.00 | 41.10 | 0.00 |
| 57 | | 49.30 | 69.50 | 97.60 |
| 58 | | 49.20 | 65.90 | 0.00 |
| 59 | | 49.00 | 38.70 | 93.70 |
| 60 | Gemini 2.5 Flash (thinking) | 48.90 | 55.40 | 0.00 |
| 61 | | 48.90 | 71.00 | 0.00 |
| 62 | | 46.80 | 0.00 | 0.00 |
| 63 | OpenAI o3-mini (thinking) | 40.80 | 0.00 | 0.00 |
| 64 | | 38.80 | 49.20 | 0.00 |
| 65 | | 38.00 | 46.40 | 0.00 |
| 66 | | 34.40 | 70.70 | 0.00 |
| 67 | | 31.00 | 35.10 | 90.00 |
| 68 | | 27.60 | 34.30 | 0.00 |
| 69 | | 23.60 | 0.00 | 0.00 |
| 70 | | 21.40 | 29.10 | 0.00 |
| 71 | | 0.00 | 67.10 | 0.00 |
| 72 | Claude Sonnet 4 (thinking) | 0.00 | 66.00 | 0.00 |
| 73 | | 0.00 | 65.90 | 0.00 |
| 74 | | 0.00 | 65.70 | 0.00 |
| 75 | | 0.00 | 64.90 | 0.00 |
| 76 | | 0.00 | 65.60 | 0.00 |
| 77 | | 0.00 | 67.10 | 0.00 |
| 78 | | 0.00 | 67.40 | 0.00 |
| 79 | | 0.00 | 70.60 | 0.00 |
| 80 | Qwen3-235B-A22B (thinking) | 0.00 | 70.70 | 0.00 |
| 81 | Claude Sonnet 4.5 (thinking) | 0.00 | 71.00 | 0.00 |
| 82 | | 0.00 | 73.80 | 0.00 |
| 83 | DeepSeek V3.2-Exp (thinking) | 0.00 | 74.10 | 0.00 |
| 84 | Qwen3-235B-A22B-Thinking (thinking) | 0.00 | 74.10 | 0.00 |
| 85 | Qwen3-235B-A22B-Thinking-2507 (thinking) | 0.00 | 74.10 | 0.00 |
| 86 | DeepSeek-V3.1 (thinking) | 0.00 | 74.80 | 0.00 |
| 87 | | 0.00 | 75.80 | 0.00 |
| 88 | | 0.00 | 51.80 | 0.00 |
| 89 | | 0.00 | 24.60 | 0.00 |
| 90 | | 0.00 | 28.90 | 0.00 |
| 91 | | 0.00 | 29.00 | 0.00 |
| 92 | | 0.00 | 32.80 | 0.00 |
| 93 | | 0.00 | 35.10 | 0.00 |
| 94 | | 0.00 | 35.80 | 0.00 |
| 95 | | 0.00 | 38.80 | 0.00 |
| 96 | ERNIE-4.5-VL-424B-A47B-Base (thinking) | 0.00 | 38.80 | 0.00 |
| 97 | | 0.00 | 43.20 | 0.00 |
| 98 | | 0.00 | 43.40 | 0.00 |
| 99 | | 0.00 | 48.50 | 0.00 |
| 100 | | 0.00 | 49.40 | 0.00 |
| 101 | | 0.00 | 51.80 | 0.00 |
| 102 | GLM-4.6 (thinking) | 0.00 | 82.80 | 0.00 |
| 103 | | 0.00 | 55.00 | 0.00 |
| 104 | GPT-5-mini (thinking) | 0.00 | 55.00 | 0.00 |
| 105 | Qwen3-4B-Thinking-2507 (thinking) | 0.00 | 55.20 | 0.00 |
| 106 | | 0.00 | 55.84 | 0.00 |
| 107 | | 0.00 | 56.60 | 0.00 |
| 108 | | 0.00 | 57.00 | 0.00 |
| 109 | Qwen3-8B (thinking) | 0.00 | 57.50 | 0.00 |
| 110 | | 0.00 | 59.36 | 0.00 |
| 111 | | 0.00 | 59.60 | 0.00 |
| 112 | | 0.00 | 61.80 | 0.00 |
| 113 | Haiku 4.5 (thinking) | 0.00 | 62.00 | 0.00 |
| 114 | | 0.00 | 63.90 | 0.00 |
| 115 | | 0.00 | 0.00 | 73.20 |
| 116 | | 0.00 | 0.00 | 88.40 |
| 117 | | 0.00 | 0.00 | 88.10 |
| 118 | | 0.00 | 29.70 | 87.80 |
| 119 | | 0.00 | 0.00 | 87.20 |
| 120 | | 0.00 | 37.90 | 86.60 |
| 121 | | 0.00 | 0.00 | 84.90 |
| 122 | | 0.00 | 31.50 | 81.10 |
| 123 | | 0.00 | 33.30 | 80.50 |
| 124 | | 0.00 | 0.00 | 74.40 |
| 125 | | 0.00 | 0.00 | 74.10 |
| 126 | | 0.00 | 33.30 | 88.40 |
| 127 | | 0.00 | 0.00 | 66.50 |
| 128 | | 0.00 | 0.00 | 62.20 |
| 129 | | 0.00 | 0.00 | 59.10 |
| 130 | | 0.00 | 0.00 | 57.90 |
| 131 | | 0.00 | 0.00 | 48.10 |
| 132 | | 0.00 | 0.00 | 42.10 |
| 133 | | 0.00 | 0.00 | 37.80 |
| 134 | | 0.00 | 0.00 | 33.50 |
| 135 | | 0.00 | 0.00 | 29.30 |
| 136 | | 0.00 | 0.00 | 28.00 |
| 137 | Gemini 2.5 Deep Think (deeper_thinking) | 0.00 | 87.60 | 0.00 |
| 138 | | 0.00 | 79.40 | 0.00 |
| 139 | DeepSeek-V3.1 Terminus (thinking) | 0.00 | 80.00 | 0.00 |
| 140 | Grok 4 Fast (thinking) | 0.00 | 80.00 | 0.00 |
| 141 | | 0.00 | 80.40 | 0.00 |
| 142 | Grok 4.1 Fast (thinking) | 0.00 | 82.00 | 0.00 |
| 143 | | 0.00 | 0.00 | 19.00 |
| 144 | MiniMax M2 (thinking) | 0.00 | 83.00 | 0.00 |
| 145 | Kimi K2 Thinking (thinking) | 0.00 | 83.10 | 0.00 |
| 146 | DeepSeek V3.2 (thinking) | 0.00 | 83.30 | 0.00 |
| 147 | Claude Opus 4.5 (thinking + tool use) | 0.00 | 87.00 | 0.00 |
| 148 | | 0.00 | 77.10 | 0.00 |
| 149 | | 0.00 | 52.00 | 92.40 |
| 150 | | 0.00 | 0.00 | 92.00 |
| 151 | | 0.00 | 32.00 | 91.00 |
| 152 | | 0.00 | 0.00 | 90.20 |
| 153 | | 0.00 | 0.00 | 89.00 |
| 154 | | 0.00 | 30.20 | 89.00 |
| 155 | | 0.00 | 0.00 | 89.00 |
| 156 | | 0.00 | 34.60 | 89.00 |
| 157 | | 0.00 | 0.00 | 88.41 |
| 158 | | 0.00 | 51.20 | 88.40 |