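This page provides a leaderboard of large language model coding ability. It covers benchmarks such as SWE-Bench, LiveCodeBench, and HumanEval, and compares models including GPT, Claude, Qwen, and DeepSeek.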
|  |  | 82.00 | 0.00 | 0.00 |
| 3 | Claude Opus 4.5 | 80.90 | 0.00 | 0.00 |
| 4 | Claude Opus 4.6 | 80.84 | 0.00 | 0.00 |
| 5 | Gemini 3.1 Pro Preview | 80.60 | 2887.00 | 0.00 |
| 6 | Claude Sonnet 4 | 80.20 | 0.00 | 0.00 |
| 7 | MiniMax M2.5 | 80.20 | 0.00 | 0.00 |
| 8 | GPT-5.2 | 80.00 | 0.00 | 0.00 |
| 9 | Claude Sonnet 4.6 | 79.60 | 0.00 | 0.00 |
| 10 | Claude Opus 4.1 | 79.40 | 0.00 | 0.00 |
| 11 | Qwen 3.6 Plus Preview | 78.80 | 0.00 | 0.00 |
| 12 | GLM-5 | 77.80 | 0.00 | 0.00 |
| 13 | Claude Sonnet 4.5 | 77.20 | 0.00 | 0.00 |
| 14 | GPT-5.1-Codex-Max | 76.80 | 0.00 | 0.00 |
| 15 | Kimi K2.5 | 76.80 | 0.00 | 0.00 |
| 16 | Qwen3.5-397B-A17B | 76.40 | 0.00 | 0.00 |
| 17 | GPT-5.1 | 76.30 | 0.00 | 0.00 |
| 18 | GPT-5.1 | 76.30 | 0.00 | 0.00 |
| 19 | Gemini 3.0 Pro (Preview 11-2025) | 76.20 | 92.00 | 0.00 |
| 20 | Qwen3-Max-Thinking | 75.30 | 85.90 | 0.00 |
| 21 | o3-pro | 75.00 | 0.00 | 0.00 |
| 22 | M2.1 | 74.80 | 0.00 | 0.00 |
| 23 | Claude Opus 4.1 | 74.50 | 0.00 | 0.00 |
| 24 | Claude Opus 4.1 | 74.50 | 65.00 | 0.00 |
| 25 | GPT-5 Codex | 74.50 | 0.00 | 0.00 |
| 26 | Step 3.5 Flash | 74.40 | 86.40 | 0.00 |
| 27 | GLM-4.7 | 73.80 | 0.00 | 0.00 |
| 28 | Grok 4 Heavy | 73.50 | 0.00 | 0.00 |
| 29 | Haiku 4.5 | 73.30 | 0.00 | 0.00 |
| 30 | DeepSeek V3.2 | 73.10 | 0.00 | 0.00 |