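This page provides a leaderboard of large language model coding ability, covering datasets such as SWE-Bench, LiveCodeBench, and HumanEval, and compares models including GPT, Claude, Qwen, and DeepSeek.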

| Rank | Model | SWE-Bench | LiveCodeBench | HumanEval |
| --- | --- | --- | --- | --- |
|  |  | 72.40 | 80.70 | 0.00 |
| 3 | GLM-4.7-Flash | 59.20 | 0.00 | 0.00 |
| 4 | Devstral Small 1.1 | 53.60 | 0.00 | 0.00 |
| 5 | Qwen3-Coder-Flash | 51.60 | 0.00 | 0.00 |
| 6 | Devstral Small 1.0 | 46.80 | 0.00 | 0.00 |
| 7 | GPT OSS 20B | 34.00 | 0.00 | 0.00 |
| 8 | Qwen3-30B-A3B-2507 | 22.00 | 43.20 | 0.00 |
| 9 | Qwen3-30B-A3B | 0.00 | 29.00 | 0.00 |
| 10 | Mistral-Small-3.1-24B-Instruct-2503 | 0.00 | 0.00 | 88.41 |
| 11 | Magistral-Small-2506 | 0.00 | 55.84 | 0.00 |
| 12 | Qwen3-32B | 0.00 | 65.70 | 0.00 |
| 13 | Qwen3-235B-A22B-Thinking | 0.00 | 74.10 | 0.00 |
| 14 | Gemma 4 31B | 0.00 | 80.00 | 0.00 |
| 15 | QwQ-32B | 0.00 | 0.00 | 19.00 |
| 16 | Gemma2-27B | 0.00 | 0.00 | 51.80 |
| 17 | C4AI Aya Vision 32B | 0.00 | 0.00 | 62.20 |
| 18 | Codestral | 0.00 | 31.50 | 81.10 |
| 19 | Gemma 3 - 27B (IT) | 0.00 | 29.70 | 87.80 |
| 20 | Qwen2.5-32B | 0.00 | 51.20 | 88.40 |
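
The rows above are not sorted by any single benchmark, so it can help to re-rank them one column at a time. The following is a minimal sketch of that idea, not the method used by the page. It assumes the three score columns are SWE-Bench, LiveCodeBench, and HumanEval in that order, and that a score of 0.00 means no result was reported rather than a real zero. The names `parse_rows`, `rank_by`, and `COLUMNS` are hypothetical helpers introduced for illustration, and only a few rows are copied in to keep the example short.

```python
# Sketch only: the column order and the "0.00 means not reported" rule
# are assumptions, not something the leaderboard page states.
ROWS = """\
| 3 | GLM-4.7-Flash | 59.20 | 0.00 | 0.00 |
| 8 | Qwen3-30B-A3B-2507 | 22.00 | 43.20 | 0.00 |
| 18 | Codestral | 0.00 | 31.50 | 81.10 |
| 20 | Qwen2.5-32B | 0.00 | 51.20 | 88.40 |
"""

COLUMNS = ("swe_bench", "livecodebench", "humaneval")  # assumed column order


def parse_rows(text):
    """Parse pipe-delimited rows into dicts, mapping 0.00 to None."""
    parsed = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5:
            continue  # skip fragments that are not complete rows
        rank, model, *scores = cells
        entry = {"rank": int(rank), "model": model}
        for name, raw in zip(COLUMNS, scores):
            value = float(raw)
            # Assumption: a zero score stands for "no result reported".
            entry[name] = value if value > 0 else None
        parsed.append(entry)
    return parsed


def rank_by(rows, column):
    """Return (model, score) pairs that have a score, highest first."""
    scored = [(r["model"], r[column]) for r in rows if r[column] is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


rows = parse_rows(ROWS)
print(rank_by(rows, "livecodebench"))
```

Filtering out the assumed missing values before sorting keeps models without a reported score from appearing at the bottom of a ranking as if they had scored zero.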