LMArena Coding Arena Coding Ability Leaderboard
The latest leaderboard of AI large models' coding ability, based on anonymous user votes in the LMArena Coding Arena, covering each model's Elo score, 95% confidence interval, vote count, organization, and license.
Top model: Opus 4.7 (thinking)
Highest score: 1572.00
Number of models: 200
Data version: April 24, 2026
Data source: LM Arena
About This Leaderboard
This leaderboard ranks current large AI models on coding tasks. The data comes from the Coding track of LMArena (formerly LMSYS Chatbot Arena), which evaluates each model's coding performance through anonymous blind votes from real users.
Methodology Overview
Anonymous blind testing: after a user submits a programming question, two models with hidden identities each produce a code answer; the user votes for the better one, eliminating brand bias.
Elo scoring: scores are computed with a Bradley-Terry model; a higher score means that model's code answers are chosen by users more often.
Broad scenario coverage: code generation, bug fixing, algorithm implementation, code explanation, and other common real-world programming tasks.
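The Bradley-Terry fitting step above can be sketched as follows. This is a minimal illustration with made-up vote counts, not LMArena's actual pipeline; the `wins` matrix, the iteration count, and the 1000-point anchor are all assumptions for the example.

```python
import math

def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of times model i beat model j.
    Uses the classic minorization-maximization (MM) update, then maps
    strengths onto an Elo-style scale (400 * log10, anchored near 1000).
    """
    n = len(wins)
    p = [1.0] * n  # Bradley-Terry strength parameters
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = sum(new_p)
        p = [x * n / scale for x in new_p]  # keep mean strength at 1
    return [1000 + 400 * math.log10(x) for x in p]

# Hypothetical vote counts for three models (not real LMArena data):
# model 0 usually beats model 1, which usually beats model 2.
wins = [[0, 70, 80],
        [30, 0, 60],
        [20, 40, 0]]
scores = fit_bradley_terry(wins)
```

The MM update converges to the maximum-likelihood strengths for this model; the log-scale conversion is what makes a fixed score gap correspond to a fixed expected win rate.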
On top of the raw data, DataLearner provides Chinese-language commentary and in-depth analysis, and links each leaderboard model to the DataLearner model database, so you can view model details, API pricing, and benchmark scores in one click.
Full Rankings
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Opus 4.7 (thinking) | 1572.00 | +/-17 | 1,266 | Anthropic | Proprietary |
| 2 | Opus 4.7 | 1560.00 | +/-15 | 1,577 | Anthropic | Proprietary |
| 3 | Claude Opus 4.6 (thinking) | 1554.00 | +/-9 | 4,483 | Anthropic | Proprietary |
| 4 | Claude Opus 4.6 | 1549.00 | +/-9 | 5,165 | Anthropic | Proprietary |
| 5 | Muse Spark | 1533.00 | +/-14 | 1,754 | Facebook AI Research | Proprietary |
| 6 | gpt-5.4-high | 1532.00 | +/-11 | 3,110 | OpenAI | Proprietary |
| 7 | Gemini 3.1 Pro Preview | 1531.00 | +/-8 | 5,932 | Google DeepMind | Proprietary |
| 8 | Claude Opus 4 (thinking-32k) | 1531.00 | +/-7 | 7,634 | Anthropic | Proprietary |
| 9 | grok-4.20-beta-0309-reasoning | 1520.00 | +/-10 | 3,289 | xAI | Proprietary |
| 10 | GLM 5.1 | 1520.00 | +/-12 | 2,233 | Zhipu AI | MIT |
| 11 | Claude Sonnet 4.6 | 1520.00 | +/-11 | 3,109 | Anthropic | Proprietary |
| 12 | gpt-5.2-chat-latest-20260210 | 1520.00 | +/-9 | 4,632 | OpenAI | Proprietary |
| 13 | grok-4.20-multi-agent-beta-0309 | 1519.00 | +/-10 | 3,455 | xAI | Proprietary |
| 14 | Claude Sonnet 4.5 (thinking-32k) | 1519.00 | +/-6 | 13,691 | Anthropic | Proprietary |
| 15 | Gemini 3.0 Pro (Preview 11-2025) | 1519.00 | +/-7 | 8,584 | Google DeepMind | Proprietary |
| 16 | Claude Opus 4 | 1519.00 | +/-7 | 11,221 | Anthropic | Proprietary |
| 17 | GPT-5.4 | 1516.00 | +/-10 | 3,320 | OpenAI | Proprietary |
| 18 | kimi-k2.6 | 1515.00 | +/-18 | 1,025 | Moonshot | Modified MIT |
| 19 | dola-seed-2.0-pro | 1514.00 | +/-8 | 5,652 | Bytedance | Proprietary |
| 20 | grok-4.20-beta1 | 1513.00 | +/-10 | 3,275 | xAI | Proprietary |
| 21 | Opus 4.1 (thinking-16k) | 1512.00 | +/-7 | 9,850 | Anthropic | Proprietary |
| 22 | Claude Sonnet 4.5 | 1510.00 | +/-6 | 13,268 | Anthropic | Proprietary |
| 23 | Gemini 3.0 Flash | 1509.00 | +/-8 | 6,397 | Google DeepMind | Proprietary |
| 24 | Kimi K2 Thinking | 1508.00 | +/-8 | 5,630 | Moonshot AI | Modified MIT |
| 25 | qwen3.5-max-preview | 1506.00 | +/-11 | 2,827 | Alibaba | Proprietary |
| 26 | gpt-5.4-mini-high | 1505.00 | +/-12 | 2,614 | OpenAI | Proprietary |
| 27 | Opus 4.1 | 1505.00 | +/-5 | 15,543 | Anthropic | Proprietary |
| 28 | kimi-k2.5-instant | 1504.00 | +/-14 | 1,805 | Moonshot | Modified MIT |
| 29 | mimo-v2-pro | 1503.00 | +/-11 | 2,923 | Xiaomi | Proprietary |
| 30 | qwen3.6-plus | 1502.00 | +/-15 | 1,392 | Alibaba | Proprietary |
| 31 | Grok 4.1 Thinking | 1501.00 | +/-6 | 10,675 | xAI | Proprietary |
| 32 | gpt-5.3-chat-latest | 1501.00 | +/-9 | 4,282 | OpenAI | Proprietary |
| 33 | Gemini 3.0 Flash (thinking-minimal) | 1499.00 | +/-7 | 8,274 | Google DeepMind | Proprietary |
| 34 | deepseek-v4-pro-thinking | 1499.00 | +/-19 | 927 | DeepSeek | MIT |
| 35 | gemma-4-31b | 1498.00 | +/-16 | 1,345 | Google | Apache 2.0 |
| 36 | Claude Opus 4 (thinking-16k) | 1498.00 | +/-8 | 6,677 | Anthropic | Proprietary |
| 37 | longcat-flash-chat-2602-exp | 1496.00 | +/-12 | 2,444 | Meituan | Proprietary |
| 38 | GPT-5.2 Pro (high) | 1496.00 | +/-7 | 7,636 | OpenAI | Proprietary |
| 39 | GPT-5.2 | 1494.00 | +/-7 | 6,953 | OpenAI | Proprietary |
| 40 | GLM-5 | 1493.00 | +/-9 | 4,075 | Zhipu AI | MIT |
| 41 | ERNIE 5.0 | 1493.00 | +/-8 | 5,797 | Baidu | Proprietary |
| 42 | Grok 4.1 | 1491.00 | +/-6 | 11,728 | xAI | Proprietary |
| 43 | GPT-5.1 Pro (high) | 1490.00 | +/-7 | 8,223 | OpenAI | Proprietary |
| 44 | amazon-nova-experimental-chat-26-02-10 | 1488.00 | +/-20 | 842 | Amazon | Proprietary |
| 45 | Qwen3.5-397B-A17B | 1487.00 | +/-9 | 4,640 | Alibaba | Apache 2.0 |
| 46 | kimi-k2-thinking-turbo | 1487.00 | +/-6 | 10,891 | Moonshot | Modified MIT |
| 47 | GLM-4.7 | 1486.00 | +/-12 | 2,414 | Zhipu AI | MIT |
| 48 | gemma-4-26b-a4b | 1481.00 | +/-15 | 1,352 | Google | Apache 2.0 |
| 49 | Qwen3 Max (Preview) | 1481.00 | +/-8 | 5,367 | Alibaba | Proprietary |
| 50 | deepseek-v4-pro | 1480.00 | +/-17 | 1,119 | DeepSeek | MIT |
| 51 | amazon-nova-experimental-chat-26-01-10 | 1480.00 | +/-21 | 739 | Amazon | Proprietary |
| 52 | deepseek-v4-flash-thinking | 1479.00 | +/-19 | 843 | DeepSeek | MIT |
| 53 | claude-haiku-4-5-20251001 | 1476.00 | +/-6 | 13,881 | Anthropic | Proprietary |
| 54 | deepseek-v4-flash | 1476.00 | +/-20 | 867 | DeepSeek | MIT |
| 55 | qwen3-max-2025-09-23 | 1475.00 | +/-13 | 2,045 | Alibaba | Proprietary |
| 56 | longcat-flash-chat | 1474.00 | +/-13 | 2,238 | Meituan | MIT |
| 57 | DeepSeek V3.2 (thinking) | 1474.00 | +/-7 | 7,775 | DeepSeek-AI | MIT |
| 58 | GPT-5.1 Instant | 1474.00 | +/-7 | 9,131 | OpenAI | Proprietary |
| 59 | DeepSeek V3.2-Exp (thinking) | 1474.00 | +/-13 | 1,917 | DeepSeek-AI | MIT |
| 60 | Claude Sonnet 4 (thinking-32k) | 1472.00 | +/-8 | 6,418 | Anthropic | Proprietary |
| 61 | Qwen3-235B-A22B-2507 | 1471.00 | +/-5 | 17,554 | Alibaba | Apache 2.0 |
| 62 | ERNIE 5.0 | 1471.00 | +/-13 | 1,966 | Baidu | Proprietary |
| 63 | chatgpt-4o-latest-20250326 | 1469.00 | +/-5 | 15,883 | OpenAI | Proprietary |
| 64 | Mistral Large 3 | 1468.00 | +/-7 | 9,091 | MistralAI | Apache 2.0 |
| 65 | DeepSeek V3.2 | 1468.00 | +/-7 | 9,609 | DeepSeek-AI | MIT |
| 66 | kimi-k2-0905-preview | 1467.00 | +/-13 | 2,246 | Moonshot | Modified MIT |
| 67 | GPT-5-Pro (high) | 1466.00 | +/-8 | 6,364 | OpenAI | Proprietary |
| 68 | MiniMax-M2.7 | 1466.00 | +/-11 | 2,714 | MiniMaxAI | Modified MIT |
| 69 | Gemini 2.5 Pro Experimental 03-25 | 1466.00 | +/-5 | 22,357 | Google DeepMind | Proprietary |
| 70 | Qwen3-VL-235B-A22B-Instruct | 1465.00 | +/-13 | 2,319 | Alibaba | Apache 2.0 |
| 71 | DeepSeek V3.2-Exp | 1465.00 | +/-12 | 2,499 | DeepSeek-AI | MIT |
| 72 | grok-4-1-fast-reasoning | 1464.00 | +/-6 | 10,141 | xAI | Proprietary |
| 73 | DeepSeek-R1-0528 | 1464.00 | +/-11 | 2,729 | DeepSeek-AI | MIT |
| 74 | Claude Opus 4 | 1463.00 | +/-7 | 7,908 | Anthropic | Proprietary |
| 75 | GPT-5 | 1463.00 | +/-8 | 5,988 | OpenAI | Proprietary |
| 76 | deepseek-v3.1-terminus-thinking | 1461.00 | +/-24 | 637 | DeepSeek | MIT |
| 77 | gemini-3.1-flash-lite-preview | 1461.00 | +/-9 | 4,702 | Google DeepMind | Proprietary |
| 78 | gpt-5.4-nano-high | 1460.00 | +/-12 | 2,497 | OpenAI | Proprietary |
| 79 | GLM-4.6 | 1460.00 | +/-7 | 7,496 | Zhipu AI | MIT |
| 80 | Kimi K2 | 1459.00 | +/-8 | 5,249 | Moonshot AI | Modified MIT |
| 81 | GPT-4.5 | 1458.00 | +/-13 | 1,939 | OpenAI | Proprietary |
| 82 | Grok 4 Fast | 1458.00 | +/-16 | 1,248 | xAI | Proprietary |
| 83 | OpenAI o3 | 1458.00 | +/-6 | 11,757 | OpenAI | Proprietary |
| 84 | qwen3-coder-480b-a35b-instruct | 1456.00 | +/-9 | 4,858 | Alibaba | Apache 2.0 |
| 85 | DeepSeek-V3.1 (thinking) | 1456.00 | +/-13 | 1,906 | DeepSeek-AI | MIT |
| 86 | gpt-4.1-2025-04-14 | 1456.00 | +/-7 | 9,323 | OpenAI | Proprietary |
| 87 | MiniMax M2.5 | 1456.00 | +/-9 | 4,829 | MiniMaxAI | Modified MIT |
| 88 | qwen3-vl-235b-a22b-thinking | 1455.00 | +/-14 | 1,631 | Alibaba | Apache 2.0 |
| 89 | GLM-4.5 | 1454.00 | +/-9 | 4,771 | Zhipu AI | MIT |
| 90 | Magistral-Medium-2506 | 1453.00 | +/-5 | 17,051 | MistralAI | Proprietary |
| 91 | qwen3.5-122b-a10b | 1452.00 | +/-10 | 3,778 | Alibaba | Apache 2.0 |
| 92 | Claude Sonnet 3.7 (thinking-32k) | 1450.00 | +/-8 | 6,196 | Anthropic | Proprietary |
| 93 | Step 3.5 Flash | 1449.00 | +/-8 | 4,924 | StepFunAI | Apache 2.0 |
| 94 | mimo-v2-flash (non-thinking) | 1449.00 | +/-7 | 7,751 | Xiaomi | MIT |
| 95 | Claude Sonnet 4 | 1448.00 | +/-7 | 7,398 | Anthropic | Proprietary |
| 96 | DeepSeek-V3.1 | 1446.00 | +/-12 | 2,627 | DeepSeek-AI | MIT |
| 97 | qwen3-235b-a22b-no-thinking | 1445.00 | +/-8 | 6,981 | Alibaba | Apache 2.0 |
| 98 | qwen3-next-80b-a3b-instruct | 1445.00 | +/-9 | 4,801 | Alibaba | Apache 2.0 |
| 99 | qwen3.5-27b | 1444.00 | +/-10 | 3,774 | Alibaba | Apache 2.0 |
| 100 | DeepSeek-R1 | 1444.00 | +/-12 | 2,317 | DeepSeek-AI | MIT |
| 101 | Grok 3 | 1443.00 | +/-8 | 5,401 | xAI | Proprietary |
| 102 | qwen3-235b-a22b-thinking-2507 | 1441.00 | +/-15 | 1,612 | Alibaba | Apache 2.0 |
| 103 | trinity-large-preview | 1440.00 | +/-10 | 3,721 | Arcee AI | Apache 2.0 |
| 104 | minimax-m2.1-preview | 1440.00 | +/-10 | 3,431 | MiniMax | MIT |
| 105 | qwen3-30b-a3b-instruct-2507 | 1439.00 | +/-9 | 4,668 | Alibaba | Apache 2.0 |
| 106 | DeepSeek-V3.1 Terminus | 1439.00 | +/-21 | 782 | DeepSeek-AI | MIT |
| 107 | hunyuan-vision-1.5-thinking | 1438.00 | +/-27 | 437 | Tencent | Proprietary |
| 108 | qwen3.5-35b-a3b | 1437.00 | +/-9 | 3,878 | Alibaba | Apache 2.0 |
| 109 | grok-4-fast-reasoning | 1437.00 | +/-9 | 3,958 | xAI | Proprietary |
| 110 | amazon-nova-experimental-chat-12-10 | 1436.00 | +/-21 | 704 | Amazon | Proprietary |
| 111 | grok-4-0709 | 1435.00 | +/-7 | 8,160 | xAI | Proprietary |
| 112 | o3-mini-high | 1434.00 | +/-12 | 2,596 | OpenAI | Proprietary |
| 113 | claude-3-5-sonnet-20241022 | 1433.00 | +/-6 | 14,970 | Anthropic | Proprietary |
| 114 | qwen3-235b-a22b | 1433.00 | +/-9 | 4,340 | Alibaba | Apache 2.0 |
| 115 | ERNIE 5.0 | 1433.00 | +/-19 | 918 | Baidu | Proprietary |
| 116 | mistral-medium-2505 | 1433.00 | +/-8 | 5,901 | Mistral | Proprietary |
| 117 | mimo-v2-flash (thinking) | 1432.00 | +/-12 | 2,442 | Xiaomi | MIT |
| 118 | gpt-4.1-mini-2025-04-14 | 1432.00 | +/-7 | 6,925 | OpenAI | Proprietary |
| 119 | o1-2024-12-17 | 1432.00 | +/-10 | 3,973 | OpenAI | Proprietary |
| 120 | qwen3.5-flash | 1431.00 | +/-9 | 4,132 | Alibaba | Proprietary |
| 121 | o4-mini-2025-04-16 | 1431.00 | +/-7 | 8,722 | OpenAI | Proprietary |
| 122 | mai-1-preview | 1430.00 | +/-11 | 2,780 | Microsoft AI | Proprietary |
| 123 | gpt-5-mini-high | 1429.00 | +/-9 | 5,506 | OpenAI | Proprietary |
| 124 | Claude Sonnet 3.7 | 1429.00 | +/-7 | 7,149 | Anthropic | Proprietary |
| 125 | gemini-2.5-flash-preview-09-2025 | 1428.00 | +/-8 | 6,850 | Google DeepMind | Proprietary |
| 126 | DeepSeek-V3-0324 | 1428.00 | +/-7 | 8,377 | DeepSeek-AI | MIT |
| 127 | glm-4.5-air | 1426.00 | +/-8 | 6,116 | Z.ai | MIT |
| 128 | glm-4.7-flash | 1424.00 | +/-11 | 2,693 | Z.ai | MIT |
| 129 | Gemini 2.5 Flash | 1424.00 | +/-5 | 21,705 | Google DeepMind | Proprietary |
| 130 | qwen3-next-80b-a3b-thinking | 1421.00 | +/-11 | 2,680 | Alibaba | Apache 2.0 |
| 131 | amazon-nova-experimental-chat-11-10 | 1420.00 | +/-8 | 5,323 | Amazon | Proprietary |
| 132 | GLM-4.6V | 1420.00 | +/-25 | 535 | Zhipu AI | MIT |
| 133 | o1-preview | 1416.00 | +/-9 | 5,123 | OpenAI | Proprietary |
| 134 | minimax-m1 | 1415.00 | +/-8 | 6,496 | MiniMax | Apache 2.0 |
| 135 | o3-mini | 1415.00 | +/-6 | 9,462 | OpenAI | Proprietary |
| 136 | mistral-small-2506 | 1412.00 | +/-10 | 3,362 | Mistral | Apache 2.0 |
| 137 | ling-flash-2.0 | 1412.00 | +/-15 | 1,528 | Ant Group | MIT |
| 138 | amazon-nova-experimental-chat-10-20 | 1411.00 | +/-12 | 2,294 | Amazon | Proprietary |
| 139 | intellect-3 | 1410.00 | +/-19 | 971 | Prime Intellect | MIT |
| 140 | nvidia-nemotron-3-super-120b-a12b | 1408.00 | +/-14 | 1,716 | Nvidia | NVIDIA Open Model |
| 141 | qwen3-32b | 1407.00 | +/-24 | 513 | Alibaba | Apache 2.0 |
| 142 | step-3 | 1407.00 | +/-17 | 1,235 | StepFun | Apache 2.0 |
| 143 | nvidia-llama-3.3-nemotron-super-49b-v1.5 | 1405.00 | +/-22 | 659 | Nvidia | Nvidia Open |
| 144 | glm-4.5v | 1404.00 | +/-18 | 993 | Z.ai | MIT |
| 145 | qwen2.5-max | 1402.00 | +/-8 | 5,102 | Alibaba | Proprietary |
| 146 | hunyuan-t1-20250711 | 1400.00 | +/-20 | 806 | Tencent | Proprietary |
| 147 | hunyuan-turbos-20250226 | 1399.00 | +/-31 | 275 | Tencent | Proprietary |
| 148 | mercury-2 | 1398.00 | +/-21 | 767 | Inception AI | Proprietary |
| 149 | gemini-2.5-flash-lite-preview-09-2025-no-thinking | 1397.00 | +/-7 | 9,697 | Google DeepMind | Proprietary |
| 150 | nova-2-lite | 1397.00 | +/-12 | 2,518 | Amazon | Proprietary |
| 151 | claude-3-5-sonnet-20240620 | 1396.00 | +/-7 | 13,607 | Anthropic | Proprietary |
| 152 | hunyuan-turbos-20250416 | 1394.00 | +/-14 | 1,776 | Tencent | Proprietary |
| 153 | llama-3.1-nemotron-ultra-253b-v1 | 1391.00 | +/-30 | 367 | Nvidia | Nvidia Open Model |
| 154 | GPT OSS 120B | 1390.00 | +/-8 | 6,497 | OpenAI | Apache 2.0 |
| 155 | ring-flash-2.0 | 1390.00 | +/-15 | 1,540 | Ant Group | MIT |
| 156 | grok-3-mini-high | 1390.00 | +/-10 | 3,301 | xAI | Proprietary |
| 157 | command-a-03-2025 | 1389.00 | +/-6 | 10,221 | Cohere | CC-BY-NC-4.0 |
| 158 | amazon-nova-experimental-chat-10-09 | 1388.00 | +/-24 | 553 | Amazon | Proprietary |
| 159 | o1-mini | 1387.00 | +/-7 | 8,478 | OpenAI | Proprietary |
| 160 | deepseek-v3 | 1387.00 | +/-10 | 3,280 | DeepSeek | DeepSeek |
| 161 | qwen3-30b-a3b | 1386.00 | +/-9 | 4,534 | Alibaba | Apache 2.0 |
| 162 | grok-3-mini-beta | 1386.00 | +/-9 | 4,256 | xAI | Proprietary |
| 163 | magistral-medium-2506 | 1385.00 | +/-12 | 2,250 | Mistral | Proprietary |
| 164 | olmo-3.1-32b-instruct | 1385.00 | +/-12 | 2,521 | Ai2 | Apache 2.0 |
| 165 | qwq-32b | 1384.00 | +/-9 | 4,048 | Alibaba | Apache 2.0 |
| 166 | gemini-2.5-flash-lite-preview-06-17-thinking | 1384.00 | +/-8 | 6,013 | Google DeepMind | Proprietary |
| 167 | claude-3-5-haiku-20241022 | 1383.00 | +/-6 | 11,251 | Anthropic | Proprietary |
| 168 | minimax-m2 | 1383.00 | +/-15 | 1,545 | MiniMax | Apache 2.0 |
| 169 | gpt-5-nano-high | 1381.00 | +/-15 | 1,688 | OpenAI | Proprietary |
| 170 | qwen-plus-0125 | 1379.00 | +/-18 | 893 | Alibaba | Proprietary |
| 171 | llama-3.1-405b-instruct-bf16 | 1374.00 | +/-7 | 6,249 | Meta | Llama 3.1 Community |
| 172 | deepseek-v2.5-1210 | 1374.00 | +/-17 | 1,079 | DeepSeek | DeepSeek |
| 173 | gpt-4.1-nano-2025-04-14 | 1373.00 | +/-19 | 807 | OpenAI | Proprietary |
| 174 | llama-4-maverick-17b-128e-instruct | 1372.00 | +/-7 | 6,996 | Meta | Llama 4 |
| 175 | hunyuan-turbo-0110 | 1371.00 | +/-30 | 299 | Tencent | Proprietary |
| 176 | step-2-16k-exp-202412 | 1371.00 | +/-20 | 737 | StepFun | Proprietary |
| 177 | athene-v2-chat | 1369.00 | +/-9 | 4,019 | NexusFlow | NexusFlow |
| 178 | GPT OSS 20B | 1369.00 | +/-13 | 2,168 | OpenAI | Apache 2.0 |
| 179 | yi-lightning | 1368.00 | +/-10 | 4,316 | 01 AI | Proprietary |
| 180 | gpt-4o-2024-05-13 | 1368.00 | +/-6 | 19,526 | OpenAI | Proprietary |
| 181 | deepseek-v2.5 | 1368.00 | +/-9 | 4,252 | DeepSeek | DeepSeek |
| 182 | mercury | 1367.00 | +/-29 | 395 | Inception AI | Proprietary |
| 183 | llama-3.1-405b-instruct-fp8 | 1367.00 | +/-7 | 9,714 | Meta | Llama 3.1 Community |
| 184 | hunyuan-large-2025-02-10 | 1366.00 | +/-25 | 519 | Tencent | Proprietary |
| 185 | gemini-2.0-flash-001 | 1365.00 | +/-7 | 6,998 | Google DeepMind | Proprietary |
| 186 | olmo-3-32b-think | 1364.00 | +/-18 | 1,054 | Ai2 | Apache 2.0 |
| 187 | nvidia-nemotron-3-nano-30b-a3b-bf16 | 1364.00 | +/-11 | 3,284 | Nvidia | NVIDIA Open Model |
| 188 | llama-3.3-nemotron-49b-super-v1 | 1362.00 | +/-31 | 286 | Nvidia | Nvidia |
| 189 | llama-4-scout-17b-16e-instruct | 1361.00 | +/-9 | 5,258 | Meta | Llama |
| 190 | mistral-small-3.1-24b-instruct-2503 | 1361.00 | +/-8 | 6,141 | Mistral | Apache 2.0 |
| 191 | gpt-4o-2024-08-06 | 1360.00 | +/-8 | 7,318 | OpenAI | Proprietary |
| 192 | gemma-3-27b-it | 1358.00 | +/-7 | 8,080 | Google | Gemma |
| 193 | grok-2-2024-08-13 | 1358.00 | +/-7 | 10,368 | xAI | Proprietary |
| 194 | qwen2.5-plus-1127 | 1356.00 | +/-14 | 1,553 | Alibaba | Proprietary |
| 195 | gemini-1.5-pro-002 | 1356.00 | +/-7 | 9,175 | Google DeepMind | Proprietary |
| 196 | hunyuan-large-vision | 1355.00 | +/-19 | 963 | Tencent | Proprietary |
| 197 | qwen2.5-72b-instruct | 1355.00 | +/-8 | 6,688 | Alibaba | Qwen |
| 198 | step-1o-turbo-202506 | 1353.00 | +/-15 | 1,505 | StepFun | Proprietary |
| 199 | mistral-large-2407 | 1353.00 | +/-8 | 7,589 | Mistral | Mistral Research |
| 200 | Claude3-Opus | 1352.00 | +/-6 | 33,748 | Anthropic | Proprietary |
Data is for reference only; consult the official source for authoritative figures. Links next to model names lead to DataLearner model detail pages.
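The "+/-" values in the 95% CI column can be understood through a simple bootstrap: resample the recorded votes with replacement, refit the rating each time, and take the 2.5th and 97.5th percentiles. A toy sketch for a single model pair, with hypothetical vote data (LMArena's actual interval computation may differ):

```python
import math
import random

def win_rate_to_elo_gap(rate):
    """Elo rating difference implied by a head-to-head win rate."""
    return 400 * math.log10(rate / (1 - rate))

def bootstrap_ci(votes, n_boot=2000, seed=0):
    """95% bootstrap confidence interval for the Elo gap between two models.

    votes: list of outcomes, 1 = model A won, 0 = model B won.
    """
    rng = random.Random(seed)
    n = len(votes)
    gaps = []
    for _ in range(n_boot):
        # Resample votes with replacement and recompute the gap
        sample = [votes[rng.randrange(n)] for _ in range(n)]
        rate = sum(sample) / n
        rate = min(max(rate, 1e-6), 1 - 1e-6)  # guard against log(0)
        gaps.append(win_rate_to_elo_gap(rate))
    gaps.sort()
    return gaps[int(0.025 * n_boot)], gaps[int(0.975 * n_boot)]

# Hypothetical: model A wins 60% of 1,000 votes against model B
votes = [1] * 600 + [0] * 400
lo, hi = bootstrap_ci(votes)
```

Note how the interval width shrinks as the vote count grows, which is why low-vote rows in the table (a few hundred votes) carry much wider +/- bands than rows with tens of thousands of votes.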
Frequently Asked Questions (FAQ)
What is the LMArena Coding Arena?
The LMArena Coding Arena is LMArena's anonymous evaluation platform focused on coding ability. Users submit real programming tasks (debugging, code generation, algorithm implementation, and so on); the system shows outputs from different models side by side with model names hidden, users vote for the better answer, and the votes are aggregated with an Elo algorithm into a dynamic leaderboard.
How does the Coding Arena differ from static benchmarks such as SWE-bench and HumanEval?
Static benchmarks such as SWE-bench, HumanEval, and MBPP use fixed test sets and automated scoring: highly reproducible, but easy to game through targeted optimization ("benchmark chasing"). Coding Arena prompts come from real users' open-ended requests, so the test content is not fixed and better reflects how models perform in actual programming scenarios; the two approaches complement each other.
How do Chinese models perform on coding tasks?
Chinese models such as DeepSeek V3.2 and Qwen3-235B perform strongly in the Coding Arena and rank among the global leaders. DeepSeek is open-sourced under the MIT license, and the Qwen series supports Chinese-language programming scenarios, making both important references for developers choosing an open-source coding model.
How can AI assist everyday programming work?
Common scenarios include: code completion and generation (producing an implementation from a comment or function signature), debugging (pasting an error message and letting the AI locate the problem), code review (checking for security vulnerabilities or performance issues), unit-test generation, and cross-language translation (e.g. converting Python to TypeScript). Models near the top of the leaderboard generally perform better across these scenarios.