Arcada Labs Code Categories Arena: Coding Ability Leaderboard
The latest AI model coding-ability leaderboard, based on anonymous user votes in the Arcada Labs Code Categories Arena. Scores and rankings are computed with a Bradley-Terry model over pooled votes from code subcategories such as Website, UI Component, Game Dev, and Data Visualization.
Top Model: Claude Opus 4.6
Top Score: 1352.00
Model Count: 118
Data version: April 26, 2026
Data source: Design Arena (Arcada Labs)
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | 1352.00 | +/-0.8% | 12,728 | Anthropic | / |
| 2 | Claude Opus 4.7 (Thinking) | 1347.00 | +/-1.6% | 3,539 | Anthropic | / |
| 3 | Claude Opus 4.6 (Thinking) | 1346.00 | +/-0.9% | 10,621 | Anthropic | / |
| 4 | GLM 5.1 | 1343.00 | +/-1.3% | 5,075 | Zhipu AI | / |
| 5 | Opus 4.7 | 1338.00 | +/-1.2% | 6,674 | Anthropic | / |
| 6 | GLM 5 Turbo | 1336.00 | +/-1.3% | 5,337 | Zhipu AI | / |
| 7 | Claude Sonnet 4.6 | 1334.00 | +/-0.9% | 11,895 | Anthropic | / |
| 8 | Kimi K2.6 | 1326.00 | +/-1.5% | 4,260 | Moonshot AI | / |
| 9 | Muse Spark | 1315.00 | +/-1.5% | 4,234 | Facebook AI Research | / |
| 10 | GLM 5 | 1311.00 | +/-0.8% | 14,853 | Zhipu AI | / |
| 11 | GPT-5.5 | 1310.00 | +/-1.9% | 2,588 | OpenAI | / |
| 12 | Gemini 3.1 Pro Preview | 1303.00 | +/-0.6% | 23,954 | Google DeepMind | / |
| 13 | Opus 4.5 | 1302.00 | +/-0.6% | 24,119 | Anthropic | / |
| 14 | Kimi K2.5 (Thinking) | 1302.00 | +/-0.8% | 15,648 | Moonshot AI | / |
| 15 | GLM-5V-Turbo | 1298.00 | +/-1.2% | 6,784 | Zhipu AI | / |
| 16 | Qwen 3.6 Plus Preview | 1294.00 | +/-1.2% | 6,196 | Alibaba | / |
| 17 | MiniMax M2.7 | 1291.00 | +/-1.0% | 9,708 | MiniMax | / |
| 18 | Gemini 3.1 Pro Preview | 1290.00 | +/-0.7% | 20,546 | Google DeepMind | / |
| 19 | DeepSeek-V4-Flash | 1284.00 | +/-2.7% | 1,280 | DeepSeek | / |
| 20 | GLM 4.7 | 1283.00 | +/-0.6% | 27,062 | Zhipu AI | / |
| 21 | GPT-5.4 (Design Skill, Medium) | 1280.00 | +/-1.4% | 4,689 | OpenAI | / |
| 22 | Grok 4.20 Beta (Reasoning) | 1276.00 | +/-0.9% | 12,129 | xAI | / |
| 23 | GPT-5.4 (Medium) | 1274.00 | +/-0.9% | 10,643 | OpenAI | / |
| 24 | MiniMax M2.5 | 1269.00 | +/-0.9% | 11,512 | MiniMax | / |
| 25 | MiniMax M2.1 | 1253.00 | +/-0.7% | 20,908 | MiniMax | / |
| 26 | Gemini 3 Flash Preview | 1252.00 | +/-1.5% | 4,446 | Google DeepMind | / |
| 27 | Grok 4.20 Beta | 1251.00 | +/-0.9% | 12,448 | xAI | / |
| 28 | Claude Sonnet 4.5 (Thinking) | 1246.00 | +/-0.6% | 27,774 | Anthropic | / |
| 29 | Claude Sonnet 4.5 | 1244.00 | +/-0.6% | 28,790 | Anthropic | / |
| 30 | GPT-5.4 (Low) | 1243.00 | +/-0.9% | 11,111 | OpenAI | / |
| 31 | GPT-5.4 (None) | 1241.00 | +/-0.9% | 12,382 | OpenAI | / |
| 32 | Qwen3.5-397B-A17B | 1241.00 | +/-1.1% | 8,130 | Alibaba | / |
| 33 | GLM 4.7 Flash | 1240.00 | +/-0.9% | 11,732 | Zhipu AI | / |
| 34 | Claude 3.7 Sonnet | 1239.00 | +/-0.8% | 15,330 | Anthropic | / |
| 35 | DeepSeek-V3.1 (Thinking) | 1238.00 | +/-0.8% | 16,330 | DeepSeek | / |
| 36 | DeepSeek V3.2-Exp | 1233.00 | +/-0.7% | 19,556 | DeepSeek-AI | / |
| 37 | GPT-5.1 (high) | 1233.00 | +/-0.8% | 16,155 | OpenAI | / |
| 38 | Claude Opus 4.1 (Thinking) | 1233.00 | +/-0.8% | 15,783 | Anthropic | / |
| 39 | GPT-5.2 (None) | 1232.00 | +/-0.7% | 21,279 | OpenAI | / |
| 40 | GPT-5.2 (medium) | 1231.00 | +/-0.7% | 20,043 | OpenAI | / |
| 41 | GPT-5 (high) | 1231.00 | +/-0.8% | 13,480 | OpenAI | / |
| 42 | DeepSeek V3.2 | 1230.00 | +/-0.7% | 20,964 | DeepSeek-AI | / |
| 43 | Qwen3.5 Plus 02-15 | 1230.00 | +/-0.9% | 12,818 | Alibaba | / |
| 44 | Claude Opus 4.1 | 1229.00 | +/-0.6% | 28,106 | Anthropic | / |
| 45 | GLM 4.6 | 1228.00 | +/-0.7% | 17,009 | Zhipu AI | / |
| 46 | GLM 4.5 | 1227.00 | +/-0.7% | 19,733 | Zhipu AI | / |
| 47 | GPT-5.2 (Low) | 1227.00 | +/-0.7% | 21,553 | OpenAI | / |
| 48 | GPT-5 (Minimal) | 1227.00 | +/-0.6% | 28,451 | OpenAI | / |
| 49 | GPT-5.1 (Medium) | 1224.00 | +/-0.7% | 21,401 | OpenAI | / |
| 50 | Claude Opus 4 | 1223.00 | +/-0.8% | 16,752 | Anthropic | / |
| 51 | MiMo-V2-Flash | 1221.00 | +/-0.6% | 27,529 | Xiaomi | / |
| 52 | GPT-5.1 (Low) | 1218.00 | +/-0.7% | 22,268 | OpenAI | / |
| 53 | Gemini 2.5-Pro | 1216.00 | +/-1.2% | 7,044 | Google DeepMind | / |
| 54 | GPT-5.1 Codex | 1213.00 | +/-2.3% | 1,807 | OpenAI | / |
| 55 | GPT-5.1 (None) | 1213.00 | +/-0.7% | 22,419 | OpenAI | / |
| 56 | GPT-5.2 (High) | 1212.00 | +/-1.5% | 4,167 | OpenAI | / |
| 57 | GPT-5.3 Codex | 1207.00 | +/-0.8% | 14,564 | OpenAI | / |
| 58 | Mistral Large 3 (2512) | 1206.00 | +/-0.6% | 24,367 | Mistral | / |
| 59 | Qwen3 Coder 480B A35B Instruct | 1205.00 | +/-2.2% | 1,958 | Alibaba | / |
| 60 | Claude Sonnet 4 | 1204.00 | +/-0.7% | 17,623 | Anthropic | / |
| 61 | DeepSeek-R1-0528 | 1201.00 | +/-0.7% | 18,061 | DeepSeek-AI | / |
| 62 | GLM 4.5 Air | 1200.00 | +/-0.7% | 17,364 | Zhipu AI | / |
| 63 | Claude Sonnet 4 (Thinking) | 1198.00 | +/-0.8% | 16,310 | Anthropic | / |
| 64 | MiniMax M2 Stable | 1197.00 | +/-0.9% | 10,940 | MiniMax | / |
| 65 | Trinity Large Thinking | 1185.00 | +/-1.3% | 5,830 | Arcee AI | / |
| 66 | AesCoder-4B | 1184.00 | +/-0.5% | 31,729 | DesignFlow | / |
| 67 | Mistral Medium 3.1 (2508) | 1183.00 | +/-0.6% | 22,705 | Mistral | / |
| 68 | GPT-5 mini (Default) | 1180.00 | +/-0.6% | 27,553 | OpenAI | / |
| 69 | Claude Haiku 4.5 | 1178.00 | +/-0.6% | 29,795 | Anthropic | / |
| 70 | DeepSeek-V3.1 | 1174.00 | +/-0.7% | 20,382 | DeepSeek-AI | / |
| 71 | Qwen3 Max | 1174.00 | +/-0.6% | 27,887 | Alibaba | / |
| 72 | Prime Intellect: INTELLECT-3 | 1172.00 | +/-0.6% | 24,856 | Prime Intellect | / |
| 73 | DeepSeek-V3-0324 | 1170.00 | +/-0.7% | 19,375 | DeepSeek-AI | / |
| 74 | Gemini 2.5 Flash Preview 09-2025 | 1166.00 | +/-0.7% | 19,444 | Google DeepMind | / |
| 75 | Kimi K2 0905 Preview | 1160.00 | +/-2.5% | 1,504 | Moonshot AI | / |
| 76 | GPT-5.1 Codex Mini | 1160.00 | +/-0.6% | 27,280 | OpenAI | / |
| 77 | Grok 4.1 Fast | 1154.00 | +/-0.6% | 29,438 | xAI | / |
| 78 | Grok 4 Fast | 1153.00 | +/-0.6% | 29,491 | xAI | / |
| 79 | Grok 4.1 Fast (Reasoning) | 1150.00 | +/-0.6% | 27,057 | xAI | / |
| 80 | GPT-5 nano (Default) | 1146.00 | +/-1.2% | 6,710 | OpenAI | / |
| 81 | Kimi K2 Turbo Preview | 1145.00 | +/-2.1% | 2,096 | Moonshot AI | / |
| 82 | Gemini 2.5 Flash Lite Preview 09-2025 | 1143.00 | +/-1.2% | 6,860 | Google DeepMind | / |
| 83 | Gemini 3.1 Flash-Lite Preview | 1141.00 | +/-0.8% | 15,055 | Google DeepMind | / |
| 84 | Mistral Medium 3 (2505) | 1131.00 | +/-1.2% | 6,396 | Mistral | / |
| 85 | Ministral 3 14B (2512) | 1127.00 | +/-2.0% | 2,379 | Mistral | / |
| 86 | Gemini 2.5 Flash | 1121.00 | +/-1.2% | 6,960 | Google DeepMind | / |
| 87 | v0-1.5-md | 1119.00 | +/-0.9% | 11,094 | Vercel | / |
| 88 | Ministral 3 8B (2512) | 1115.00 | +/-2.0% | 2,427 | Mistral | / |
| 89 | Grok 3 | 1115.00 | +/-0.6% | 26,958 | xAI | / |
| 90 | Grok 4 Fast (Reasoning) | 1101.00 | +/-0.5% | 30,061 | xAI | / |
| 91 | Qwen3-235B-A22B-2507 | 1100.00 | +/-1.2% | 6,932 | Alibaba | / |
| 92 | Kimi K2 | 1096.00 | +/-2.7% | 1,352 | Moonshot AI (Legacy) | / |
| 93 | Magistral Medium 1.2 (2509) | 1095.00 | +/-1.3% | 5,851 | Mistral | / |
| 94 | Qwen3-235B-A22B-Thinking-2507 | 1095.00 | +/-1.2% | 6,169 | Alibaba | / |
| 95 | GPT-4.1 | 1088.00 | +/-2.3% | 1,747 | OpenAI | / |
| 96 | OpenAI o3 | 1082.00 | +/-2.7% | 1,365 | OpenAI | / |
| 97 | Grok 4 | 1079.00 | +/-0.6% | 24,127 | xAI | / |
| 98 | Devstral Medium | 1074.00 | +/-1.1% | 7,158 | Mistral | / |
| 99 | Ministral 3 3B (2512) | 1072.00 | +/-1.8% | 2,852 | Mistral | / |
| 100 | Codestral 2508 | 1069.00 | +/-1.2% | 6,746 | Mistral | / |
| 101 | Qwen3-235B-A22B | 1064.00 | +/-1.3% | 5,154 | Alibaba | / |
| 102 | Grok Code Fast 1 | 1061.00 | +/-1.4% | 4,296 | xAI | / |
| 103 | GPT-4.1 mini | 1056.00 | +/-2.5% | 1,566 | OpenAI | / |
| 104 | Magistral Small 1.2 (2509) | 1048.00 | +/-1.2% | 6,448 | Mistral | / |
| 105 | o4-mini | 1038.00 | +/-2.2% | 2,011 | OpenAI | / |
| 106 | Olmo 3.1 32B Think | 1037.00 | +/-0.7% | 16,246 | Allen AI | / |
| 107 | GPT OSS 120B | 1025.00 | +/-1.3% | 5,268 | OpenAI | / |
| 108 | GPT-4.1 nano | 1025.00 | +/-2.2% | 1,901 | OpenAI | / |
| 109 | Qwen3 30B-A3B | 1004.00 | +/-1.9% | 2,575 | Alibaba | / |
| 110 | Grok 3 Mini | 992.00 | +/-1.1% | 7,626 | xAI | / |
| 111 | Llama 3.1 Nemotron Ultra 253B | 991.00 | +/-1.5% | 31,728 | NVIDIA | / |
| 112 | Mistral Small 3.2 | 969.00 | +/-2.7% | 1,243 | Mistral | / |
| 113 | Llama 4 Maverick | 942.00 | +/-2.3% | 1,678 | Facebook AI Research | / |
| 114 | Mistral Large 2.1 (2411) | 925.00 | +/-2.5% | 1,317 | Mistral | / |
| 115 | GPT-4o | 923.00 | +/-2.2% | 1,780 | OpenAI | / |
| 116 | Codestral 2 (2501) | 896.00 | +/-2.4% | 1,444 | Mistral | / |
| 117 | Devstral Small 1.1 | 869.00 | +/-2.5% | 1,250 | Mistral | / |
| 118 | Llama 4 Scout | 852.00 | +/-2.4% | 1,275 | Facebook AI Research | / |
Data is for reference only; official sources are authoritative. Model names link to their DataLearner model profiles.
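The 95% CI column quantifies the uncertainty of each score. The arena does not document its exact procedure, but a common choice for vote-based leaderboards is a percentile bootstrap over the recorded battles. A minimal sketch on a hypothetical win/loss record for a single model (all numbers are made up for illustration):

```python
import random

random.seed(0)
# Hypothetical battle outcomes for one model: 1 = win, 0 = loss (60% win rate).
outcomes = [1] * 600 + [0] * 400

def bootstrap_ci(data, stat=lambda xs: sum(xs) / len(xs), n_boot=2000, alpha=0.05):
    """Percentile bootstrap: resample with replacement, refit the statistic,
    and take the alpha/2 and 1-alpha/2 quantiles of the replicates."""
    reps = sorted(
        stat([random.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot)]
    return lo, hi

lo, hi = bootstrap_ci(outcomes)
print(f"95% CI for win rate: [{lo:.3f}, {hi:.3f}]")
```

On a real board the resampled statistic would be the model's Bradley-Terry score rather than a raw win rate, but the resampling logic is the same.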
About This Leaderboard
The data on this leaderboard comes from Design Arena, a crowdsourced anonymous head-to-head evaluation platform built by the Y Combinator-backed Arcada Labs and focused on AI design-oriented code generation.
Unlike LMArena, which evaluates general text and coding ability, the Design Arena code boards specifically test a model's ability to generate front-end code with visual output. The platform splits code tasks into subcategories such as Website, UI Component, Game Dev, Data Visualization, SVG, Web App, and Mobile, each with its own leaderboard.
This page shows the combined Code Categories board: user votes from all subcategories are pooled, and a single Bradley-Terry model (an Elo-like algorithm) computes the overall ranking. Every vote carries equal weight, with no per-subcategory weighting, so high-volume subcategories (such as Website) influence the combined score more. A higher score indicates stronger aggregate human preference for a model's design-code output.
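As a rough illustration of the method described above (the arena's actual implementation is not public), Bradley-Terry strengths can be fit to pooled pairwise vote counts with the classic MM (minorize-maximize) update and then mapped onto an Elo-like scale. All model names and vote counts below are made up:

```python
import math

# Hypothetical pooled vote counts: wins[(i, j)] = times model i beat model j.
models = ["A", "B", "C"]
wins = {
    ("A", "B"): 70, ("B", "A"): 30,
    ("A", "C"): 80, ("C", "A"): 20,
    ("B", "C"): 60, ("C", "B"): 40,
}

def fit_bradley_terry(models, wins, iters=200):
    """Fit Bradley-Terry strengths p_i with the MM update:
    p_i <- W_i / sum_j n_ij / (p_i + p_j), where n_ij = games between i and j."""
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            w_i = sum(wins.get((i, j), 0) for j in models if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in models if j != i
            )
            new_p[i] = w_i / denom
        # Normalize so the geometric mean is 1 (fixes the scale).
        g = math.exp(sum(math.log(v) for v in new_p.values()) / len(new_p))
        p = {m: v / g for m, v in new_p.items()}
    return p

def to_elo_scale(p, base=1000.0, scale=400.0):
    """Map BT strengths onto an Elo-like scale: base + scale * log10(p)."""
    return {m: base + scale * math.log10(v) for m, v in p.items()}

strengths = fit_bradley_terry(models, wins)
ratings = to_elo_scale(strengths)
for m in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{m}: {ratings[m]:.0f}")
```

Under the fitted model, P(i beats j) = p_i / (p_i + p_j), which is why ratios of strengths (and hence score gaps on the Elo scale) correspond to predicted head-to-head win rates.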
Frequently Asked Questions (FAQ)
What is the Arcada Labs Code Categories Arena?
The Arcada Labs Code Categories Arena is an anonymous evaluation platform from Arcada Labs focused on design-oriented code generation. It covers subcategories such as Website, UI Component, Game Dev, and Data Visualization, pooling the user votes from all subcategories into a single Bradley-Terry model to produce the combined leaderboard.
How does the Arcada Code Arena differ from the LMArena Coding Arena?
The LMArena Coding Arena mainly evaluates general programming ability (code generation, debugging, algorithm implementation, and so on), whereas the Arcada Code Arena focuses on "design code": code with visual output, such as HTML pages, interactive UI components, charts, and game prototypes. The two are complementary; the former emphasizes functional code, the latter creative visual code.
What is the ranking methodology? Do high-volume subcategories skew the overall board?
Arcada Labs pools the raw votes from all subcategories (Website, UI Component, Game Dev, Data Visualization, and others) and fits a single Bradley-Terry model; every vote carries equal weight, with no per-subcategory weighting. As a result, high-volume subcategories have more influence on the overall board, and a model that is very strong in one subcategory but weak in the others has its score smoothed out. This mirrors how the LMArena overall board is computed.
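The effect of equal-weight vote pooling can be shown with made-up numbers: the pooled (per-vote) win rate tracks the high-volume subcategory, while an unweighted per-category average, which this board does not use, would treat subcategories equally:

```python
# Hypothetical head-to-head record for one model in two subcategories.
votes = {
    "Website":  {"wins": 5500, "total": 10000},  # high-volume subcategory, 55% win rate
    "Game Dev": {"wins": 300,  "total": 1000},   # low-volume subcategory, 30% win rate
}

# Pooled (micro) win rate: every vote counts equally, as on the combined board.
pooled = sum(v["wins"] for v in votes.values()) / sum(v["total"] for v in votes.values())

# Macro win rate: each subcategory counts equally (NOT what the board does).
macro = sum(v["wins"] / v["total"] for v in votes.values()) / len(votes)

print(f"pooled win rate: {pooled:.3f}")  # 0.527 - dominated by Website
print(f"macro  win rate: {macro:.3f}")   # 0.425 - subcategories weighted equally
```

The 10-point gap between the two aggregates is exactly why a model that is weak only in a low-volume subcategory loses little on the combined board.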
Which kinds of models perform better on design code?
Large multimodal models that can interpret visual requirements generally perform better on design-code tasks; the Claude Opus and GLM 5 families stand out on this board. Models specifically optimized for code and UI generation (such as Vercel's v0 and AesCoder) also perform well, suggesting that targeted optimization pays off clearly in visual code generation.