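Arcada Labs Code Categories Arena Coding Leaderboard
The latest coding leaderboard for large AI models, based on anonymous user votes in the Arcada Labs Code Categories Arena. Models are scored and ranked with a Bradley-Terry model across code subcategories such as Website, UI Component, Game Dev, and Data Visualization.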
Top model: Claude Fable 5
Highest score: 1352.00
Number of models: 129
Data version: June 13, 2026
Data source: Arcada Labs
Overall Rankings
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 | 1352.00 | +/-12.1 | 3,528 | Anthropic | Proprietary |
| 2 | Claude Opus 4.6 | 1343.00 | +/-5.7 | 17,547 | Anthropic | Proprietary |
| 3 | Opus 4.7 (thinking) | 1339.00 | +/-7.5 | 9,695 | Anthropic | Proprietary |
| 4 | Claude Opus 4.6 (thinking) | 1337.00 | +/-6.2 | 14,857 | Anthropic | Proprietary |
| 5 | GLM 5.1 | 1332.00 | +/-10 | 5,306 | Zhipu AI | Open Source |
| 6 | Kimi K2.6 | 1332.00 | +/-5.4 | 19,693 | Moonshot AI | Open Source |
| 7 | GLM-5-Turbo | 1326.00 | +/-5.1 | 22,226 | Zhipu AI | Proprietary |
| 8 | Opus 4.7 | 1325.00 | +/-6.5 | 13,174 | Anthropic | Proprietary |
| 9 | Claude Sonnet 4.6 | 1325.00 | +/-5.8 | 16,730 | Anthropic | Proprietary |
| 10 | MiMo V2.5 Pro | 1323.00 | +/-11.6 | 3,820 | Xiaomi | Open Source |
| 11 | MiniMax M3 | 1315.00 | +/-9.2 | 5,954 | MiniMax | Open Source |
| 12 | Qwen3.7 Max | 1312.00 | +/-7.4 | 9,699 | Alibaba | Proprietary |
| 13 | MiMo V2.5 | 1305.00 | +/-4.8 | 25,579 | Xiaomi | Open Source |
| 14 | Muse Spark | 1303.00 | +/-10.9 | 4,249 | Facebook AI Research | Proprietary |
| 15 | Gemini 3.5 Flash | 1299.00 | +/-7.7 | 8,856 | Google DeepMind | Proprietary |
| 16 | GPT-5.5 | 1299.00 | +/-7.1 | 10,232 | OpenAI | Proprietary |
| 17 | DeepSeek-V4-Pro | 1297.00 | +/-6.6 | 12,237 | DeepSeek-AI | Open Source |
| 18 | GLM-5 | 1297.00 | +/-4 | 40,865 | Zhipu AI | Open Source |
| 19 | Opus 4.5 | 1293.00 | +/-4.4 | 29,814 | Anthropic | Proprietary |
| 20 | Gemini 3.1 Pro Preview | 1292.00 | +/-4.9 | 23,843 | Google DeepMind | Proprietary |
| 21 | Kimi K2.5 (thinking) | 1288.00 | +/-4.2 | 35,262 | Moonshot AI | Open Source |
| 22 | Claude Opus 4.8 | 1282.00 | +/-7.7 | 8,538 | Anthropic | Proprietary |
| 23 | | 1282.00 | +/-4.7 | 26,278 | MiniMaxAI | Open Source |
| 24 | GLM-5V-Turbo | 1280.00 | +/-4.7 | 26,151 | Zhipu AI | Open Source |
| 25 | Gemini 3.1 Pro Preview | 1279.00 | +/-4.5 | 28,225 | Google DeepMind | Proprietary |
| 26 | Qwen 3.6 Plus Preview | 1278.00 | +/-5.2 | 20,906 | Alibaba | Proprietary |
| 27 | GLM-4.7 | 1269.00 | +/-3.8 | 42,337 | Zhipu AI | Open Source |
| 28 | | 1269.00 | +/-5.2 | 20,044 | xAI | Proprietary |
| 29 | GPT-5.4 (Design Skill, Medium) | 1266.00 | +/-8.1 | 7,633 | OpenAI | Proprietary |
| 30 | DeepSeek-V4-Flash | 1264.00 | +/-5.3 | 19,662 | DeepSeek-AI | Open Source |
| 31 | GPT-5.4 (medium) | 1261.00 | +/-5.9 | 15,138 | OpenAI | Proprietary |
| 32 | | 1258.00 | +/-6.7 | 11,504 | MiniMaxAI | Open Source |
| 33 | | 1252.00 | +/-6 | 14,749 | xAI | Proprietary |
| 34 | | 1249.00 | +/-5.1 | 20,935 | xAI | Proprietary |
| 35 | | 1242.00 | +/-5.1 | 20,803 | MiniMaxAI | Open Source |
| 36 | Gemini 3.0 Flash | 1241.00 | +/-10.6 | 4,414 | Google DeepMind | Proprietary |
| 37 | Claude Sonnet 4.5 (thinking) | 1234.00 | +/-4.1 | 34,271 | Anthropic | Proprietary |
| 38 | Claude Sonnet 4.5 | 1233.00 | +/-4.1 | 34,958 | Anthropic | Proprietary |
| 39 | Qwen3.5-397B-A17B | 1231.00 | +/-7.9 | 8,129 | Alibaba | Open Source |
| 40 | GPT-5.4 (low) | 1230.00 | +/-5.6 | 16,972 | OpenAI | Proprietary |
| 41 | GPT-5.4 (None) | 1230.00 | +/-5.3 | 19,064 | OpenAI | Proprietary |
| 42 | GLM-4.7-Flash | 1229.00 | +/-6.6 | 11,706 | Zhipu AI | Open Source |
| 43 | Claude Sonnet 3.7 | 1228.00 | +/-5.9 | 15,245 | Anthropic | Proprietary |
| 44 | DeepSeek-V3.1 (thinking) | 1227.00 | +/-5.7 | 16,258 | DeepSeek-AI | Open Source |
| 45 | Opus 4.1 (thinking) | 1223.00 | +/-5.8 | 15,677 | Anthropic | Proprietary |
| 46 | GPT-5.1 (high) | 1223.00 | +/-5.7 | 16,057 | OpenAI | Proprietary |
| 47 | DeepSeek V3.2-Exp | 1222.00 | +/-5.2 | 19,490 | DeepSeek-AI | Open Source |
| 48 | GPT-5.2 (None) | 1221.00 | +/-4.6 | 25,824 | OpenAI | Proprietary |
| 49 | GPT-5.2 (medium) | 1221.00 | +/-4.7 | 24,671 | OpenAI | Proprietary |
| 50 | GPT-5 (high) | 1220.00 | +/-6.2 | 13,397 | OpenAI | Proprietary |
| 51 | Qwen3.5 Plus (0215) | 1219.00 | +/-5.3 | 18,993 | Alibaba | Proprietary |
| 52 | DeepSeek V3.2 | 1218.00 | +/-4.8 | 24,314 | DeepSeek-AI | Open Source |
| 53 | Step 3.7 Flash | 1218.00 | +/-8.4 | 7,214 | StepFun | Open Source |
| 54 | GLM-4.5 | 1217.00 | +/-5.2 | 19,637 | Zhipu AI | Open Source |
| 55 | GLM-4.6 | 1217.00 | +/-5.6 | 16,911 | Zhipu AI | Open Source |
| 56 | GPT-5 (minimal) | 1217.00 | +/-4.2 | 33,232 | OpenAI | Proprietary |
| 57 | GPT-5.2 (low) | 1217.00 | +/-4.6 | 25,745 | OpenAI | Proprietary |
| 58 | Opus 4.1 | 1216.00 | +/-4.1 | 34,520 | Anthropic | Proprietary |
| 59 | GPT-5.1 (medium) | 1213.00 | +/-5 | 21,291 | OpenAI | Proprietary |
| 60 | Claude Opus 4 | 1212.00 | +/-5.6 | 16,669 | Anthropic | Proprietary |
| 61 | GPT-5.1 (low) | 1207.00 | +/-4.9 | 22,159 | OpenAI | Proprietary |
| 62 | MiMo-V2-Flash | 1207.00 | +/-4.1 | 34,555 | Xiaomi | Open Source |
| 63 | Gemini 2.5-Pro | 1205.00 | +/-8.5 | 7,044 | Google DeepMind | Proprietary |
| 64 | GPT-5.1 Codex | 1202.00 | +/-16.4 | 1,807 | OpenAI | Proprietary |
| 65 | GPT-5.1 (None) | 1202.00 | +/-4.9 | 22,276 | OpenAI | Proprietary |
| 66 | GPT-5.2 (high) | 1201.00 | +/-10.8 | 4,167 | OpenAI | Proprietary |
| 67 | GPT-5.3 Codex | 1196.00 | +/-5.8 | 15,763 | OpenAI | Proprietary |
| 68 | Qwen3-Coder-480B-A35B | 1194.00 | +/-16.3 | 1,958 | Alibaba | Open Source |
| 69 | Claude Sonnet 4 | 1193.00 | +/-5.5 | 17,532 | Anthropic | Proprietary |
| 70 | Mistral Large 3 | 1193.00 | +/-4.3 | 30,809 | MistralAI | Open Source |
| 71 | DeepSeek-R1-0528 | 1190.00 | +/-5.4 | 17,944 | DeepSeek-AI | Open Source |
| 72 | GLM-4.5-Air | 1189.00 | +/-5.5 | 17,256 | Zhipu AI | Open Source |
| 73 | Claude Sonnet 4 (thinking) | 1188.00 | +/-5.7 | 16,227 | Anthropic | Proprietary |
| 74 | | 1186.00 | +/-6.8 | 10,828 | MiniMaxAI | Open Source |
| 75 | AesCoder-4B | 1176.00 | +/-3.9 | 39,734 | DesignFlow | Open Source |
| 76 | Mistral Medium 3.5 | 1174.00 | +/-7 | 10,885 | MistralAI | Open Source |
| 77 | Mistral Medium 3.1 (2508) | 1172.00 | +/-4.5 | 27,998 | Mistral | Proprietary |
| 78 | Trinity Large Thinking | 1168.00 | +/-6.5 | 12,815 | Arcee AI | Open Source |
| 79 | Haiku 4.5 | 1166.00 | +/-4 | 35,968 | Anthropic | Proprietary |
| 80 | GPT-5-mini | 1166.00 | +/-4.2 | 33,066 | OpenAI | Proprietary |
| 81 | DeepSeek-V3.1 | 1163.00 | +/-5.1 | 20,278 | DeepSeek-AI | Open Source |
| 82 | Qwen3-Max-Thinking | 1161.00 | +/-4.2 | 33,787 | Alibaba | Proprietary |
| 83 | DeepSeek-V3-0324 | 1160.00 | +/-5.2 | 19,257 | DeepSeek-AI | Open Source |
| 84 | Prime Intellect: INTELLECT-3 | 1158.00 | +/-4.3 | 31,318 | Prime Intellect | Open Source |
| 85 | Gemini 2.5 Flash-Preview-09-2025 | 1156.00 | +/-5.2 | 19,299 | Google DeepMind | Proprietary |
| 86 | | 1152.00 | +/-4 | 37,227 | xAI | Proprietary |
| 87 | Kimi K2 0905 | 1149.00 | +/-17.9 | 1,504 | Moonshot AI | Open Source |
| 88 | GPT-5.1 Codex Mini | 1145.00 | +/-4.2 | 33,970 | OpenAI | Proprietary |
| 89 | | 1144.00 | +/-4.2 | 33,893 | xAI | Proprietary |
| 90 | | 1139.00 | +/-4.3 | 31,593 | xAI | Proprietary |
| 91 | GPT-5-Nano | 1136.00 | +/-8.6 | 6,710 | OpenAI | Proprietary |
| 92 | Kimi K2 Turbo Preview | 1135.00 | +/-15.2 | 2,094 | Moonshot AI | Open Source |
| 93 | Gemini 2.5 Flash-Lite-Preview-09-2025 | 1133.00 | +/-8.5 | 6,860 | Google DeepMind | Proprietary |
| 94 | Gemini 3.1 Flash-Lite Preview | 1123.00 | +/-5 | 23,509 | Google | Proprietary |
| 95 | Phi-3-medium 14B-preview | 1121.00 | +/-8.9 | 6,396 | Microsoft Azure | Proprietary |
| 96 | Ministral 3 14B | 1117.00 | +/-14.4 | 2,379 | MistralAI | Open Source |
| 97 | Gemini 2.5 Flash | 1111.00 | +/-8.5 | 6,960 | Google DeepMind | Proprietary |
| 98 | Reve v1.5 | 1108.00 | +/-6.9 | 11,081 | Reve AI | Proprietary |
| 99 | Ministral 3 8B | 1105.00 | +/-14.3 | 2,427 | MistralAI | Open Source |
| 100 | | 1104.00 | +/-4.6 | 26,860 | xAI | Proprietary |
| 101 | | 1103.00 | +/-4.1 | 37,880 | xAI | Proprietary |
| 102 | Qwen3-235B-A22B-2507 | 1090.00 | +/-8.6 | 6,932 | Alibaba | Open Source |
| 103 | Kimi K2 | 1085.00 | +/-19.4 | 1,352 | Moonshot AI | Open Source |
| 104 | Magistral Medium 1.2 (2509) | 1085.00 | +/-9.4 | 5,851 | Mistral | Proprietary |
| 105 | Qwen3-235B-A22B-Thinking-2507 | 1084.00 | +/-9.1 | 6,169 | Alibaba | Open Source |
| 106 | GPT-4.1 | 1077.00 | +/-17.3 | 1,747 | OpenAI | Proprietary |
| 107 | OpenAI o3 | 1071.00 | +/-19.5 | 1,365 | OpenAI | Proprietary |
| 108 | | 1068.00 | +/-4.9 | 23,998 | xAI | Proprietary |
| 109 | Devstral Medium | 1064.00 | +/-8.5 | 7,158 | MistralAI | Proprietary |
| 110 | Ministral 3 3B (2512) | 1062.00 | +/-13.5 | 2,852 | Mistral | Open Source |
| 111 | Codestral 2508 | 1059.00 | +/-8.8 | 6,745 | Mistral | Proprietary |
| 112 | Qwen3-235B-A22B | 1054.00 | +/-10.1 | 5,154 | Alibaba | Open Source |
| 113 | | 1050.00 | +/-11.1 | 4,295 | xAI | Proprietary |
| 114 | GPT-4.1 mini | 1045.00 | +/-18.3 | 1,566 | OpenAI | Proprietary |
| 115 | Magistral Small 1.2 (2509) | 1037.00 | +/-9.2 | 6,448 | Mistral | Open Source |
| 116 | OpenAI o4 - mini | 1027.00 | +/-16.2 | 2,011 | OpenAI | Proprietary |
| 117 | Olmo 3.1 32B Think | 1026.00 | +/-6.3 | 16,162 | Allen AI | Open Source |
| 118 | GPT OSS 120B | 1015.00 | +/-10.3 | 5,268 | OpenAI | Open Source |
| 119 | GPT-4.1 nano | 1014.00 | +/-16.8 | 1,901 | OpenAI | Proprietary |
| 120 | Qwen3-30B-A3B | 993.00 | +/-14.5 | 2,575 | Alibaba | Open Source |
| 121 | | 982.00 | +/-8.7 | 7,626 | xAI | Proprietary |
| 122 | Llama 3.1 Nemotron Ultra 253B | 981.00 | +/-13.8 | 3,172 | NVIDIA | Open Source |
| 123 | Mistral-Small-3.2 | 958.00 | +/-20.8 | 1,243 | MistralAI | Open Source |
| 124 | Llama 4 Maverick | 931.00 | +/-18.4 | 1,678 | Facebook AI Research | Open Source |
| 125 | Mistral Large 2.1 (2411) | 915.00 | +/-21 | 1,317 | Mistral | Proprietary |
| 126 | GPT-4o | 912.00 | +/-18.1 | 1,780 | OpenAI | Proprietary |
| 127 | Codestral 2 (2501) | 885.00 | +/-20.6 | 1,444 | Mistral | Open Source |
| 128 | Devstral Small 1.1 | 859.00 | +/-22.5 | 1,250 | MistralAI | Open Source |
| 129 | Llama 4 Scout | 841.00 | +/-22.6 | 1,275 | Facebook AI Research | Open Source |
Data is for reference only; official sources take precedence. The links next to model names lead to the DataLearner model detail pages.
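About This Leaderboard
The data for this leaderboard comes from Design Arena, a crowdsourced, anonymous head-to-head platform built by Y Combinator-backed Arcada Labs that focuses on evaluating AI design-code generation.
Unlike LMArena, which evaluates general text and programming ability, Design Arena's code leaderboard specifically tests a model's ability to generate front-end code with a visual result. The platform divides code tasks into subcategories, including Website, UI Component, Game Dev, Data Visualization, SVG, Web App, and Mobile, and each subcategory has its own leaderboard.
This page shows the Code Categories overall leaderboard: user votes from all subcategories are pooled, and a single Bradley-Terry model (an Elo-like algorithm) is fitted to produce the overall ranking. Every vote carries equal weight and subcategories are not weighted, so subcategories with more votes (such as Website) have a larger influence on the overall score. A higher score indicates a stronger overall human preference for the model in design-code generation scenarios.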
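To make the scoring idea concrete, here is a minimal sketch of fitting a Bradley-Terry model to pooled pairwise votes. It is not Arcada Labs' implementation: the model names, vote counts, function names, and the Elo-like mapping (`base=1000`, `scale=400`) are all assumptions chosen for illustration, and the fit uses the standard minorization-maximization update rather than whatever solver the platform runs.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=500):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the minorization-maximization update
        p_i <- wins_i / sum_j [ n_ij / (p_i + p_j) ]
    where n_ij is the number of comparisons between models i and j.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    for winner, loser in votes:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0

    models = sorted({m for pair in pair_counts for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i and frozenset((i, j)) in pair_counts
            )
            # The floor keeps a model with zero wins at a tiny positive strength.
            updated[i] = max(wins[i] / denom, 1e-12) if denom else strength[i]
        # Strengths are defined only up to a constant factor,
        # so pin the geometric mean to 1 after every sweep.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(log_mean) for m, v in updated.items()}
    return strength


def to_scores(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (base and scale are assumed values)."""
    return {m: base + scale * math.log10(p) for m, p in strength.items()}


# Hypothetical pooled votes: every vote counts once, whatever its subcategory.
votes = (
    [("model_a", "model_b")] * 60 + [("model_b", "model_a")] * 40
    + [("model_b", "model_c")] * 60 + [("model_c", "model_b")] * 40
    + [("model_a", "model_c")] * 70 + [("model_c", "model_a")] * 30
)
scores = to_scores(fit_bradley_terry(votes))
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Because the votes are pooled before fitting, the sketch never looks at which subcategory a vote came from, which is what "every vote carries equal weight" amounts to in practice.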
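Frequently Asked Questions (FAQ)
What is the Arcada Labs Code Categories Arena?
The Arcada Labs Code Categories Arena is an anonymous evaluation platform focused on design-code generation. It covers code generation subcategories such as Website, UI Component, Game Dev, and Data Visualization, and pools the votes into an overall leaderboard.
How does the Arcada Code Arena differ from the LMArena Coding Arena?
The LMArena Coding Arena mainly evaluates general programming ability, such as code generation, debugging, and algorithm implementation. The Arcada Code Arena focuses on front-end design code with a visual result, such as HTML pages, interactive UIs, charts, SVG, and prototypes.
What is the ranking methodology?
Arcada Labs pools the raw votes from all code subcategories and then runs a Bradley-Terry model on them. Every vote carries equal weight and subcategories are not weighted separately, so subcategories with more votes have a larger influence on the overall score.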
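The effect of equal per-vote weighting can be illustrated with a small sketch. The vote counts below are made up, and the two helper functions are hypothetical names introduced only for this example; the point is simply that a pooled win rate follows the largest subcategory, while a per-category average would treat each subcategory the same.

```python
def pooled_win_rate(category_results):
    """Win rate when every vote counts once, regardless of subcategory.

    category_results maps a subcategory name to (wins, total_votes)
    for one model against one opponent.
    """
    wins = sum(w for w, _ in category_results.values())
    total = sum(n for _, n in category_results.values())
    return wins / total


def category_averaged_win_rate(category_results):
    """Win rate when every subcategory counts once, regardless of its size."""
    rates = [w / n for w, n in category_results.values()]
    return sum(rates) / len(rates)


# Made-up numbers: the model wins 70% of a large category and 30% of a small one.
results = {"Website": (700, 1000), "Game Dev": (30, 100)}

print(f"pooled:            {pooled_win_rate(results):.3f}")
print(f"category-averaged: {category_averaged_win_rate(results):.3f}")
```

With these numbers the pooled rate is 730/1100, well above the 0.5 that a per-category average gives, because the larger Website category dominates the pooled vote count.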
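Which kinds of models perform better in design-code scenarios?
Large models with strong visual understanding and front-end code generation ability usually perform better. Specialized models optimized for UI and code generation may also stand out on layout, interaction, and visual-detail tasks.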










