
Arcada Labs Code Categories Arena Code Capability Leaderboard

The latest AI large-model coding-capability leaderboard, based on anonymous user votes in the Arcada Labs Code Categories Arena. A Bradley-Terry model produces combined scores and rankings across code subcategories such as Website, UI Component, Game Dev, and Data Visualization.

Top model: Claude Opus 4.6
Top score: 1352.00
Model count: 118
Data version: April 26, 2026
Data source: Design Arena (Arcada Labs)


Ranking Table

| Rank | Model | Score | 95% CI | Votes | Organization |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | 1352.00 | ±0.8% | 12,728 | Anthropic |
| 2 | Claude Opus 4.7 (Thinking) | 1347.00 | ±1.6% | 3,539 | Anthropic |
| 3 | Claude Opus 4.6 (Thinking) | 1346.00 | ±0.9% | 10,621 | Anthropic |
| 4 | GLM 5.1 | 1343.00 | ±1.3% | 5,075 | Zhipu AI |
| 5 | Opus 4.7 | 1338.00 | ±1.2% | 6,674 | Anthropic |
| 6 | GLM 5 Turbo | 1336.00 | ±1.3% | 5,337 | Zhipu AI |
| 7 | Claude Sonnet 4.6 | 1334.00 | ±0.9% | 11,895 | Anthropic |
| 8 | Kimi K2.6 | 1326.00 | ±1.5% | 4,260 | Moonshot AI |
| 9 | Muse Spark | 1315.00 | ±1.5% | 4,234 | Facebook AI Research |
| 10 | GLM 5 | 1311.00 | ±0.8% | 14,853 | Zhipu AI |
| 11 | GPT-5.5 | 1310.00 | ±1.9% | 2,588 | OpenAI |
| 12 | Gemini 3.1 Pro Preview | 1303.00 | ±0.6% | 23,954 | Google DeepMind |
| 13 | Opus 4.5 | 1302.00 | ±0.6% | 24,119 | Anthropic |
| 14 | Kimi K2.5 (Thinking) | 1302.00 | ±0.8% | 15,648 | Moonshot AI |
| 15 | GLM-5V-Turbo | 1298.00 | ±1.2% | 6,784 | Zhipu AI |
| 16 | Qwen 3.6 Plus Preview | 1294.00 | ±1.2% | 6,196 | Alibaba |
| 17 | MiniMax M2.7 | 1291.00 | ±1.0% | 9,708 | MiniMax |
| 18 | Gemini 3.1 Pro Preview | 1290.00 | ±0.7% | 20,546 | Google DeepMind |
| 19 | DeepSeek-V4-Flash | 1284.00 | ±2.7% | 1,280 | DeepSeek |
| 20 | GLM 4.7 | 1283.00 | ±0.6% | 27,062 | Zhipu AI |
| 21 | GPT-5.4 (Design Skill, Medium) | 1280.00 | ±1.4% | 4,689 | OpenAI |
| 22 | Grok 4.20 Beta (Reasoning) | 1276.00 | ±0.9% | 12,129 | xAI |
| 23 | GPT-5.4 (Medium) | 1274.00 | ±0.9% | 10,643 | OpenAI |
| 24 | MiniMax M2.5 | 1269.00 | ±0.9% | 11,512 | MiniMax |
| 25 | MiniMax M2.1 | 1253.00 | ±0.7% | 20,908 | MiniMax |
| 26 | Gemini 3 Flash Preview | 1252.00 | ±1.5% | 4,446 | Google |
| 27 | Grok 4.20 Beta | 1251.00 | ±0.9% | 12,448 | xAI |
| 28 | Claude Sonnet 4.5 (Thinking) | 1246.00 | ±0.6% | 27,774 | Anthropic |
| 29 | Claude Sonnet 4.5 | 1244.00 | ±0.6% | 28,790 | Anthropic |
| 30 | GPT-5.4 (Low) | 1243.00 | ±0.9% | 11,111 | OpenAI |
| 31 | GPT-5.4 (None) | 1241.00 | ±0.9% | 12,382 | OpenAI |
| 32 | Qwen3.5-397B-A17B | 1241.00 | ±1.1% | 8,130 | Alibaba |
| 33 | GLM 4.7 Flash | 1240.00 | ±0.9% | 11,732 | Zhipu AI |
| 34 | Claude 3.7 Sonnet | 1239.00 | ±0.8% | 15,330 | Anthropic |
| 35 | DeepSeek-V3.1 (Thinking) | 1238.00 | ±0.8% | 16,330 | DeepSeek |
| 36 | DeepSeek V3.2-Exp | 1233.00 | ±0.7% | 19,556 | DeepSeek-AI |
| 37 | GPT-5.1 (high) | 1233.00 | ±0.8% | 16,155 | OpenAI |
| 38 | Claude Opus 4.1 (Thinking) | 1233.00 | ±0.8% | 15,783 | Anthropic |
| 39 | GPT-5.2 (None) | 1232.00 | ±0.7% | 21,279 | OpenAI |
| 40 | GPT-5.2 (medium) | 1231.00 | ±0.7% | 20,043 | OpenAI |
| 41 | GPT-5 (high) | 1231.00 | ±0.8% | 13,480 | OpenAI |
| 42 | DeepSeek V3.2 | 1230.00 | ±0.7% | 20,964 | DeepSeek-AI |
| 43 | Qwen3.5 Plus 02-15 | 1230.00 | ±0.9% | 12,818 | Alibaba |
| 44 | Claude Opus 4.1 | 1229.00 | ±0.6% | 28,106 | Anthropic |
| 45 | GLM 4.6 | 1228.00 | ±0.7% | 17,009 | Zhipu AI |
| 46 | GLM 4.5 | 1227.00 | ±0.7% | 19,733 | Zhipu AI |
| 47 | GPT-5.2 (Low) | 1227.00 | ±0.7% | 21,553 | OpenAI |
| 48 | GPT-5 (Minimal) | 1227.00 | ±0.6% | 28,451 | OpenAI |
| 49 | GPT-5.1 (Medium) | 1224.00 | ±0.7% | 21,401 | OpenAI |
| 50 | Claude Opus 4 | 1223.00 | ±0.8% | 16,752 | Anthropic |
| 51 | MiMo-V2-Flash | 1221.00 | ±0.6% | 27,529 | Xiaomi |
| 52 | GPT-5.1 (Low) | 1218.00 | ±0.7% | 22,268 | OpenAI |
| 53 | Gemini 2.5-Pro | 1216.00 | ±1.2% | 7,044 | Google DeepMind |
| 54 | GPT-5.1 Codex | 1213.00 | ±2.3% | 1,807 | OpenAI |
| 55 | GPT-5.1 (None) | 1213.00 | ±0.7% | 22,419 | OpenAI |
| 56 | GPT-5.2 (High) | 1212.00 | ±1.5% | 4,167 | OpenAI |
| 57 | GPT-5.3 Codex | 1207.00 | ±0.8% | 14,564 | OpenAI |
| 58 | Mistral Large 3 (2512) | 1206.00 | ±0.6% | 24,367 | Mistral |
| 59 | Qwen3 Coder 480B A35B Instruct | 1205.00 | ±2.2% | 1,958 | Alibaba |
| 60 | Claude Sonnet 4 | 1204.00 | ±0.7% | 17,623 | Anthropic |
| 61 | DeepSeek-R1-0528 | 1201.00 | ±0.7% | 18,061 | DeepSeek-AI |
| 62 | GLM 4.5 Air | 1200.00 | ±0.7% | 17,364 | Zhipu AI |
| 63 | Claude Sonnet 4 (Thinking) | 1198.00 | ±0.8% | 16,310 | Anthropic |
| 64 | MiniMax M2 Stable | 1197.00 | ±0.9% | 10,940 | MiniMax |
| 65 | Trinity Large Thinking | 1185.00 | ±1.3% | 5,830 | Arcee AI |
| 66 | AesCoder-4B | 1184.00 | ±0.5% | 31,729 | DesignFlow |
| 67 | Mistral Medium 3.1 (2508) | 1183.00 | ±0.6% | 22,705 | Mistral |
| 68 | GPT-5 mini (Default) | 1180.00 | ±0.6% | 27,553 | OpenAI |
| 69 | Claude Haiku 4.5 | 1178.00 | ±0.6% | 29,795 | Anthropic |
| 70 | DeepSeek-V3.1 | 1174.00 | ±0.7% | 20,382 | DeepSeek-AI |
| 71 | Qwen3 Max | 1174.00 | ±0.6% | 27,887 | Alibaba |
| 72 | Prime Intellect: INTELLECT-3 | 1172.00 | ±0.6% | 24,856 | Prime Intellect |
| 73 | DeepSeek-V3-0324 | 1170.00 | ±0.7% | 19,375 | DeepSeek-AI |
| 74 | Gemini 2.5 Flash Preview 09-2025 | 1166.00 | ±0.7% | 19,444 | Google |
| 75 | Kimi K2 0905 Preview | 1160.00 | ±2.5% | 1,504 | Moonshot AI |
| 76 | GPT-5.1 Codex Mini | 1160.00 | ±0.6% | 27,280 | OpenAI |
| 77 | Grok 4.1 Fast | 1154.00 | ±0.6% | 29,438 | xAI |
| 78 | Grok 4 Fast | 1153.00 | ±0.6% | 29,491 | xAI |
| 79 | Grok 4.1 Fast (Reasoning) | 1150.00 | ±0.6% | 27,057 | xAI |
| 80 | GPT-5 nano (Default) | 1146.00 | ±1.2% | 6,710 | OpenAI |
| 81 | Kimi K2 Turbo Preview | 1145.00 | ±2.1% | 2,096 | Moonshot AI |
| 82 | Gemini 2.5 Flash Lite Preview 09-2025 | 1143.00 | ±1.2% | 6,860 | Google |
| 83 | Gemini 3.1 Flash-Lite Preview | 1141.00 | ±0.8% | 15,055 | Google |
| 84 | Mistral Medium 3 (2505) | 1131.00 | ±1.2% | 6,396 | Mistral |
| 85 | Ministral 3 14B (2512) | 1127.00 | ±2.0% | 2,379 | Mistral |
| 86 | Gemini 2.5 Flash | 1121.00 | ±1.2% | 6,960 | Google |
| 87 | v0-1.5-md | 1119.00 | ±0.9% | 11,094 | Vercel |
| 88 | Ministral 3 8B (2512) | 1115.00 | ±2.0% | 2,427 | Mistral |
| 89 | Grok 3 | 1115.00 | ±0.6% | 26,958 | xAI |
| 90 | Grok 4 Fast (Reasoning) | 1101.00 | ±0.5% | 30,061 | xAI |
| 91 | Qwen3-235B-A22B-2507 | 1100.00 | ±1.2% | 6,932 | Alibaba |
| 92 | Kimi K2 | 1096.00 | ±2.7% | 1,352 | Moonshot AI (Legacy) |
| 93 | Magistral Medium 1.2 (2509) | 1095.00 | ±1.3% | 5,851 | Mistral |
| 94 | Qwen3-235B-A22B-Thinking-2507 | 1095.00 | ±1.2% | 6,169 | Alibaba |
| 95 | GPT-4.1 | 1088.00 | ±2.3% | 1,747 | OpenAI |
| 96 | OpenAI o3 | 1082.00 | ±2.7% | 1,365 | OpenAI |
| 97 | Grok 4 | 1079.00 | ±0.6% | 24,127 | xAI |
| 98 | Devstral Medium | 1074.00 | ±1.1% | 7,158 | Mistral |
| 99 | Ministral 3 3B (2512) | 1072.00 | ±1.8% | 2,852 | Mistral |
| 100 | Codestral 2508 | 1069.00 | ±1.2% | 6,746 | Mistral |
| 101 | Qwen3-235B-A22B | 1064.00 | ±1.3% | 5,154 | Alibaba |
| 102 | Grok Code Fast 1 | 1061.00 | ±1.4% | 4,296 | xAI |
| 103 | GPT-4.1 mini | 1056.00 | ±2.5% | 1,566 | OpenAI |
| 104 | Magistral Small 1.2 (2509) | 1048.00 | ±1.2% | 6,448 | Mistral |
| 105 | o4-mini | 1038.00 | ±2.2% | 2,011 | OpenAI |
| 106 | Olmo 3.1 32B Think | 1037.00 | ±0.7% | 16,246 | Allen AI |
| 107 | GPT OSS 120B | 1025.00 | ±1.3% | 5,268 | OpenAI |
| 108 | GPT-4.1 nano | 1025.00 | ±2.2% | 1,901 | OpenAI |
| 109 | Qwen3 30B-A3B | 1004.00 | ±1.9% | 2,575 | Alibaba |
| 110 | Grok 3 Mini | 992.00 | ±1.1% | 7,626 | xAI |
| 111 | Llama 3.1 Nemotron Ultra 253B | 991.00 | ±1.5% | 31,728 | NVIDIA |
| 112 | Mistral Small 3.2 | 969.00 | ±2.7% | 1,243 | Mistral |
| 113 | Llama 4 Maverick | 942.00 | ±2.3% | 1,678 | Facebook AI Research |
| 114 | Mistral Large 2.1 (2411) | 925.00 | ±2.5% | 1,317 | Mistral |
| 115 | GPT-4o | 923.00 | ±2.2% | 1,780 | OpenAI |
| 116 | Codestral 2 (2501) | 896.00 | ±2.4% | 1,444 | Mistral |
| 117 | Devstral Small 1.1 | 869.00 | ±2.5% | 1,250 | Mistral |
| 118 | Llama 4 Scout | 852.00 | ±2.4% | 1,275 | Facebook AI Research |

Data is for reference only; official sources are authoritative. Model names link to their DataLearner model profiles.
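The page does not state how the 95% CI column above is computed; a common approach on arena-style leaderboards is to bootstrap over the recorded battles. The sketch below is an illustrative assumption, not Design Arena's documented method, and the win/loss counts are made up: it resamples a model's battle outcomes and reports a percentile interval on its win rate.

```python
import random

def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """95% percentile-bootstrap CI for the mean of 0/1 battle outcomes."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(outcomes, k=len(outcomes))) / len(outcomes)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]          # 2.5th percentile
    hi = means[int((1 - alpha / 2) * n_boot) - 1]  # 97.5th percentile
    return lo, hi

# Hypothetical record: 640 wins out of 1,000 battles.
outcomes = [1] * 640 + [0] * 360
lo, hi = bootstrap_ci(outcomes)
```

On a real leaderboard the same resampling would be applied to every model's battles jointly, so the interval also reflects uncertainty in the opponents' strengths.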

About this leaderboard

This leaderboard's data comes from Design Arena, developed by the Y Combinator-backed Arcada Labs: a crowdsourced, anonymous head-to-head platform focused on evaluating AI design-code generation.

Unlike LMArena, which evaluates general text and coding ability, Design Arena's code leaderboard specifically measures a model's ability to generate front-end code with visual output. The platform splits code tasks into subcategories such as Website, UI components, game development, data visualization, SVG, web apps, and mobile, each with its own independent ranking.

This page shows the combined Code Categories leaderboard: user votes from all subcategories are pooled and ranked with a single Bradley-Terry model (an Elo-like algorithm). Every vote carries equal weight and subcategories are not reweighted, so high-volume subcategories (such as Website) influence the combined score more. A higher score indicates a stronger overall human preference for the model in design-code generation.
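The pooled computation described above can be sketched as follows. Design Arena's actual pipeline is not public, so everything here is an illustrative assumption: the vote counts are made up, the fitting routine is the standard MM (minorization-maximization) iteration for Bradley-Terry, and the Elo-style rescaling (anchor 1000, scale 400) is an arbitrary convention.

```python
import math
from collections import defaultdict

def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from pooled pairwise votes.
    wins[(a, b)] = number of votes where model a beat model b.
    MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), renormalized."""
    models = {m for pair in wins for m in pair}
    p = {m: 1.0 / len(models) for m in models}
    total_wins = defaultdict(int)   # W_i: total votes won by i
    pair_counts = defaultdict(int)  # n_ij: total comparisons between i and j
    for (a, b), w in wins.items():
        total_wins[a] += w
        pair_counts[frozenset((a, b))] += w
    for _ in range(n_iter):
        new_p = {}
        for i in models:
            denom = sum(
                pair_counts[frozenset((i, j))] / (p[i] + p[j])
                for j in models
                if j != i and pair_counts[frozenset((i, j))] > 0
            )
            new_p[i] = total_wins[i] / denom if denom > 0 else p[i]
        norm = sum(new_p.values())
        p = {m: v / norm for m, v in new_p.items()}
    return p

# Hypothetical pooled votes, mixed across all subcategories:
votes = {("A", "B"): 70, ("B", "A"): 30, ("B", "C"): 60,
         ("C", "B"): 40, ("A", "C"): 80, ("C", "A"): 20}
strengths = fit_bradley_terry(votes)
# Map strengths onto an Elo-like scale; anchor and scale are arbitrary.
scores = {m: 1000 + 400 * math.log10(strengths[m] / strengths["C"])
          for m in strengths}
```

Because every vote enters the same likelihood with equal weight, a subcategory that contributes more votes simply contributes more terms, which is exactly why high-volume subcategories dominate the combined score.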

Frequently Asked Questions (FAQ)

What is the Arcada Labs Code Categories Arena?
The Arcada Labs Code Categories Arena is an anonymous evaluation platform from Arcada Labs focused on design-code generation. It covers subcategories such as Website, UI components, game development, and data visualization; user votes from all subcategories are pooled and run through a single Bradley-Terry model to produce the combined leaderboard.
How does the Arcada Code Arena differ from the LMArena Coding Arena?
The LMArena Coding Arena mainly evaluates general programming ability (code generation, debugging, algorithm implementation), while the Arcada Code Arena focuses on the "design code" category: code with visual output, such as HTML pages, interactive UI components, charts, and game prototypes. The two are complementary: the former emphasizes functional code, the latter creative visual code.
What is the ranking methodology? Do high-volume subcategories sway the overall leaderboard?
Arcada Labs pools the raw votes from all subcategories (Website, UI Component, Game Dev, Data Visualization, and so on) and fits a single Bradley-Terry model; every vote is weighted equally, with no per-subcategory weighting. High-volume subcategories therefore influence the overall ranking more, and a model that is very strong in one subcategory but weak in the others will see its score smoothed out. This mirrors how the LMArena overall leaderboard is computed.
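A toy calculation makes the pooling effect above concrete. All numbers here are hypothetical: a model wins 80% of votes in a small subcategory but only 40% in a large one, and because every vote counts equally, its pooled win rate lands near the large subcategory's figure.

```python
# Hypothetical vote counts for one model in two subcategories.
small_wins, small_total = 80, 100      # e.g. an SVG-like subcategory: 80% win rate
large_wins, large_total = 400, 1000    # e.g. a Website-like subcategory: 40% win rate

# Unweighted pooling: every vote counts once, regardless of subcategory.
pooled_rate = (small_wins + large_wins) / (small_total + large_total)
# pooled_rate is about 0.44 -- far closer to 0.40 than to 0.80.
```

Per-subcategory averaging (0.80 + 0.40) / 2 = 0.60 would tell a very different story, which is why the combined leaderboard should be read alongside the individual subcategory leaderboards.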
Which kinds of models perform best at design code?
Large multimodal models that can interpret visual requirements generally perform best in design-code scenarios; the Claude Opus and GLM 5 series stand out on this leaderboard. Some models specifically optimized for code and UI generation (such as Vercel v0 and AesCoder) also score well, showing that targeted optimization pays off measurably in visual code generation.