LMArena Coding Arena Leaderboard

The latest AI coding model leaderboard, based on anonymous user voting in the LMArena Coding Arena. It covers Elo scores, confidence intervals, and vote counts for Claude, GPT, Gemini, DeepSeek, Qwen, and more.

Top Model: Kimi K2.6
Top Score: 1510.00
Model Count: 362
Data version: 2026-06-16
Data source: LMArena

About This Leaderboard

This leaderboard ranks AI models by coding ability. Data comes from the Coding sub-track of LMArena (formerly LMSYS Chatbot Arena), where real users evaluate models on programming tasks through anonymous blind testing.

Methodology Overview

Blind testing: Users submit coding questions, two anonymous models generate code answers, and users vote for the better response. Because model identities are hidden at voting time, brand bias is kept out of the result.

Elo scoring: Uses the Bradley-Terry model to calculate Elo scores. Higher scores mean users more frequently prefer that model's code solutions.

Broad scenario coverage: Testing spans code generation, bug fixing, algorithm implementation, code explanation, and more real-world programming scenarios.
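As a rough illustration of the Elo scoring step above, the sketch below fits Bradley-Terry strengths to a list of (winner, loser) votes with a simple iterative update and maps them onto an Elo-like scale. The function name, the 400-point scale, the 1000-point anchor, and the sample votes are all assumptions made for illustration; LMArena's actual pipeline (tie handling, confidence intervals, and so on) is not described on this page and may differ.

```python
import math
from collections import defaultdict


def bradley_terry_elo(battles, scale=400.0, anchor=1000.0, iters=500):
    """Fit Bradley-Terry strengths from (winner, loser) pairs, return Elo-like scores.

    Assumes the two models in each vote differ and that every model has at
    least one win (otherwise the maximum-likelihood fit is not finite).
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of votes per unordered pair
    for winner, loser in battles:
        wins[winner] += 1.0
        wins[loser] += 0.0            # register the loser as a known model
        pair_counts[frozenset((winner, loser))] += 1.0

    models = sorted(wins)
    if any(wins[m] == 0 for m in models):
        raise ValueError("every model needs at least one win for a finite fit")

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for m in models:
            # Sum over opponents of n_ij / (p_i + p_j), using current strengths.
            denom = sum(
                pair_counts[frozenset((m, o))] / (strength[m] + strength[o])
                for o in models
                if o != m and frozenset((m, o)) in pair_counts
            )
            updated[m] = wins[m] / denom
        # Rescale so the geometric mean strength is 1 (fixes the free constant).
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(log_mean) for m, v in updated.items()}

    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical votes between three placeholder models.
votes = (
    [("model_a", "model_b")] * 6 + [("model_b", "model_a")] * 4
    + [("model_b", "model_c")] * 7 + [("model_c", "model_b")] * 3
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
)
for name, score in sorted(bradley_terry_elo(votes).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Only score differences are meaningful in this sketch; moving the anchor shifts every score by the same amount without changing the ranking.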

DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.

Origin filter: China. The Rank column gives each model's position in the full leaderboard.

Ranking Table

| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 31 | Kimi K2.6 | 1510.00 | +/-8 | 6,984 | Moonshot AI | Modified MIT |
| 37 | minimax-m3 | 1505.00 | +/-11 | 3,174 | MiniMax | Proprietary |
| 38 | Kimi K2.5 Instant | 1505.00 | +/-14 | 1,800 | Moonshot AI | Modified MIT |
| 41 | Kimi K2 Thinking | 1503.00 | +/-6 | 12,764 | Moonshot AI | Modified MIT |
| 42 | DeepSeek-V4-Pro | 1502.00 | +/-8 | 8,475 | DeepSeek-AI | MIT |
| 51 | DeepSeek-V4-Pro (thinking) | 1494.00 | +/-8 | 7,788 | DeepSeek-AI | MIT |
| 62 | Kimi K2 Thinking (thinking-turbo) | 1487.00 | +/-6 | 14,857 | Moonshot AI | Modified MIT |
| 66 | DeepSeek-V4-Flash | 1480.00 | +/-8 | 8,206 | DeepSeek-AI | MIT |
| 68 | MiniMax-M2.7 | 1479.00 | +/-7 | 10,023 | MiniMaxAI | Modified MIT |
| 72 | DeepSeek-V4-Flash (thinking) | 1477.00 | +/-8 | 8,128 | DeepSeek-AI | MIT |
| 73 | DeepSeek V3.2 (thinking) | 1476.00 | +/-7 | 8,533 | DeepSeek-AI | MIT |
| 74 | DeepSeek V3.2-Exp (thinking) | 1475.00 | +/-13 | 1,919 | DeepSeek-AI | MIT |
| 75 | qwen3-max-2025-09-23 | 1475.00 | +/-13 | 2,040 | Alibaba | Proprietary |
| 81 | DeepSeek V3.2 | 1470.00 | +/-6 | 10,631 | DeepSeek-AI | MIT |
| 83 | Kimi K2 0905 | 1468.00 | +/-13 | 2,241 | Moonshot AI | Modified MIT |
| 87 | DeepSeek V3.2-Exp | 1466.00 | +/-12 | 2,499 | DeepSeek-AI | MIT |
| 90 | DeepSeek-R1-0528 | 1465.00 | +/-11 | 2,729 | DeepSeek-AI | MIT |
| 92 | DeepSeek-V3.1 Terminus (thinking) | 1464.00 | +/-24 | 636 | DeepSeek-AI | MIT |
| 95 | Kimi K2 | 1460.00 | +/-8 | 5,244 | Moonshot AI | Modified MIT |
| 97 | hunyuan-hy3-preview | 1459.00 | +/-14 | 1,974 | Tencent | tencent-hunyuan-community |
| 105 | DeepSeek-V3.1 (thinking) | 1457.00 | +/-13 | 1,903 | DeepSeek-AI | MIT |
| 112 | Step 3.5 Flash | 1449.00 | +/-6 | 11,088 | StepFunAI | Apache 2.0 |
| 114 | DeepSeek-V3.1 | 1447.00 | +/-12 | 2,622 | DeepSeek-AI | MIT |
| 115 | qwen3-235b-a22b-no-thinking | 1447.00 | +/-8 | 6,973 | Alibaba | Apache 2.0 |
| 118 | MiniMax M2.5 | 1445.00 | +/-7 | 10,915 | MiniMaxAI | Modified MIT |
| 119 | DeepSeek-R1 | 1445.00 | +/-12 | 2,317 | DeepSeek-AI | MIT |
| 121 | qwen3-235b-a22b-thinking-2507 | 1442.00 | +/-15 | 1,610 | Alibaba | Apache 2.0 |
| 124 | M2.1 | 1439.00 | +/-10 | 3,427 | MiniMaxAI | MIT |
| 125 | DeepSeek-V3.1 Terminus | 1439.00 | +/-21 | 778 | DeepSeek-AI | MIT |
| 126 | hunyuan-vision-1.5-thinking | 1438.00 | +/-27 | 435 | Tencent | Proprietary |
| 139 | Step 3.5 Flash | 1432.00 | +/-7 | 11,611 | StepFunAI | Proprietary |
| 143 | DeepSeek-V3-0324 | 1429.00 | +/-7 | 8,367 | DeepSeek-AI | MIT |
| 152 | minimax-m1 | 1416.00 | +/-8 | 6,486 | MiniMax | Apache 2.0 |
| 160 | Step3 | 1408.00 | +/-17 | 1,233 | StepFunAI | Apache 2.0 |
| 165 | hunyuan-turbos-20250226 | 1400.00 | +/-31 | 275 | Tencent | Proprietary |
| 171 | hunyuan-turbos-20250416 | 1395.00 | +/-14 | 1,776 | Tencent | Proprietary |
| 178 | DeepSeek-V3 | 1388.00 | +/-10 | 3,280 | DeepSeek-AI | DeepSeek |
| 185 | MiniMax M2 | 1385.00 | +/-15 | 1,544 | MiniMaxAI | Apache 2.0 |
| 189 | qwen-plus-0125 | 1380.00 | +/-18 | 893 | Alibaba | Proprietary |
| 191 | deepseek-v2.5-1210 | 1375.00 | +/-17 | 1,079 | DeepSeek | DeepSeek |
| 194 | hunyuan-turbo-0110 | 1372.00 | +/-30 | 299 | Tencent | Proprietary |
| 195 | step-2-16k-exp-202412 | 1372.00 | +/-20 | 737 | StepFun | Proprietary |
| 200 | DeepSeek V2.5 | 1369.00 | +/-9 | 4,252 | DeepSeek-AI | DeepSeek |
| 203 | hunyuan-large-2025-02-10 | 1367.00 | +/-25 | 519 | Tencent | Proprietary |
| 213 | qwen2.5-plus-1127 | 1357.00 | +/-14 | 1,553 | Alibaba | Proprietary |
| 215 | hunyuan-large-vision | 1356.00 | +/-19 | 963 | Tencent | Proprietary |
| 219 | step-1o-turbo-202506 | 1354.00 | +/-15 | 1,506 | StepFun | Proprietary |
| 220 | qwen-max-0919 | 1353.00 | +/-11 | 2,756 | Alibaba | Qwen |
| 221 | glm-4-plus | 1353.00 | +/-9 | 4,449 | Zhipu AI | Proprietary |
| 232 | deepseek-coder-v2 | 1342.00 | +/-12 | 2,671 | DeepSeek | DeepSeek License |
| 238 | hunyuan-standard-2025-02-10 | 1332.00 | +/-24 | 549 | Tencent | Proprietary |
| 240 | glm-4-plus-0111 | 1331.00 | +/-18 | 894 | Zhipu | Proprietary |
| 261 | hunyuan-standard-256k | 1301.00 | +/-25 | 497 | Tencent | Proprietary |
| 287 | qwen1.5-32b-chat | 1261.00 | +/-11 | 3,930 | Alibaba | Qianwen LICENSE |
| 307 | DeepSeek LLM 67B Chat | 1217.00 | +/-24 | 649 | DeepSeek-AI | DeepSeek License |

Data is for reference only. Official sources are authoritative. Click model names to view DataLearner model profiles.
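The Score and 95% CI columns are easiest to interpret together. The sketch below, which assumes the conventional Elo curve with a 400-point scale (an assumption, not something this page states), converts a score gap into an expected preference rate and checks whether two models' intervals overlap. The helper names are hypothetical, and the sample values are read from the ranking table as 1510 +/-8 (Kimi K2.6), 1505 +/-11 (minimax-m3), and 1217 +/-24 (DeepSeek LLM 67B Chat).

```python
def expected_win_rate(score_a, score_b, scale=400.0):
    """Probability that A is preferred over B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


def intervals_overlap(score_a, ci_a, score_b, ci_b):
    """True when the two (score +/- CI) ranges share at least one point."""
    return abs(score_a - score_b) <= ci_a + ci_b


# (score, 95% CI half-width) pairs taken from the ranking table.
kimi_k26 = (1510.0, 8.0)
minimax_m3 = (1505.0, 11.0)
deepseek_67b = (1217.0, 24.0)

print(expected_win_rate(kimi_k26[0], minimax_m3[0]))  # just above 0.5
print(intervals_overlap(*kimi_k26, *minimax_m3))      # True
print(intervals_overlap(*kimi_k26, *deepseek_67b))    # False
```

Overlapping intervals are only a quick heuristic, not a formal significance test, but they suggest that small gaps between adjacent rows should not be over-read.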

FAQ

01. What is LMArena Coding Arena?

LMArena Coding Arena is an anonymous evaluation track focused on coding ability. Users submit real programming tasks such as debugging, code generation, and algorithm implementation; two hidden model answers are shown side by side, and user votes are aggregated into an Elo leaderboard.

02. How is Coding Arena different from SWE-bench or HumanEval?

Static benchmarks use fixed test sets and automated scoring, which makes them reproducible but easier to over-optimize for. Coding Arena uses open-ended user tasks and human preference votes, so it better reflects practical coding experience. The two approaches are complementary.
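To make "fixed test sets and automated scoring" concrete, here is a minimal sketch of the static-benchmark style: a candidate function is run against a fixed list of input/expected-output pairs and scored by pass rate. The task, the test cases, and the `pass_rate` helper are hypothetical and are not the actual harness of SWE-bench or HumanEval.

```python
def pass_rate(candidate, test_cases):
    """Fraction of fixed (args, expected) test cases that the candidate passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply counts as a failed case
    return passed / len(test_cases)


# Hypothetical task: return the absolute value of an integer.
TESTS = [((3,), 3), ((-3,), 3), ((0,), 0)]


def good_abs(x):
    return x if x >= 0 else -x


def buggy_abs(x):
    return x  # wrong for negative inputs


print(pass_rate(good_abs, TESTS))   # 1.0
print(pass_rate(buggy_abs, TESTS))  # two of three cases pass
```

The score is fully reproducible because the tests never change, which is also why a model can be tuned toward them. An arena vote has no such fixed answer key.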

03. How do China-developed models perform on coding tasks?

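Models such as Kimi, DeepSeek, MiniMax, and Qwen rank competitively on coding leaderboards. In this snapshot the highest-placed China-developed model, Kimi K2.6, sits at rank 31 with a score of 1510.00. These models are especially relevant when open deployment, Chinese-language developer workflows, or cost control matter.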

04. How can AI help with day-to-day programming?

Common workflows include code completion and generation, debugging, code review, unit test generation, and cross-language translation.