LMArena Coding Arena Leaderboard
The latest AI coding model leaderboard, based on anonymous user voting in the LMArena Coding Arena. It covers Elo scores, confidence intervals, and vote counts for Claude, GPT, Gemini, DeepSeek, Qwen, and other models.
Top Model: Kimi K2.6
Top Score: 1510.00
Model Count: 362
Data version: June 16, 2026
Data source: LMArena
About This Leaderboard
This leaderboard ranks AI models by coding ability. The data comes from the Coding sub-track of LMArena (formerly LMSYS Chatbot Arena), where real users evaluate models on programming tasks through anonymous blind testing.
Methodology Overview
Blind testing: Users submit coding questions, two anonymous models each generate an answer, and users vote for the better response. Hiding model identities reduces brand bias.
Elo scoring: Votes are fitted with the Bradley-Terry model and reported as Elo-style scores. A higher score means users prefer that model's code solutions more often.
Broad scenario coverage: Testing spans code generation, bug fixing, algorithm implementation, code explanation, and other real-world programming scenarios.
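To make the scoring step concrete, here is a minimal sketch of how pairwise votes can be turned into Bradley-Terry strengths and then mapped to an Elo-like scale. It is an illustration, not LMArena's actual pipeline: the function name `fit_bradley_terry`, the vote counts, the 400-point base-10 scale, and the 1000-point anchor are all assumptions made for this example, and ties are ignored.

```python
import math
from collections import defaultdict


def fit_bradley_terry(wins, n_iter=500, scale=400.0, base_rating=1000.0):
    """Fit Bradley-Terry strengths with the classic iterative (MM) update.

    wins[(a, b)] is the number of votes where model a beat model b.
    Assumes every model has at least one win; ties are not modeled.
    Returns Elo-like scores whose average equals base_rating.
    """
    models = sorted({m for pair in wins for m in pair})
    total_wins = defaultdict(float)
    games = defaultdict(float)  # games[(i, j)] = comparisons between i and j
    for (winner, loser), count in wins.items():
        total_wins[winner] += count
        games[(winner, loser)] += count
        games[(loser, winner)] += count

    strength = {m: 1.0 for m in models}
    for _ in range(n_iter):
        updated = {}
        for i in models:
            denom = sum(
                games[(i, j)] / (strength[i] + strength[j])
                for j in models
                if j != i and (i, j) in games
            )
            updated[i] = total_wins[i] / denom
        # Normalize so the geometric mean of the strengths stays at 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    # Map strengths onto an Elo-like scale (assumed: base 10, 400 points).
    return {m: base_rating + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical vote counts for three anonymous models.
votes = {
    ("model_x", "model_y"): 60, ("model_y", "model_x"): 40,
    ("model_y", "model_z"): 70, ("model_z", "model_y"): 30,
    ("model_x", "model_z"): 80, ("model_z", "model_x"): 20,
}
for name, score in sorted(fit_bradley_terry(votes).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Only score differences carry meaning in this sketch; the anchor value shifts every score by the same amount without changing the ranking.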
DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 31 | Kimi K2.6 | 1510.00 | +/-8 | 6,984 | Moonshot AI | Modified MIT |
| 37 | minimax-m3 | 1505.00 | +/-11 | 3,174 | MiniMax | Proprietary |
| 38 | Kimi K2.5 Instant | 1505.00 | +/-14 | 1,800 | Moonshot AI | Modified MIT |
| 41 | Kimi K2 Thinking | 1503.00 | +/-6 | 12,764 | Moonshot AI | Modified MIT |
| 42 | DeepSeek-V4-Pro | 1502.00 | +/-8 | 8,475 | DeepSeek-AI | MIT |
| 51 | DeepSeek-V4-Pro (thinking) | 1494.00 | +/-8 | 7,788 | DeepSeek-AI | MIT |
| 62 | Kimi K2 Thinking (thinking-turbo) | 1487.00 | +/-6 | 14,857 | Moonshot AI | Modified MIT |
| 66 | DeepSeek-V4-Flash | 1480.00 | +/-8 | 8,206 | DeepSeek-AI | MIT |
| 68 | | 1479.00 | +/-7 | 10,023 | MiniMaxAI | Modified MIT |
| 72 | DeepSeek-V4-Flash (thinking) | 1477.00 | +/-8 | 8,128 | DeepSeek-AI | MIT |
| 73 | DeepSeek V3.2 (thinking) | 1476.00 | +/-7 | 8,533 | DeepSeek-AI | MIT |
| 74 | DeepSeek V3.2-Exp (thinking) | 1475.00 | +/-13 | 1,919 | DeepSeek-AI | MIT |
| 75 | qwen3-max-2025-09-23 | 1475.00 | +/-13 | 2,040 | Alibaba | Proprietary |
| 81 | DeepSeek V3.2 | 1470.00 | +/-6 | 10,631 | DeepSeek-AI | MIT |
| 83 | Kimi K2 0905 | 1468.00 | +/-13 | 2,241 | Moonshot AI | Modified MIT |
| 87 | DeepSeek V3.2-Exp | 1466.00 | +/-12 | 2,499 | DeepSeek-AI | MIT |
| 90 | DeepSeek-R1-0528 | 1465.00 | +/-11 | 2,729 | DeepSeek-AI | MIT |
| 92 | DeepSeek-V3.1 Terminus (thinking) | 1464.00 | +/-24 | 636 | DeepSeek-AI | MIT |
| 95 | Kimi K2 | 1460.00 | +/-8 | 5,244 | Moonshot AI | Modified MIT |
| 97 | hunyuan-hy3-preview | 1459.00 | +/-14 | 1,974 | Tencent | tencent-hunyuan-community |
| 105 | DeepSeek-V3.1 (thinking) | 1457.00 | +/-13 | 1,903 | DeepSeek-AI | MIT |
| 112 | Step 3.5 Flash | 1449.00 | +/-6 | 11,088 | StepFunAI | Apache 2.0 |
| 114 | DeepSeek-V3.1 | 1447.00 | +/-12 | 2,622 | DeepSeek-AI | MIT |
| 115 | qwen3-235b-a22b-no-thinking | 1447.00 | +/-8 | 6,973 | Alibaba | Apache 2.0 |
| 118 | | 1445.00 | +/-7 | 10,915 | MiniMaxAI | Modified MIT |
| 119 | DeepSeek-R1 | 1445.00 | +/-12 | 2,317 | DeepSeek-AI | MIT |
| 121 | qwen3-235b-a22b-thinking-2507 | 1442.00 | +/-15 | 1,610 | Alibaba | Apache 2.0 |
| 124 | | 1439.00 | +/-10 | 3,427 | MiniMaxAI | MIT |
| 125 | DeepSeek-V3.1 Terminus | 1439.00 | +/-21 | 778 | DeepSeek-AI | MIT |
| 126 | hunyuan-vision-1.5-thinking | 1438.00 | +/-27 | 435 | Tencent | Proprietary |
| 139 | Step 3.5 Flash | 1432.00 | +/-7 | 11,611 | StepFunAI | Proprietary |
| 143 | DeepSeek-V3-0324 | 1429.00 | +/-7 | 8,367 | DeepSeek-AI | MIT |
| 152 | minimax-m1 | 1416.00 | +/-8 | 6,486 | MiniMax | Apache 2.0 |
| 160 | Step3 | 1408.00 | +/-17 | 1,233 | StepFunAI | Apache 2.0 |
| 165 | hunyuan-turbos-20250226 | 1400.00 | +/-31 | 275 | Tencent | Proprietary |
| 171 | hunyuan-turbos-20250416 | 1395.00 | +/-14 | 1,776 | Tencent | Proprietary |
| 178 | DeepSeek-V3 | 1388.00 | +/-10 | 3,280 | DeepSeek-AI | DeepSeek |
| 185 | | 1385.00 | +/-15 | 1,544 | MiniMaxAI | Apache 2.0 |
| 189 | qwen-plus-0125 | 1380.00 | +/-18 | 893 | Alibaba | Proprietary |
| 191 | deepseek-v2.5-1210 | 1375.00 | +/-17 | 1,079 | DeepSeek | DeepSeek |
| 194 | hunyuan-turbo-0110 | 1372.00 | +/-30 | 299 | Tencent | Proprietary |
| 195 | step-2-16k-exp-202412 | 1372.00 | +/-20 | 737 | StepFun | Proprietary |
| 200 | DeepSeek V2.5 | 1369.00 | +/-9 | 4,252 | DeepSeek-AI | DeepSeek |
| 203 | hunyuan-large-2025-02-10 | 1367.00 | +/-25 | 519 | Tencent | Proprietary |
| 213 | qwen2.5-plus-1127 | 1357.00 | +/-14 | 1,553 | Alibaba | Proprietary |
| 215 | hunyuan-large-vision | 1356.00 | +/-19 | 963 | Tencent | Proprietary |
| 219 | step-1o-turbo-202506 | 1354.00 | +/-15 | 1,506 | StepFun | Proprietary |
| 220 | qwen-max-0919 | 1353.00 | +/-11 | 2,756 | Alibaba | Qwen |
| 221 | glm-4-plus | 1353.00 | +/-9 | 4,449 | Zhipu AI | Proprietary |
| 232 | deepseek-coder-v2 | 1342.00 | +/-12 | 2,671 | DeepSeek | DeepSeek License |
| 238 | hunyuan-standard-2025-02-10 | 1332.00 | +/-24 | 549 | Tencent | Proprietary |
| 240 | glm-4-plus-0111 | 1331.00 | +/-18 | 894 | Zhipu | Proprietary |
| 261 | hunyuan-standard-256k | 1301.00 | +/-25 | 497 | Tencent | Proprietary |
| 287 | qwen1.5-32b-chat | 1261.00 | +/-11 | 3,930 | Alibaba | Qianwen LICENSE |
| 307 | DeepSeek LLM 67B Chat | 1217.00 | +/-24 | 649 | DeepSeek-AI | DeepSeek License |
Data is for reference only; official sources are authoritative. Each model also has a profile in the DataLearner model database.
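When reading the Score and 95% CI columns together, a quick sanity check is whether two models' intervals overlap. The sketch below uses three rows from the table and treats the `+/-` value as a symmetric half-width. The `Entry` class and `intervals_overlap` helper are hypothetical names introduced here, and interval overlap is only a rough heuristic, not a formal significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float
    ci: float  # half-width of the 95% CI, as shown in the "+/-" column

    @property
    def low(self) -> float:
        return self.score - self.ci

    @property
    def high(self) -> float:
        return self.score + self.ci


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """Return True when the two confidence intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Values copied from the ranking table.
kimi = Entry("Kimi K2.6", 1510.0, 8.0)
minimax = Entry("minimax-m3", 1505.0, 11.0)
flash = Entry("DeepSeek-V4-Flash", 1480.0, 8.0)

for other in (minimax, flash):
    verdict = "overlap" if intervals_overlap(kimi, other) else "are separated"
    print(f"{kimi.model} and {other.model} {verdict}")
```

Overlapping intervals suggest the rank order between two models could plausibly flip with more votes, which matters most for entries with few votes and wide intervals.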
FAQ
What is LMArena Coding Arena?
LMArena Coding Arena is an anonymous evaluation track focused on coding ability. Users submit real programming tasks such as debugging, code generation, and algorithm implementation; two hidden model answers are shown side by side, and user votes are aggregated into an Elo leaderboard.
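As a rough guide to what a score gap means, the conventional Elo formula converts a difference in scores into an expected head-to-head preference rate. The sketch below assumes the usual base-10, 400-point scale, which this page does not state explicitly; `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of votes model A wins against model B.

    Assumes the conventional Elo logistic curve (base 10, 400-point scale).
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Scores taken from the first and last rows of the ranking table.
top, bottom = 1510.0, 1217.0
print(f"Expected win rate of the top entry: {expected_win_rate(top, bottom):.1%}")
```

Under this assumption, equal scores give a 50% expected rate, and a gap of a few points, such as 1510 versus 1505, corresponds to a preference rate only slightly above 50%.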
How is Coding Arena different from SWE-bench or HumanEval?
Static benchmarks use fixed test sets and automated scoring, which makes them reproducible but easier to over-optimize for. Coding Arena uses open-ended user tasks and human preference votes, so it better reflects practical coding experience. The two approaches are complementary.
How do China-developed models perform on coding tasks?
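Models such as Kimi, DeepSeek, and Qwen rank competitively on coding leaderboards. Among the models listed above, the highest-ranked entries are Kimi K2.6 (1510, rank 31), minimax-m3 and Kimi K2.5 Instant (both 1505), Kimi K2 Thinking (1503), and DeepSeek-V4-Pro (1502). These models are especially relevant when open deployment, Chinese-language developer workflows, or cost control matter.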
How can AI help with day-to-day programming?
Common workflows include code completion and generation, debugging, code review, unit test generation, and cross-language translation.





