LMArena Coding Arena Leaderboard

Name: LMArena Coding Arena Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

An AI coding model leaderboard based on anonymous user voting in the LMArena Coding Arena. It covers Elo scores, confidence intervals, and vote counts for Claude, GPT, Gemini, DeepSeek, Qwen, and more.

Top Model: Kimi K2.6
Top Score: 1510.00
Model Count: 362
Data version: June 16, 2026

Data source: LMArena

About This Leaderboard

This leaderboard ranks AI models by coding ability. Data comes from the Coding sub-track of LMArena (formerly LMSYS Chatbot Arena), where real users evaluate models on programming tasks through anonymous blind testing.

Methodology Overview

Blind testing: Users submit coding questions, two anonymous models generate code answers, and users vote for the better response. Hiding the model identities keeps brand bias out of the vote.

Elo scoring: Pairwise votes are fitted with the Bradley-Terry model to produce Elo-style scores. A higher score means users prefer that model's code solutions more often.

Broad scenario coverage: Testing spans code generation, bug fixing, algorithm implementation, code explanation, and other real-world programming scenarios.
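To make the Elo scoring step concrete, here is a minimal sketch of fitting a Bradley-Terry model to pairwise votes and mapping the result onto an Elo-style scale. It is an illustration under stated assumptions, not LMArena's actual pipeline: the function names, the vote counts, the 400-point base-10 scale, and the 1000-point anchor are all hypothetical choices, and ties are ignored.

```python
import math

def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths with the classic MM update.

    wins[(a, b)] is the number of votes in which model a beat model b.
    Assumes every model has at least one win and one comparison.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            total_wins = sum(w for (winner, _), w in wins.items() if winner == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom
        # Strengths are only defined up to a constant factor,
        # so fix the geometric mean at 1 after every pass.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(log_mean) for m, v in updated.items()}
    return strength

def to_elo_scale(strength, scale=400.0, anchor=1000.0):
    """Map strengths to Elo-style scores (scale and anchor are assumptions)."""
    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}

# Hypothetical vote counts, invented purely for illustration.
votes = {
    ("model_a", "model_b"): 60, ("model_b", "model_a"): 40,
    ("model_b", "model_c"): 55, ("model_c", "model_b"): 45,
    ("model_a", "model_c"): 70, ("model_c", "model_a"): 30,
}
ratings = to_elo_scale(fit_bradley_terry(votes))
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```

Under this model, the probability that one model beats another depends only on the ratio of their strengths, which becomes a score difference on the logarithmic Elo-style scale.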

DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.

Origin: China

Ranking Table

Rank	Model	Score	95% CI	Votes	Organization	License
31	Kimi K2.6	1510.00	+/-8	6,984	Moonshot AI	Modified MIT
37	minimax-m3	1505.00	+/-11	3,174	MiniMax	Proprietary
38	Kimi K2.5 Instant	1505.00	+/-14	1,800	Moonshot AI	Modified MIT
41	Kimi K2 Thinking	1503.00	+/-6	12,764	Moonshot AI	Modified MIT
42	DeepSeek-V4-Pro	1502.00	+/-8	8,475	DeepSeek-AI	MIT
51	DeepSeek-V4-Pro (thinking)	1494.00	+/-8	7,788	DeepSeek-AI	MIT
62	Kimi K2 Thinking (thinking-turbo)	1487.00	+/-6	14,857	Moonshot AI	Modified MIT
66	DeepSeek-V4-Flash	1480.00	+/-8	8,206	DeepSeek-AI	MIT
68	MiniMax-M2.7	1479.00	+/-7	10,023	MiniMaxAI	Modified MIT
72	DeepSeek-V4-Flash (thinking)	1477.00	+/-8	8,128	DeepSeek-AI	MIT
73	DeepSeek V3.2 (thinking)	1476.00	+/-7	8,533	DeepSeek-AI	MIT
74	DeepSeek V3.2-Exp (thinking)	1475.00	+/-13	1,919	DeepSeek-AI	MIT
75	qwen3-max-2025-09-23	1475.00	+/-13	2,040	Alibaba	Proprietary
81	DeepSeek V3.2	1470.00	+/-6	10,631	DeepSeek-AI	MIT
83	Kimi K2 0905	1468.00	+/-13	2,241	Moonshot AI	Modified MIT
87	DeepSeek V3.2-Exp	1466.00	+/-12	2,499	DeepSeek-AI	MIT
90	DeepSeek-R1-0528	1465.00	+/-11	2,729	DeepSeek-AI	MIT
92	DeepSeek-V3.1 Terminus (thinking)	1464.00	+/-24	636	DeepSeek-AI	MIT
95	Kimi K2	1460.00	+/-8	5,244	Moonshot AI	Modified MIT
97	hunyuan-hy3-preview	1459.00	+/-14	1,974	Tencent	tencent-hunyuan-community
105	DeepSeek-V3.1 (thinking)	1457.00	+/-13	1,903	DeepSeek-AI	MIT
112	Step 3.5 Flash	1449.00	+/-6	11,088	StepFunAI	Apache 2.0
114	DeepSeek-V3.1	1447.00	+/-12	2,622	DeepSeek-AI	MIT
115	qwen3-235b-a22b-no-thinking	1447.00	+/-8	6,973	Alibaba	Apache 2.0
118	MiniMax M2.5	1445.00	+/-7	10,915	MiniMaxAI	Modified MIT
119	DeepSeek-R1	1445.00	+/-12	2,317	DeepSeek-AI	MIT
121	qwen3-235b-a22b-thinking-2507	1442.00	+/-15	1,610	Alibaba	Apache 2.0
124	M2.1	1439.00	+/-10	3,427	MiniMaxAI	MIT
125	DeepSeek-V3.1 Terminus	1439.00	+/-21	778	DeepSeek-AI	MIT
126	hunyuan-vision-1.5-thinking	1438.00	+/-27	435	Tencent	Proprietary
139	Step 3.5 Flash	1432.00	+/-7	11,611	StepFunAI	Proprietary
143	DeepSeek-V3-0324	1429.00	+/-7	8,367	DeepSeek-AI	MIT
152	minimax-m1	1416.00	+/-8	6,486	MiniMax	Apache 2.0
160	Step3	1408.00	+/-17	1,233	StepFunAI	Apache 2.0
165	hunyuan-turbos-20250226	1400.00	+/-31	275	Tencent	Proprietary
171	hunyuan-turbos-20250416	1395.00	+/-14	1,776	Tencent	Proprietary
178	DeepSeek-V3	1388.00	+/-10	3,280	DeepSeek-AI	DeepSeek
185	MiniMax M2	1385.00	+/-15	1,544	MiniMaxAI	Apache 2.0
189	qwen-plus-0125	1380.00	+/-18	893	Alibaba	Proprietary
191	deepseek-v2.5-1210	1375.00	+/-17	1,079	DeepSeek	DeepSeek
194	hunyuan-turbo-0110	1372.00	+/-30	299	Tencent	Proprietary
195	step-2-16k-exp-202412	1372.00	+/-20	737	StepFun	Proprietary
200	DeepSeek V2.5	1369.00	+/-9	4,252	DeepSeek-AI	DeepSeek
203	hunyuan-large-2025-02-10	1367.00	+/-25	519	Tencent	Proprietary
213	qwen2.5-plus-1127	1357.00	+/-14	1,553	Alibaba	Proprietary
215	hunyuan-large-vision	1356.00	+/-19	963	Tencent	Proprietary
219	step-1o-turbo-202506	1354.00	+/-15	1,506	StepFun	Proprietary
220	qwen-max-0919	1353.00	+/-11	2,756	Alibaba	Qwen
221	glm-4-plus	1353.00	+/-9	4,449	Zhipu AI	Proprietary
232	deepseek-coder-v2	1342.00	+/-12	2,671	DeepSeek	DeepSeek License
238	hunyuan-standard-2025-02-10	1332.00	+/-24	549	Tencent	Proprietary
240	glm-4-plus-0111	1331.00	+/-18	894	Zhipu	Proprietary
261	hunyuan-standard-256k	1301.00	+/-25	497	Tencent	Proprietary
287	qwen1.5-32b-chat	1261.00	+/-11	3,930	Alibaba	Qianwen LICENSE
307	DeepSeek LLM 67B Chat	1217.00	+/-24	649	DeepSeek-AI	DeepSeek License

Data is for reference only. Official sources are authoritative.
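When reading the Score and 95% CI columns together, a small gap between two models often falls inside their margins of error. The sketch below takes three rows from the table and checks whether their intervals overlap, then converts a score gap into an implied head-to-head win probability. The `Entry` type and function names are hypothetical, the 400-point base-10 Elo convention is an assumption rather than something stated on this page, and interval overlap is only a rough heuristic, not a formal significance test.

```python
from typing import NamedTuple, Tuple

class Entry(NamedTuple):
    model: str
    score: float
    ci_half_width: float  # the "+/-" value from the 95% CI column

def interval(entry: Entry) -> Tuple[float, float]:
    """Return the (low, high) bounds of the entry's confidence interval."""
    return (entry.score - entry.ci_half_width, entry.score + entry.ci_half_width)

def intervals_overlap(a: Entry, b: Entry) -> bool:
    """True when the two confidence intervals share at least one point."""
    a_low, a_high = interval(a)
    b_low, b_high = interval(b)
    return a_low <= b_high and b_low <= a_high

def implied_win_probability(a: Entry, b: Entry, scale: float = 400.0) -> float:
    """Probability that a is preferred over b under the assumed Elo convention."""
    return 1.0 / (1.0 + 10.0 ** ((b.score - a.score) / scale))

# Rows copied from the ranking table above.
kimi = Entry("Kimi K2.6", 1510.0, 8.0)
minimax = Entry("minimax-m3", 1505.0, 11.0)
deepseek_67b = Entry("DeepSeek LLM 67B Chat", 1217.0, 24.0)

print(intervals_overlap(kimi, minimax))        # True: [1502, 1518] vs [1494, 1516]
print(intervals_overlap(kimi, deepseek_67b))   # False: [1502, 1518] vs [1193, 1241]
print(round(implied_win_probability(kimi, minimax), 3))  # 0.507
```

Under these assumptions, the 5-point lead of Kimi K2.6 over minimax-m3 corresponds to being preferred in only about 51% of head-to-head votes, so adjacent ranks with overlapping intervals are best treated as roughly tied.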

FAQ

What is LMArena Coding Arena?

LMArena Coding Arena is an anonymous evaluation track focused on coding ability. Users submit real programming tasks such as debugging, code generation, and algorithm implementation; two hidden model answers are shown side by side, and user votes are aggregated into an Elo leaderboard.

How is Coding Arena different from SWE-bench or HumanEval?

Static benchmarks use fixed test sets and automated scoring, which makes them reproducible but easier to over-optimize for. Coding Arena uses open-ended user tasks and human preference votes, so it better reflects practical coding experience. The two approaches are complementary.

How do China-developed models perform on coding tasks?

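In this snapshot, the highest-ranked China-developed model is Kimi K2.6 (rank 31, score 1510), DeepSeek-V4-Pro is at rank 42 (1502), and the best-placed Qwen model, qwen3-max-2025-09-23, is at rank 75 (1475). These models are especially relevant when open deployment, Chinese-language developer workflows, or cost control matter.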

How can AI help with day-to-day programming?

Common workflows include code completion and generation, debugging, code review, unit test generation, and cross-language translation.