Text Generation Arena Leaderboard

The latest AI text generation leaderboard based on LMArena anonymous user voting. Covers Elo scores, confidence intervals, and vote counts for leading language models.

Top Model: Kimi K2.6
Top Score: 1,460
Model Count: 367
Data version: 2026-06-16
Data source: LMArena

About This Leaderboard

This leaderboard ranks AI models for text generation. Data comes from LMArena (formerly LMSYS Chatbot Arena), one of the largest crowdsourced AI evaluation platforms. Users chat with two anonymous models side by side and vote for the better response, so rankings reflect real user preferences rather than lab benchmarks.

Methodology Overview

Blind testing: Users chat with two anonymous models and vote based on response quality, eliminating brand bias.

Elo scoring: Each model's strength score is estimated from battle outcomes with the Bradley-Terry model and reported on an Elo-style scale like the one used for chess ratings. A higher score means users prefer that model more often.

Broad scenario coverage: Testing spans coding, creative writing, math reasoning, Q&A, role-playing, and more.
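To make the Elo scoring step above concrete, here is a minimal sketch of fitting Bradley-Terry strengths from a list of battle outcomes and mapping them onto an Elo-style scale. It is an illustration, not LMArena's actual pipeline: the function names, the iterative update used for fitting, the 400-point base-10 scale, and the anchor of 1000 are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses a simple iterative update: each strength becomes the model's
    win count divided by a sum over its battles of 1 / (own + opponent).
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # battles per unordered pair
    models = set()
    for winner, loser in battles:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            # Tiny floor keeps a model with zero wins from collapsing to 0.
            updated[m] = max(wins[m], 1e-9) / denom
        # Rescale so the geometric mean of the strengths stays at 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo_scale(strength, anchor=1000.0):
    """Map strengths to an Elo-style scale (400 points per factor of 10)."""
    return {m: anchor + 400.0 * math.log10(s) for m, s in strength.items()}


if __name__ == "__main__":
    # Hypothetical votes: A beats B 7-3, B beats C 6-4, A beats C 8-2.
    battles = (
        [("A", "B")] * 7 + [("B", "A")] * 3
        + [("B", "C")] * 6 + [("C", "B")] * 4
        + [("A", "C")] * 8 + [("C", "A")] * 2
    )
    for model, score in sorted(to_elo_scale(fit_bradley_terry(battles)).items()):
        print(model, round(score))
```

Because the fit uses all battles at once, the result does not depend on the order in which votes arrived, which is one reason a batch model is a natural fit for a leaderboard built from a large vote log.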

DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.


Ranking Table

| Rank | Model | Score | 95% CI | Votes | Organization | License |
| --- | --- | --- | --- | --- | --- | --- |
| 34 | Kimi K2.6 | 1,460 | +/-5 | 25,456 | Moonshot AI | Modified MIT |
| 36 | DeepSeek-V4-Pro (thinking) | 1,458 | +/-5 | 26,928 | DeepSeek-AI | MIT |
| 38 | DeepSeek-V4-Pro | 1,456 | +/-5 | 28,720 | DeepSeek-AI | MIT |
| 44 | Kimi K2 Thinking | 1,450 | +/-4 | 47,780 | Moonshot AI | Modified MIT |
| 49 | minimax-m3 | 1,448 | +/-7 | 11,264 | MiniMax | Proprietary |
| 63 | DeepSeek-V4-Flash (thinking) | 1,436 | +/-5 | 28,215 | DeepSeek-AI | MIT |
| 67 | DeepSeek-V4-Flash | 1,434 | +/-5 | 28,291 | DeepSeek-AI | MIT |
| 72 | Kimi K2.5 Instant | 1,431 | +/-7 | 8,177 | Moonshot AI | Modified MIT |
| 75 | Kimi K2 Thinking (thinking-turbo) | 1,430 | +/-3 | 62,098 | Moonshot AI | Modified MIT |
| 80 | DeepSeek V3.2 | 1,425 | +/-4 | 47,303 | DeepSeek-AI | MIT |
| 81 | DeepSeek V3.2-Exp (thinking) | 1,425 | +/-7 | 9,069 | DeepSeek-AI | MIT |
| 83 | qwen3-max-2025-09-23 | 1,424 | +/-6 | 9,151 | Alibaba | Proprietary |
| 85 | DeepSeek V3.2-Exp | 1,423 | +/-6 | 11,922 | DeepSeek-AI | MIT |
| 86 | DeepSeek V3.2 (thinking) | 1,423 | +/-4 | 41,085 | DeepSeek-AI | MIT |
| 87 | DeepSeek-R1-0528 | 1,422 | +/-6 | 18,463 | DeepSeek-AI | MIT |
| 90 | Kimi K2 0905 | 1,418 | +/-7 | 11,780 | Moonshot AI | Modified MIT |
| 91 | DeepSeek-V3.1 Terminus (thinking) | 1,418 | +/-10 | 3,462 | DeepSeek-AI | MIT |
| 92 | Kimi K2 | 1,417 | +/-5 | 27,637 | Moonshot AI | Modified MIT |
| 93 | DeepSeek-V3.1 | 1,417 | +/-6 | 14,958 | DeepSeek-AI | MIT |
| 95 | DeepSeek-V3.1 (thinking) | 1,417 | +/-7 | 11,737 | DeepSeek-AI | MIT |
| 96 | MiniMax-M2.7 | 1,417 | +/-4 | 34,620 | MiniMaxAI | Modified MIT |
| 98 | DeepSeek-V3.1 Terminus | 1,416 | +/-10 | 3,702 | DeepSeek-AI | MIT |
| 103 | hunyuan-hy3-preview | 1,413 | +/-8 | 6,678 | Tencent | tencent-hunyuan-community |
| 114 | qwen3-235b-a22b-no-thinking | 1,403 | +/-5 | 38,208 | Alibaba | Apache 2.0 |
| 119 | qwen3-235b-a22b-thinking-2507 | 1,399 | +/-7 | 8,994 | Alibaba | Apache 2.0 |
| 121 | DeepSeek-R1 | 1,398 | +/-5 | 18,524 | DeepSeek-AI | MIT |
| 122 | Step 3.5 Flash | 1,397 | +/-4 | 40,958 | StepFunAI | Proprietary |
| 123 | hunyuan-vision-1.5-thinking | 1,396 | +/-12 | 2,216 | Tencent | Proprietary |
| 126 | DeepSeek-V3-0324 | 1,396 | +/-4 | 45,505 | DeepSeek-AI | MIT |
| 127 | Step 3.5 Flash | 1,395 | +/-4 | 44,826 | StepFunAI | Apache 2.0 |
| 130 | MiniMax M2.5 | 1,391 | +/-4 | 41,271 | MiniMaxAI | Modified MIT |
| 140 | M2.1 | 1,384 | +/-5 | 17,128 | MiniMaxAI | MIT |
| 143 | hunyuan-turbos-20250416 | 1,382 | +/-6 | 10,722 | Tencent | Proprietary |
| 158 | minimax-m1 | 1,364 | +/-4 | 35,208 | MiniMax | Apache 2.0 |
| 163 | DeepSeek-V3 | 1,358 | +/-5 | 21,770 | DeepSeek-AI | DeepSeek |
| 173 | hunyuan-turbos-20250226 | 1,349 | +/-12 | 2,220 | Tencent | Proprietary |
| 174 | Step3 | 1,348 | +/-7 | 6,541 | StepFunAI | Apache 2.0 |
| 180 | qwen-plus-0125 | 1,346 | +/-8 | 5,819 | Alibaba | Proprietary |
| 182 | MiniMax M2 | 1,346 | +/-8 | 6,868 | MiniMaxAI | Apache 2.0 |
| 185 | glm-4-plus-0111 | 1,343 | +/-8 | 5,760 | Zhipu | Proprietary |
| 188 | hunyuan-turbo-0110 | 1,341 | +/-12 | 2,290 | Tencent | Proprietary |
| 197 | step-2-16k-exp-202412 | 1,334 | +/-9 | 4,833 | StepFun | Proprietary |
| 205 | hunyuan-large-2025-02-10 | 1,326 | +/-10 | 3,738 | Tencent | Proprietary |
| 209 | deepseek-v2.5-1210 | 1,323 | +/-8 | 6,795 | DeepSeek | DeepSeek |
| 214 | step-1o-turbo-202506 | 1,320 | +/-7 | 9,041 | StepFun | Proprietary |
| 215 | glm-4-plus | 1,319 | +/-5 | 26,126 | Zhipu AI | Proprietary |
| 218 | qwen-max-0919 | 1,318 | +/-6 | 16,478 | Alibaba | Qwen |
| 222 | qwen2.5-plus-1127 | 1,315 | +/-6 | 10,187 | Alibaba | Proprietary |
| 227 | hunyuan-standard-2025-02-10 | 1,311 | +/-10 | 3,904 | Tencent | Proprietary |
| 230 | DeepSeek V2.5 | 1,307 | +/-5 | 24,572 | DeepSeek-AI | DeepSeek |
| 241 | hunyuan-large-vision | 1,294 | +/-9 | 5,372 | Tencent | Proprietary |
| 265 | deepseek-coder-v2 | 1,264 | +/-6 | 15,147 | DeepSeek | DeepSeek License |
| 280 | hunyuan-standard-256k | 1,233 | +/-12 | 2,728 | Tencent | Proprietary |
| 296 | qwen1.5-32b-chat | 1,203 | +/-6 | 21,741 | Alibaba | Qianwen LICENSE |
| 305 | DeepSeek LLM 67B Chat | 1,184 | +/-11 | 4,932 | DeepSeek-AI | DeepSeek License |

Data is for reference only. Official sources are authoritative.
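When two entries in the ranking table have close scores, the 95% CI column helps judge whether the gap is meaningful. The sketch below checks whether two score intervals overlap. It reads the first two rows as 1,460 +/-5 and 1,458 +/-5 and the qwen3-max-2025-09-23 row as 1,424 +/-6; the helper names are hypothetical, and interval overlap is only a rough heuristic, not a formal significance test.

```python
def interval(score, half_width):
    """Return the (low, high) bounds of a score with a symmetric 95% CI."""
    return score - half_width, score + half_width


def intervals_overlap(a, b):
    """True if the confidence intervals of two (score, half_width) pairs overlap."""
    a_lo, a_hi = interval(*a)
    b_lo, b_hi = interval(*b)
    return a_lo <= b_hi and b_lo <= a_hi


kimi_k2_6 = (1460, 5)             # Kimi K2.6
deepseek_v4_pro_thinking = (1458, 5)  # DeepSeek-V4-Pro (thinking)
qwen3_max = (1424, 6)             # qwen3-max-2025-09-23

# Overlapping intervals: the 2-point gap is within the noise.
print(intervals_overlap(kimi_k2_6, deepseek_v4_pro_thinking))
# Disjoint intervals: the 36-point gap is larger than the uncertainty.
print(intervals_overlap(kimi_k2_6, qwen3_max))
```

Read this way, small rank differences between neighbouring models should not be over-interpreted when their intervals overlap.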

FAQ

01

What is Text Generation Arena (LMArena)?

Text Generation Arena, formerly LMSYS Chatbot Arena, is one of the most widely followed anonymous LLM evaluation platforms. Users compare answers from two hidden models and vote for the better response; Elo-style scoring aggregates those votes into a dynamic leaderboard.

02

How is the Arena Elo score calculated?

Arena Elo is adapted from chess rating systems. In the classic formulation, after each head-to-head comparison the preferred model gains rating points and the other model loses points, with the size of the change depending on the rating gap: an upset win moves the ratings more than an expected one. The 95% confidence interval expresses the uncertainty of the estimate, and models with more votes generally have narrower intervals.
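The per-comparison update described in this answer can be sketched as follows. This is the textbook online Elo rule, shown only to illustrate how the rating gap controls the size of the change; the K-factor of 32 and the function names are assumptions, and the Methodology Overview states that the published scores come from a Bradley-Terry fit rather than from this step-by-step update.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner_rating, loser_rating, k=32.0):
    """Return the new (winner, loser) ratings after one comparison."""
    gain = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + gain, loser_rating - gain


# Evenly matched models: the winner takes half of K.
print(elo_update(1400, 1400))
# An upset (lower-rated model wins) moves the ratings more than an expected result.
print(elo_update(1300, 1400))
print(elo_update(1400, 1300))
```

The same expected-score formula also gives a rough reading of score gaps in the table: a gap of a few points corresponds to a preference rate close to 50%, while a gap of several hundred points corresponds to a strong preference.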

03

Why do some models have both Thinking and regular versions?

Some models offer an extended-thinking mode that spends more inference time reasoning before producing the final answer. This can improve scores on reasoning, math, and coding tasks, but usually increases latency and cost, so Arena tracks these variants separately.

04

How should I choose an LLM from this leaderboard?

Consider overall Elo, cost, language coverage, open-source availability, and latency. The top-ranked model is not always the best fit for every workflow.
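As one way to apply these criteria, the sketch below shortlists models by minimum score and license. The `Entry` type and `shortlist` function are hypothetical, the four rows are copied from the ranking table, and treating any license other than "Proprietary" as a proxy for open availability is an assumption. Cost, latency, and language coverage are not part of the leaderboard data, so they would have to come from other sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: int
    license: str


def shortlist(entries, min_score, allow_proprietary=False):
    """Keep entries at or above min_score, optionally excluding proprietary ones."""
    picked = [
        e for e in entries
        if e.score >= min_score
        and (allow_proprietary or e.license != "Proprietary")
    ]
    # Highest score first.
    return sorted(picked, key=lambda e: e.score, reverse=True)


rows = [
    Entry("Kimi K2.6", 1460, "Modified MIT"),
    Entry("DeepSeek-V4-Pro", 1456, "MIT"),
    Entry("minimax-m3", 1448, "Proprietary"),
    Entry("qwen3-235b-a22b-no-thinking", 1403, "Apache 2.0"),
]

for entry in shortlist(rows, min_score=1440):
    print(entry.model, entry.score, entry.license)
```

A shortlist like this is only a starting point; the remaining candidates still need to be tested on the actual workload.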