DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


Text Generation Arena Leaderboard

The latest AI text generation leaderboard based on LMArena anonymous user voting. Covers Elo scores, confidence intervals, and vote counts for leading language models.

Top Model: ernie-5.1
Top Score: 1,474
Model Count: 357
Data version: May 7, 2026
Data source: LMArena

About This Leaderboard

This leaderboard ranks the strongest AI models for text generation. Data comes from LMArena (formerly LMSYS Chatbot Arena), the world's largest crowdsourced AI evaluation platform. Users chat with two anonymous models side by side and vote for the better response, so rankings are determined entirely by real user preferences, not lab benchmarks.

Methodology Overview

Blind testing: Users chat with two anonymous models and vote based on response quality, eliminating brand bias.

Elo scoring: Each model's strength score is calculated from battle outcomes with the Bradley-Terry model and reported on a scale similar to chess Elo ratings. Higher scores mean users more frequently prefer that model.

Broad scenario coverage: Testing spans coding, creative writing, math reasoning, Q&A, role-playing, and more.
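
The Bradley-Terry scoring step described above can be sketched in a few lines. This is an illustrative sketch only, not LMArena's actual pipeline: the function names, the toy vote log, the 1000-point anchor, and the fixed iteration count are all assumptions made for the example.

```python
import math

def fit_bradley_terry(battles, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorization-maximization update. Assumes every model
    wins at least once and never battles itself (true for the toy data).
    """
    models = sorted({m for pair in battles for m in pair})
    wins = {m: 0 for m in models}
    games = {}  # unordered pair of models -> number of battles between them
    for winner, loser in battles:
        wins[winner] += 1
        pair = frozenset((winner, loser))
        games[pair] = games.get(pair, 0) + 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for pair, n in games.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Strengths are only defined up to a constant factor, so pin the
        # geometric mean to 1 after every sweep.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(log_mean) for m, v in updated.items()}
    return strength

def to_elo_scale(strength, anchor=1000.0):
    """Map strengths to an Elo-like scale (400 points per factor of 10)."""
    return {m: anchor + 400.0 * math.log10(s) for m, s in strength.items()}

# Hypothetical vote log: 10 battles per pair of three made-up models.
toy_battles = (
    [("model-a", "model-b")] * 7 + [("model-b", "model-a")] * 3
    + [("model-b", "model-c")] * 6 + [("model-c", "model-b")] * 4
    + [("model-a", "model-c")] * 8 + [("model-c", "model-a")] * 2
)
ratings = to_elo_scale(fit_bradley_terry(toy_battles))
```

Because the fit uses the whole vote log at once, the resulting scores do not depend on the order in which votes arrived, which is one reason a batch model like this is attractive for a leaderboard.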

DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.


Ranking Table

| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 14 | ernie-5.1 | 1,474 | +/-8 | 5,733 | Baidu | Proprietary |
| 25 | qwen3.5-max-preview | 1,464 | +/-5 | 14,558 | Alibaba | Proprietary |
| 27 | DeepSeek-V4-Pro | 1,463 | +/-9 | 4,160 | DeepSeek-AI | MIT |
| 28 | Kimi K2.6 | 1,462 | +/-7 | 7,108 | Moonshot AI | Modified MIT |
| 29 | deepseek-v4-pro-thinking | 1,462 | +/-9 | 3,808 | DeepSeek | MIT |
| 31 | dola-seed-2.0-pro | 1,459 | +/-5 | 26,587 | Bytedance | Proprietary |
| 41 | Kimi K2 Thinking | 1,449 | +/-4 | 27,282 | Moonshot AI | Modified MIT |
| 53 | deepseek-v4-flash-thinking | 1,440 | +/-9 | 3,600 | DeepSeek | MIT |
| 62 | DeepSeek-V4-Flash | 1,433 | +/-9 | 3,506 | DeepSeek-AI | MIT |
| 63 | kimi-k2.5-instant | 1,432 | +/-7 | 8,207 | Moonshot | Modified MIT |
| 66 | Kimi K2 Thinking (thinking-turbo) | 1,430 | +/-3 | 52,935 | Moonshot AI | Modified MIT |
| 70 | DeepSeek V3.2-Exp (thinking) | 1,425 | +/-7 | 9,076 | DeepSeek-AI | MIT |
| 71 | DeepSeek V3.2 | 1,424 | +/-4 | 44,820 | DeepSeek-AI | MIT |
| 72 | qwen3-max-2025-09-23 | 1,424 | +/-6 | 9,179 | Alibaba | Proprietary |
| 74 | DeepSeek V3.2-Exp | 1,423 | +/-6 | 11,943 | DeepSeek-AI | MIT |
| 77 | DeepSeek V3.2 (thinking) | 1,422 | +/-4 | 39,071 | DeepSeek-AI | MIT |
| 78 | DeepSeek-R1-0528 | 1,422 | +/-6 | 18,469 | DeepSeek-AI | MIT |
| 82 | hunyuan-hy3-preview | 1,418 | +/-8 | 4,582 | Tencent | tencent-hunyuan-community |
| 83 | Kimi K2 0905 | 1,418 | +/-6 | 11,798 | Moonshot AI | Modified MIT |
| 84 | DeepSeek-V3.1 | 1,418 | +/-6 | 14,985 | DeepSeek-AI | MIT |
| 85 | Kimi K2 | 1,417 | +/-5 | 27,644 | Moonshot AI | Modified MIT |
| 86 | deepseek-v3.1-terminus-thinking | 1,417 | +/-10 | 3,474 | DeepSeek | MIT |
| 87 | DeepSeek-V3.1 (thinking) | 1,417 | +/-7 | 11,754 | DeepSeek-AI | MIT |
| 88 | DeepSeek-V3.1 Terminus | 1,416 | +/-10 | 3,713 | DeepSeek-AI | MIT |
| 100 | MiniMax-M2.7 | 1,407 | +/-6 | 13,525 | MiniMaxAI | Modified MIT |
| 105 | qwen3-235b-a22b-no-thinking | 1,403 | +/-5 | 38,241 | Alibaba | Apache 2.0 |
| 109 | qwen3-235b-a22b-thinking-2507 | 1,399 | +/-7 | 9,004 | Alibaba | Apache 2.0 |
| 111 | Step 3.5 Flash | 1,398 | +/-5 | 19,649 | StepFunAI | Proprietary |
| 112 | DeepSeek-R1 | 1,398 | +/-5 | 18,524 | DeepSeek-AI | MIT |
| 114 | hunyuan-vision-1.5-thinking | 1,396 | +/-12 | 2,221 | Tencent | Proprietary |
| 117 | DeepSeek-V3-0324 | 1,395 | +/-4 | 45,533 | DeepSeek-AI | MIT |
| 118 | MiniMax M2.5 | 1,395 | +/-4 | 24,885 | MiniMaxAI | Modified MIT |
| 119 | Step 3.5 Flash | 1,393 | +/-4 | 25,112 | StepFunAI | Apache 2.0 |
| 131 | M2.1 | 1,385 | +/-5 | 17,165 | MiniMaxAI | MIT |
| 134 | hunyuan-turbos-20250416 | 1,382 | +/-6 | 10,723 | Tencent | Proprietary |
| 149 | minimax-m1 | 1,363 | +/-4 | 35,233 | MiniMax | Apache 2.0 |
| 154 | DeepSeek-V3 | 1,358 | +/-5 | 21,770 | DeepSeek-AI | DeepSeek |
| 164 | hunyuan-turbos-20250226 | 1,348 | +/-12 | 2,220 | Tencent | Proprietary |
| 165 | Step3 | 1,348 | +/-7 | 6,551 | StepFunAI | Apache 2.0 |
| 172 | MiniMax M2 | 1,346 | +/-8 | 6,871 | MiniMaxAI | Apache 2.0 |
| 173 | qwen-plus-0125 | 1,346 | +/-8 | 5,819 | Alibaba | Proprietary |
| 176 | glm-4-plus-0111 | 1,343 | +/-8 | 5,760 | Zhipu | Proprietary |
| 179 | hunyuan-turbo-0110 | 1,340 | +/-12 | 2,290 | Tencent | Proprietary |
| 188 | step-2-16k-exp-202412 | 1,334 | +/-9 | 4,833 | StepFun | Proprietary |
| 196 | hunyuan-large-2025-02-10 | 1,326 | +/-10 | 3,738 | Tencent | Proprietary |
| 198 | deepseek-v2.5-1210 | 1,323 | +/-8 | 6,795 | DeepSeek | DeepSeek |
| 205 | step-1o-turbo-202506 | 1,320 | +/-7 | 9,039 | StepFun | Proprietary |
| 206 | glm-4-plus | 1,319 | +/-5 | 26,126 | Zhipu AI | Proprietary |
| 209 | qwen-max-0919 | 1,318 | +/-6 | 16,478 | Alibaba | Qwen |
| 213 | qwen2.5-plus-1127 | 1,315 | +/-6 | 10,187 | Alibaba | Proprietary |
| 218 | hunyuan-standard-2025-02-10 | 1,311 | +/-10 | 3,904 | Tencent | Proprietary |
| 221 | deepseek-v2.5 | 1,307 | +/-5 | 24,572 | DeepSeek | DeepSeek |
| 229 | qwen2.5-72b-instruct | 1,302 | +/-4 | 39,406 | Alibaba | Qwen |
| 231 | hunyuan-large-vision | 1,294 | +/-9 | 5,370 | Tencent | Proprietary |
| 250 | glm-4-0520 | 1,273 | +/-7 | 9,788 | Zhipu AI | Proprietary |
| 252 | qwen2.5-coder-32b-instruct | 1,270 | +/-8 | 5,432 | Alibaba | Apache 2.0 |
| 255 | deepseek-coder-v2 | 1,264 | +/-6 | 15,147 | DeepSeek | DeepSeek License |
| 257 | qwen2-72b-instruct | 1,261 | +/-5 | 37,325 | Alibaba | Qianwen LICENSE |
| 269 | qwen1.5-110b-chat | 1,233 | +/-6 | 26,195 | Alibaba | Qianwen LICENSE |
| 270 | hunyuan-standard-256k | 1,233 | +/-12 | 2,728 | Tencent | Proprietary |
| 272 | qwen1.5-72b-chat | 1,232 | +/-5 | 39,302 | Alibaba | Qianwen LICENSE |
| 286 | qwen1.5-32b-chat | 1,203 | +/-6 | 21,741 | Alibaba | Qianwen LICENSE |
| 292 | internlm2_5-20b-chat | 1,191 | +/-7 | 9,901 | InternLM | Other |
| 293 | qwen1.5-14b-chat | 1,190 | +/-7 | 17,839 | Alibaba | Qianwen LICENSE |
| 295 | deepseek-llm-67b-chat | 1,183 | +/-12 | 4,932 | DeepSeek | DeepSeek License |
| 312 | qwq-32b-preview | 1,156 | +/-12 | 3,231 | Alibaba | Apache 2.0 |
| 321 | qwen1.5-7b-chat | 1,143 | +/-10 | 4,737 | Alibaba | Qianwen LICENSE |
| 325 | qwen-14b-chat | 1,137 | +/-11 | 4,964 | Alibaba | Qianwen LICENSE |
| 343 | qwen1.5-4b-chat | 1,089 | +/-9 | 7,597 | Alibaba | Qianwen LICENSE |

Data is for reference only. Official sources are authoritative. Click model names to view DataLearner model profiles.
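
When reading the Score and 95% CI columns together, a quick sanity check is whether two models' intervals overlap. The sketch below uses three rows from the table (ernie-5.1 at 1,474 +/-8, qwen3.5-max-preview at 1,464 +/-5, Kimi K2 Thinking at 1,449 +/-4). The helper names are hypothetical, and interval overlap is only a rough heuristic here, not a formal significance test or anything the leaderboard itself publishes.

```python
def interval(score, ci):
    """Return the (low, high) bounds implied by a score and its +/- CI."""
    return (score - ci, score + ci)

def intervals_overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# (score, 95% CI half-width) copied from the ranking table.
rows = {
    "ernie-5.1": (1474, 8),
    "qwen3.5-max-preview": (1464, 5),
    "Kimi K2 Thinking": (1449, 4),
}
bounds = {name: interval(score, ci) for name, (score, ci) in rows.items()}

# ernie-5.1 spans 1466..1482 and qwen3.5-max-preview spans 1459..1469,
# so the 10-point gap between them is within the stated uncertainty.
top_two_overlap = intervals_overlap(
    bounds["ernie-5.1"], bounds["qwen3.5-max-preview"]
)

# Kimi K2 Thinking spans 1445..1453, clearly below ernie-5.1's interval.
first_vs_kimi_overlap = intervals_overlap(
    bounds["ernie-5.1"], bounds["Kimi K2 Thinking"]
)
```

Read this way, small gaps between adjacent ranks are often not meaningful on their own, while gaps larger than the combined interval widths are more likely to reflect a real preference difference.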

FAQ

01. What is Text Generation Arena (LMArena)?

Text Generation Arena is the text generation track of LMArena (formerly LMSYS Chatbot Arena), one of the most widely followed anonymous LLM evaluation platforms. Users compare answers from two hidden models and vote for the better response; Elo-style scoring aggregates those votes into a dynamic leaderboard.

02. How is the Arena Elo score calculated?

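Arena Elo is adapted from chess rating systems. After each head-to-head comparison, the preferred model gains rating points and the other model loses points, with the size of the change depending on the rating gap: beating a higher-rated model earns more than beating a lower-rated one. The 95% confidence interval shows how uncertain the estimate is, and it generally narrows as a model collects more votes. In the ranking table, for example, a model with about 2,200 votes has an interval of +/-12, while one with about 52,900 votes has +/-3.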
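
The per-comparison update described in this answer can be sketched with the classic chess Elo formulas. The 400-point scale and the K-factor of 32 are conventional chess values assumed for illustration; they are not taken from the leaderboard, and the function names are hypothetical.

```python
def expected_score(r_a, r_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner, r_loser, k=32.0):
    """Return new (winner, loser) ratings after one vote.

    The winner gains k * (1 - expected), so an upset win over a
    higher-rated model moves both ratings more than an expected win.
    """
    gain = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + gain, r_loser - gain

# Two equally rated models: the winner takes exactly half of k.
even_match = elo_update(1400, 1400)

# Upset: a 1400-rated model is preferred over a 1474-rated one.
upset = elo_update(1400, 1474)

# Implied preference rate for the top two table rows (1,474 vs 1,464).
p_top = expected_score(1474, 1464)
```

Under this model a 10-point gap corresponds to only a slightly better than even chance of being preferred, which is why closely ranked models are hard to separate without many votes.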

03. Why do some models have both Thinking and regular versions?

Some models offer an extended-thinking mode that spends more inference time reasoning before producing the final answer. This can improve scores on reasoning, math, and coding tasks, but usually increases latency and cost, so Arena tracks these variants separately.

04. How should I choose an LLM from this leaderboard?

Consider overall Elo, cost, language coverage, open-source availability, and latency. The top-ranked model is not always the best fit for every workflow.
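
As a rough sketch of this kind of shortlisting, the snippet below filters a handful of rows copied from the ranking table by license and vote count, then sorts by score. The `Row` type, the `shortlist` helper, the 5,000-vote threshold, and the choice to treat MIT, Modified MIT, and Apache 2.0 as "open" are all assumptions made for illustration. Cost, latency, and language coverage are not in the table, so they would have to come from other sources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    model: str
    score: int
    ci: int
    votes: int
    license: str

# A few rows copied from the ranking table.
ROWS = [
    Row("ernie-5.1", 1474, 8, 5733, "Proprietary"),
    Row("qwen3.5-max-preview", 1464, 5, 14558, "Proprietary"),
    Row("DeepSeek-V4-Pro", 1463, 9, 4160, "MIT"),
    Row("Kimi K2.6", 1462, 7, 7108, "Modified MIT"),
    Row("DeepSeek V3.2", 1424, 4, 44820, "MIT"),
    Row("qwen3-235b-a22b-no-thinking", 1403, 5, 38241, "Apache 2.0"),
]

def shortlist(rows, allowed_licenses, min_votes=0):
    """Keep rows with an allowed license and enough votes, best score first."""
    keep = [
        r for r in rows
        if r.license in allowed_licenses and r.votes >= min_votes
    ]
    return sorted(keep, key=lambda r: r.score, reverse=True)

# Assumption: these license labels are the ones we accept as open.
OPEN_LICENSES = {"MIT", "Modified MIT", "Apache 2.0"}
picks = shortlist(ROWS, OPEN_LICENSES, min_votes=5000)
pick_names = [r.model for r in picks]
```

With these assumed thresholds, the top-ranked proprietary models drop out and a lower-ranked but openly licensed, heavily voted model moves to the front, which illustrates why the top of the leaderboard is not automatically the right choice.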