Text Generation Arena Leaderboard
The latest AI text generation leaderboard based on LMArena anonymous user voting. Covers Elo scores, confidence intervals, and vote counts for leading language models.
Top Model
ernie-5.1
Top Score
1,474
Model Count
357
Data version
May 7, 2026
Data source: LM Arena
About This Leaderboard
This leaderboard ranks the strongest AI models for text generation. Data comes from LMArena (formerly LMSYS Chatbot Arena), the world's largest crowdsourced AI evaluation platform. Users chat with two anonymous models side-by-side and vote for the better response — rankings are determined entirely by real user preferences, not lab benchmarks.
Methodology Overview
Blind testing: Users chat with two anonymous models and vote based on response quality, eliminating brand bias.
Elo scoring: Each model's strength score is estimated from battle outcomes using the Bradley-Terry model, a statistical relative of the chess Elo rating system. Higher scores mean users more frequently prefer that model.
Broad scenario coverage: Testing spans coding, creative writing, math reasoning, Q&A, role-playing, and more.
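The Bradley-Terry fitting step can be illustrated with a small sketch. The win counts below are made up for demonstration; the update rule is the standard minorization-maximization (MM) iteration for Bradley-Terry maximum likelihood, and the final conversion to an Elo-like scale (a log transform plus an arbitrary anchor) is an assumption about presentation, not LMArena's exact pipeline.

```python
import math

# Hypothetical battle outcomes: wins[i][j] = times model i beat model j.
models = ["model-a", "model-b", "model-c"]
wins = [
    [0, 8, 6],
    [2, 0, 5],
    [4, 5, 0],
]

def bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths p_i with the classic MM update:
    p_i <- W_i / sum_j (n_ij / (p_i + p_j)), then renormalize."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins of model i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        s = sum(new_p)
        p = [x / s for x in new_p]  # normalize: strengths are scale-free
    return p

strengths = bradley_terry(wins)
# Map strengths onto an Elo-like scale (anchor value is arbitrary).
anchor = 1500
ratings = [400 * math.log10(x) + anchor for x in strengths]
```

Because Bradley-Terry strengths are only identified up to a common scale, the normalization and the anchor are free choices; only rating differences between models are meaningful.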
DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 14 | ernie-5.1 | 1,474 | +/-8 | 5,733 | Baidu | Proprietary |
| 25 | qwen3.5-max-preview | 1,464 | +/-5 | 14,558 | Alibaba | Proprietary |
| 27 | DeepSeek-V4-Pro | 1,463 | +/-9 | 4,160 | DeepSeek-AI | MIT |
| 28 | Kimi K2.6 | 1,462 | +/-7 | 7,108 | Moonshot AI | Modified MIT |
| 29 | deepseek-v4-pro-thinking | 1,462 | +/-9 | 3,808 | DeepSeek | MIT |
| 31 | dola-seed-2.0-pro | 1,459 | +/-5 | 26,587 | Bytedance | Proprietary |
| 41 | Kimi K2 Thinking | 1,449 | +/-4 | 27,282 | Moonshot AI | Modified MIT |
| 53 | deepseek-v4-flash-thinking | 1,440 | +/-9 | 3,600 | DeepSeek | MIT |
| 62 | DeepSeek-V4-Flash | 1,433 | +/-9 | 3,506 | DeepSeek-AI | MIT |
| 63 | kimi-k2.5-instant | 1,432 | +/-7 | 8,207 | Moonshot | Modified MIT |
| 66 | Kimi K2 Thinking (thinking-turbo) | 1,430 | +/-3 | 52,935 | Moonshot AI | Modified MIT |
| 70 | DeepSeek V3.2-Exp (thinking) | 1,425 | +/-7 | 9,076 | DeepSeek-AI | MIT |
| 71 | DeepSeek V3.2 | 1,424 | +/-4 | 44,820 | DeepSeek-AI | MIT |
| 72 | qwen3-max-2025-09-23 | 1,424 | +/-6 | 9,179 | Alibaba | Proprietary |
| 74 | DeepSeek V3.2-Exp | 1,423 | +/-6 | 11,943 | DeepSeek-AI | MIT |
| 77 | DeepSeek V3.2 (thinking) | 1,422 | +/-4 | 39,071 | DeepSeek-AI | MIT |
| 78 | DeepSeek-R1-0528 | 1,422 | +/-6 | 18,469 | DeepSeek-AI | MIT |
| 82 | hunyuan-hy3-preview | 1,418 | +/-8 | 4,582 | Tencent | tencent-hunyuan-community |
| 83 | Kimi K2 0905 | 1,418 | +/-6 | 11,798 | Moonshot AI | Modified MIT |
| 84 | DeepSeek-V3.1 | 1,418 | +/-6 | 14,985 | DeepSeek-AI | MIT |
| 85 | Kimi K2 | 1,417 | +/-5 | 27,644 | Moonshot AI | Modified MIT |
| 86 | deepseek-v3.1-terminus-thinking | 1,417 | +/-10 | 3,474 | DeepSeek | MIT |
| 87 | DeepSeek-V3.1 (thinking) | 1,417 | +/-7 | 11,754 | DeepSeek-AI | MIT |
| 88 | DeepSeek-V3.1 Terminus | 1,416 | +/-10 | 3,713 | DeepSeek-AI | MIT |
| 100 |  | 1,407 | +/-6 | 13,525 | MiniMaxAI | Modified MIT |
| 105 | qwen3-235b-a22b-no-thinking | 1,403 | +/-5 | 38,241 | Alibaba | Apache 2.0 |
| 109 | qwen3-235b-a22b-thinking-2507 | 1,399 | +/-7 | 9,004 | Alibaba | Apache 2.0 |
| 111 | Step 3.5 Flash | 1,398 | +/-5 | 19,649 | StepFunAI | Proprietary |
| 112 | DeepSeek-R1 | 1,398 | +/-5 | 18,524 | DeepSeek-AI | MIT |
| 114 | hunyuan-vision-1.5-thinking | 1,396 | +/-12 | 2,221 | Tencent | Proprietary |
| 117 | DeepSeek-V3-0324 | 1,395 | +/-4 | 45,533 | DeepSeek-AI | MIT |
| 118 |  | 1,395 | +/-4 | 24,885 | MiniMaxAI | Modified MIT |
| 119 | Step 3.5 Flash | 1,393 | +/-4 | 25,112 | StepFunAI | Apache 2.0 |
| 131 |  | 1,385 | +/-5 | 17,165 | MiniMaxAI | MIT |
| 134 | hunyuan-turbos-20250416 | 1,382 | +/-6 | 10,723 | Tencent | Proprietary |
| 149 | minimax-m1 | 1,363 | +/-4 | 35,233 | MiniMax | Apache 2.0 |
| 154 | DeepSeek-V3 | 1,358 | +/-5 | 21,770 | DeepSeek-AI | DeepSeek |
| 164 | hunyuan-turbos-20250226 | 1,348 | +/-12 | 2,220 | Tencent | Proprietary |
| 165 | Step3 | 1,348 | +/-7 | 6,551 | StepFunAI | Apache 2.0 |
| 172 |  | 1,346 | +/-8 | 6,871 | MiniMaxAI | Apache 2.0 |
| 173 | qwen-plus-0125 | 1,346 | +/-8 | 5,819 | Alibaba | Proprietary |
| 176 | glm-4-plus-0111 | 1,343 | +/-8 | 5,760 | Zhipu | Proprietary |
| 179 | hunyuan-turbo-0110 | 1,340 | +/-12 | 2,290 | Tencent | Proprietary |
| 188 | step-2-16k-exp-202412 | 1,334 | +/-9 | 4,833 | StepFun | Proprietary |
| 196 | hunyuan-large-2025-02-10 | 1,326 | +/-10 | 3,738 | Tencent | Proprietary |
| 198 | deepseek-v2.5-1210 | 1,323 | +/-8 | 6,795 | DeepSeek | DeepSeek |
| 205 | step-1o-turbo-202506 | 1,320 | +/-7 | 9,039 | StepFun | Proprietary |
| 206 | glm-4-plus | 1,319 | +/-5 | 26,126 | Zhipu AI | Proprietary |
| 209 | qwen-max-0919 | 1,318 | +/-6 | 16,478 | Alibaba | Qwen |
| 213 | qwen2.5-plus-1127 | 1,315 | +/-6 | 10,187 | Alibaba | Proprietary |
| 218 | hunyuan-standard-2025-02-10 | 1,311 | +/-10 | 3,904 | Tencent | Proprietary |
| 221 | deepseek-v2.5 | 1,307 | +/-5 | 24,572 | DeepSeek | DeepSeek |
| 229 | qwen2.5-72b-instruct | 1,302 | +/-4 | 39,406 | Alibaba | Qwen |
| 231 | hunyuan-large-vision | 1,294 | +/-9 | 5,370 | Tencent | Proprietary |
| 250 | glm-4-0520 | 1,273 | +/-7 | 9,788 | Zhipu AI | Proprietary |
| 252 | qwen2.5-coder-32b-instruct | 1,270 | +/-8 | 5,432 | Alibaba | Apache 2.0 |
| 255 | deepseek-coder-v2 | 1,264 | +/-6 | 15,147 | DeepSeek | DeepSeek License |
| 257 | qwen2-72b-instruct | 1,261 | +/-5 | 37,325 | Alibaba | Qianwen LICENSE |
| 269 | qwen1.5-110b-chat | 1,233 | +/-6 | 26,195 | Alibaba | Qianwen LICENSE |
| 270 | hunyuan-standard-256k | 1,233 | +/-12 | 2,728 | Tencent | Proprietary |
| 272 | qwen1.5-72b-chat | 1,232 | +/-5 | 39,302 | Alibaba | Qianwen LICENSE |
| 286 | qwen1.5-32b-chat | 1,203 | +/-6 | 21,741 | Alibaba | Qianwen LICENSE |
| 292 | internlm2_5-20b-chat | 1,191 | +/-7 | 9,901 | InternLM | Other |
| 293 | qwen1.5-14b-chat | 1,190 | +/-7 | 17,839 | Alibaba | Qianwen LICENSE |
| 295 | deepseek-llm-67b-chat | 1,183 | +/-12 | 4,932 | DeepSeek | DeepSeek License |
| 312 | qwq-32b-preview | 1,156 | +/-12 | 3,231 | Alibaba | Apache 2.0 |
| 321 | qwen1.5-7b-chat | 1,143 | +/-10 | 4,737 | Alibaba | Qianwen LICENSE |
| 325 | qwen-14b-chat | 1,137 | +/-11 | 4,964 | Alibaba | Qianwen LICENSE |
| 343 | qwen1.5-4b-chat | 1,089 | +/-9 | 7,597 | Alibaba | Qianwen LICENSE |
Data is for reference only. Official sources are authoritative. Click model names to view DataLearner model profiles.
FAQ
What is Text Generation Arena (LMArena)?
Text Generation Arena, formerly LMSYS Chatbot Arena, is one of the most widely followed anonymous LLM evaluation platforms. Users compare answers from two hidden models and vote for the better response; Elo-style scoring aggregates those votes into a dynamic leaderboard.
How is the Arena Elo score calculated?
Arena Elo is adapted from chess rating systems. After each head-to-head comparison, the preferred model gains rating points and the other model loses points, with the size of the change depending on the rating gap: beating a much higher-rated model moves the ratings more than beating a lower-rated one. The 95% confidence interval reflects statistical uncertainty in the estimate; it narrows as a model accumulates more votes.
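The update described above can be sketched in a few lines. This is the textbook online Elo rule, not LMArena's exact implementation (which fits a Bradley-Terry model over all battles at once); the K-factor of 32 is a conventional example value.

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One head-to-head Elo update.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    The expected score comes from the logistic curve over the rating gap,
    so an underdog win produces a larger rating swing than a favorite win.
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# An upset: a 1000-rated model beats a 1200-rated one and gains ~24 points,
# while the favorite loses the same amount.
new_a, new_b = elo_update(1000, 1200, 1.0)
```

Note that the two rating changes are equal and opposite, so the total rating mass in the pool is conserved by each comparison.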
Why do some models have both Thinking and regular versions?
Some models offer an extended-thinking mode that spends more inference time reasoning before producing the final answer. This can improve scores on reasoning, math, and coding tasks, but usually increases latency and cost, so Arena tracks these variants separately.
How should I choose an LLM from this leaderboard?
Consider overall Elo, cost, language coverage, open-source availability, and latency. The top-ranked model is not always the best fit for every workflow.