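A leaderboard of the latest AI text generation models, based on anonymous user votes in the Text Generation Arena. It covers each model's Elo score, 95% confidence interval, vote count, organization, and license.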
Top model: claude-opus-4-6-thinking
Highest score: 1,507
Number of models: 60
Data version: February 16, 2026
Data source: LMArena
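This leaderboard ranks today's strongest large AI models by overall performance on text generation tasks. The data comes from LMArena (formerly LMSYS Chatbot Arena), currently the world's largest crowdsourced evaluation platform for AI models. Users chat with two anonymous models at the same time and vote for the better answer, so the ranking is determined entirely by real user preferences rather than by lab benchmarks.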
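Anonymous blind testing: users chat with two models whose identities are hidden and vote on answer quality, which removes brand bias.
Elo scoring: scores follow the Elo rating system from chess and are estimated from head-to-head results with a Bradley-Terry model. A higher score means the model is more likely to be preferred by users in real conversations.
Broad scenario coverage: programming, creative writing, mathematical reasoning, knowledge Q&A, role-play, and other common real-world use cases.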
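To make the Elo and Bradley-Terry description above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not LMArena's actual pipeline, which may handle ties, weighting, and confidence intervals differently. The 400-point logistic scale, the anchor of 1000, and the function names `win_probability`, `fit_bradley_terry`, and `to_elo_scale` are all assumptions introduced here. The vote counts in the example are made up.

```python
import math


def win_probability(score_a, score_b, scale=400.0):
    """Chance that A is preferred over B under the standard Elo logistic curve.

    Assumption: base-10 logistic with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths with simple MM updates.

    wins[i][j] is the number of votes in which model i beat model j.
    Ties are ignored, and every model is assumed to have at least one win.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each opponent contributes (games played) / (sum of strengths).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


def to_elo_scale(strengths, scale=400.0, anchor=1000.0):
    """Map strengths onto an Elo-like scale (anchor value is arbitrary)."""
    return [anchor + scale * math.log10(s) for s in strengths]


# Scores 1,507 and 1,486 are taken from the leaderboard table.
p = win_probability(1507, 1486)
print(f"P(1507 preferred over 1486) = {p:.3f}")

# Hypothetical votes: model 0 beat model 1 in 75 of 100 battles.
votes = [[0, 75], [25, 0]]
ratings = to_elo_scale(fit_bradley_terry(votes))
gap = ratings[0] - ratings[1]
print(f"Fitted rating gap = {gap:.1f}")
```

Under these assumptions a 21-point gap corresponds to only a slightly better than even chance of being preferred, which is why small score differences near the top of the table should be read together with the 95% CI column.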
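DataLearner adds Chinese-language commentary and in-depth analysis on top of the raw data, and links each model on the leaderboard to the DataLearner model library, so you can view model details, API pricing, benchmark scores, and other information in one click.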
Chart source: DataLearnerAI · Data source: LMArena
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-6-thinking | 1,507 | +9 | 4,650 | Anthropic | Proprietary |
| 2 | claude-opus-4-6 | 1,504 | +8 | 5,427 | Anthropic | Proprietary |
| 3 | gemini-3-pro | 1,486 | +4 | 36,238 | Google | Proprietary |
| 4 | grok-4.1-thinking | 1,475 | +4 | 35,770 | xAI | Proprietary |
| 5 | gemini-3-flash | 1,473 | +5 | 26,986 | Google | Proprietary |
| 6 | dola-seed-2.0-preview | 1,473 | +10 | 3,154 | Bytedance | Proprietary |
| 7 | claude-opus-4-5-20251101-thinking-32k | 1,471 | +5 | 28,374 | Anthropic | Proprietary |
| 8 | claude-opus-4-5-20251101 | 1,467 | +4 | 33,214 | Anthropic | Proprietary |
| 9 | grok-4.1 | 1,463 | +4 | 39,883 | xAI | Proprietary |
| 10 | gemini-3-flash (thinking-minimal) | 1,462 | +5 | 18,355 | Google | Proprietary |
| 11 | gpt-5.1-high | 1,458 | +4 | 32,297 | OpenAI | Proprietary |
| 12 | glm-5 | 1,455 | +9 | 4,643 | Zai | MIT |
| 13 | ernie-5.0-0110 | 1,453 | +6 | 11,982 | Baidu | Proprietary |
| 14 | claude-sonnet-4-5-20250929-thinking-32k | 1,450 | +4 | 46,773 | Anthropic | Proprietary |
| 15 | claude-sonnet-4-5-20250929 | 1,450 | +4 | 44,565 | Anthropic | Proprietary |
| 16 | gemini-2.5-pro | 1,449 | +3 | 95,526 | Google | Proprietary |
| 17 | ernie-5.0-preview-1203 | 1,449 | +7 | 9,744 | Baidu | Proprietary |
| 18 | claude-opus-4-1-20250805-thinking-16k | 1,449 | +4 | 49,819 | Anthropic | Proprietary |
| 19 | kimi-k2.5-thinking | 1,448 | +7 | 9,050 | Moonshot | Modified MIT |
| 20 | claude-opus-4-1-20250805 | 1,445 | +3 | 75,773 | Anthropic | Proprietary |
| 21 | gpt-4.5-preview-2025-02-27 | 1,444 | +6 | 14,549 | OpenAI | Proprietary |
| 22 | chatgpt-4o-latest-20250326 | 1,442 | +3 | 83,193 | OpenAI | Proprietary |
| 23 | glm-4.7 | 1,441 | +6 | 11,971 | Zai | MIT |
| 24 | gpt-5.2-high | 1,438 | +6 | 17,088 | OpenAI | Proprietary |
| 25 | kimi-k2.5-instant | 1,438 | +9 | 5,007 | Moonshot | Modified MIT |
| 26 | gpt-5.2 | 1,438 | +6 | 13,795 | OpenAI | Proprietary |
| 27 | gpt-5.1 | 1,437 | +4 | 34,522 | OpenAI | Proprietary |
| 28 | gpt-5-high | 1,434 | +5 | 32,559 | OpenAI | Proprietary |
| 29 | qwen3-max-preview | 1,434 | +5 | 27,763 | Alibaba | Proprietary |
| 30 | o3-2025-04-16 | 1,432 | +4 | 61,272 | OpenAI | Proprietary |
| 31 | grok-4.1-fast-reasoning | 1,431 | +4 | 29,040 | xAI | Proprietary |
| 32 | kimi-k2-thinking-turbo | 1,429 | +4 | 34,127 | Moonshot | Modified MIT |
| 33 | gpt-5-chat | 1,426 | +4 | 31,753 | OpenAI | Proprietary |
| 34 | glm-4.6 | 1,425 | +4 | 35,242 | Zai | MIT |
| 35 | qwen3-max-2025-09-23 | 1,425 | +6 | 9,203 | Alibaba | Proprietary |
| 36 | claude-opus-4-20250514-thinking-16k | 1,424 | +4 | 37,930 | Anthropic | Proprietary |
| 37 | deepseek-v3.2-exp-thinking | 1,423 | +7 | 8,981 | DeepSeek | MIT |
| 38 | deepseek-v3.2-exp | 1,423 | +6 | 11,721 | DeepSeek | MIT |
| 39 | qwen3-235b-a22b-instruct-2507 | 1,423 | +3 | 69,847 | Alibaba | Apache 2.0 |
| 40 | grok-4-fast-chat | 1,422 | +8 | 6,983 | xAI | Proprietary |
| 41 | deepseek-v3.2-thinking | 1,420 | +5 | 23,731 | DeepSeek | MIT |
| 42 | deepseek-v3.2 | 1,420 | +5 | 28,747 | DeepSeek | MIT |
| 43 | deepseek-r1-0528 | 1,419 | +6 | 19,281 | DeepSeek | MIT |
| 44 | ernie-5.0-preview-1022 | 1,419 | +9 | 4,594 | Baidu | Proprietary |
| 45 | deepseek-v3.1 | 1,418 | +6 | 15,269 | DeepSeek | MIT |
| 46 | kimi-k2-0905-preview | 1,417 | +6 | 11,959 | Moonshot | Modified MIT |
| 47 | deepseek-v3.1-thinking | 1,417 | +7 | 11,963 | DeepSeek | MIT |
| 48 | kimi-k2-0711-preview | 1,417 | +5 | 28,632 | Moonshot | Modified MIT |
| 49 | deepseek-v3.1-terminus | 1,416 | +10 | 3,757 | DeepSeek | MIT |
| 50 | deepseek-v3.1-terminus-thinking | 1,416 | +10 | 3,547 | DeepSeek | MIT |
| 51 | qwen3-vl-235b-a22b-instruct | 1,415 | +6 | 11,653 | Alibaba | Apache 2.0 |
| 52 | mistral-large-3 | 1,414 | +5 | 24,945 | Mistral | Apache 2.0 |
| 53 | gpt-4.1-2025-04-14 | 1,413 | +4 | 52,121 | OpenAI | Proprietary |
| 54 | claude-opus-4-20250514 | 1,413 | +4 | 45,522 | Anthropic | Proprietary |
| 55 | mistral-medium-2508 | 1,411 | +3 | 63,710 | Mistral | Proprietary |
| 56 | grok-3-preview-02-24 | 1,411 | +4 | 33,966 | xAI | Proprietary |
| 57 | gemini-2.5-flash | 1,411 | +3 | 94,795 | Google | Proprietary |
| 58 | glm-4.5 | 1,410 | +5 | 24,751 | Zai | MIT |
| 59 | grok-4-0709 | 1,410 | +4 | 41,993 | xAI | Proprietary |
| 60 | claude-haiku-4-5-20251001 | 1,406 | +4 | 45,273 | Anthropic | Proprietary |
Data is for reference only; official sources take precedence. Links next to the model names lead to the DataLearner model detail pages.