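Text Generation Arena: Leaderboard of Text Generation Models
The latest leaderboard of AI text generation models, based on anonymous user votes in the Text Generation Arena. It lists each model's Elo score, 95% confidence interval, vote count, organization, and license.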
Top model: Claude Opus 4.6 (thinking)
Highest score: 1,502
Number of models: 360
Data version: 2026-05-17
Data source: LM Arena
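About this leaderboard
This leaderboard ranks today's strongest large AI models by overall performance on text generation tasks. The data come from LMArena (formerly LMSYS Chatbot Arena), currently the world's largest crowdsourced platform for evaluating AI models. On the platform, users chat with two anonymous models at the same time and vote for the better answer, so the ranking is determined entirely by the preferences of real users rather than by lab benchmarks.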
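Evaluation methodology at a glance
Anonymous blind testing: users chat with two models whose identities are hidden and vote on the quality of the answers, which keeps brand bias out of the vote.
Elo scores: scores are reported on the Elo rating scale borrowed from chess, while the statistical model behind them is the Bradley-Terry model, which estimates each model's strength from the outcomes of all its pairwise battles. The higher the score, the more likely users are to prefer that model in a head-to-head comparison; on this scale a 400-point gap corresponds to 10:1 odds, or roughly a 91% chance of being preferred.
Broad scenario coverage: battles span common real-world use cases such as programming, creative writing, mathematical reasoning, knowledge Q&A, and role-play.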
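A minimal sketch of how pairwise votes can be turned into Elo-scale scores. It assumes each vote is stored as a `(model_a, model_b, winner)` triple and counts a tie as half a win for each side (both assumptions for illustration). It fits the Bradley-Terry model with Hunter's minorization-maximization (MM) iteration rather than the logistic-regression fit LMArena has described using; both target the same maximum-likelihood estimate. The 400-point scale and the 1,000 anchor follow the usual Elo convention and are parameters here, not values stated on this page. The function names and the `model-x`/`model-y` labels are hypothetical.

```python
import math
from collections import defaultdict


def win_probability(r_a, r_b, scale=400.0):
    """P(A preferred over B) under Bradley-Terry, expressed on the Elo scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def fit_bradley_terry(battles, iters=500, scale=400.0, anchor=1000.0):
    """Fit Bradley-Terry strengths with Hunter's MM iteration; return Elo-scale scores.

    battles: iterable of (model_a, model_b, winner), winner in {"model_a", "model_b", "tie"}.
    Assumes the battle graph is connected. A model that never wins (or never loses)
    has no finite maximum-likelihood score, so that case is rejected.
    """
    wins = defaultdict(float)    # wins per model; a tie counts 0.5 for each side
    played = defaultdict(float)  # battles per model
    games = defaultdict(float)   # battles per unordered pair of models
    for a, b, winner in battles:
        if a == b:
            raise ValueError("a model cannot battle itself")
        if winner == "model_a":
            wins[a] += 1.0
        elif winner == "model_b":
            wins[b] += 1.0
        elif winner == "tie":
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            raise ValueError(f"unknown winner label: {winner!r}")
        played[a] += 1.0
        played[b] += 1.0
        games[frozenset((a, b))] += 1.0

    models = list(played)
    if not models:
        return {}
    for m in models:
        if wins[m] == 0.0 or wins[m] == played[m]:
            raise ValueError(f"{m} needs at least one win and one loss (ties count as both)")

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = defaultdict(float)
        for pair, n in games.items():
            i, j = tuple(pair)
            term = n / (strength[i] + strength[j])
            denom[i] += term
            denom[j] += term
        new = {m: wins[m] / denom[m] for m in models}
        # Strengths are only defined up to a common factor: pin the geometric mean to 1.
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        strength = {m: v / math.exp(log_mean) for m, v in new.items()}

    # Elo-scale score: anchor + scale * log10(strength)
    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}


battles = [("model-x", "model-y", "model_a")] * 3 + [("model-x", "model-y", "model_b")]
ratings = fit_bradley_terry(battles)
print(round(ratings["model-x"] - ratings["model-y"], 1))  # 190.8 (= 400 * log10(3))
print(round(win_probability(1502, 1492), 3))              # 0.514 (ranks 1 and 4 in the table)
```

Because every vote enters one joint fit, the result does not depend on the order in which battles happened, which is the main practical difference from updating ratings one battle at a time.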
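On top of the raw data, DataLearner provides Chinese-language commentary and in-depth analysis, and links each model on the leaderboard to the DataLearner model library, where you can view its full details, including API pricing and benchmark scores, in one click.
Full rankings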
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 (thinking) | 1,502 | ±4 | 27,454 | Anthropic | Proprietary |
| 2 | Opus 4.7 (thinking) | 1,500 | ±6 | 12,920 | Anthropic | Proprietary |
| 3 | Claude Opus 4.6 | 1,498 | ±4 | 29,240 | Anthropic | Proprietary |
| 4 | Opus 4.7 | 1,492 | ±6 | 13,571 | Anthropic | Proprietary |
| 5 | Muse Spark | 1,489 | ±6 | 11,103 | Facebook AI Research | Proprietary |
| 6 | Gemini 3.1 Pro Preview | 1,488 | ±4 | 34,189 | Google DeepMind | Proprietary |
| 7 | Gemini 3.0 Pro (Preview 11-2025) | 1,486 | ±4 | 41,331 | Google DeepMind | Proprietary |
| 8 | GPT-5.5 (high) | 1,481 | ±6 | 10,172 | OpenAI | Proprietary |
| 9 | Gemini 3.5 Flash | 1,480 | ±8 | 5,907 | Google DeepMind | Proprietary |
| 10 | GPT-5.4 (high) | 1,480 | ±5 | 21,023 | OpenAI | Proprietary |
| 11 | GPT-5.5 | 1,478 | ±6 | 10,294 | OpenAI | Proprietary |
| 12 | | 1,478 | ±5 | 22,458 | xAI | Proprietary |
| 13 | GPT-5.2 | 1,477 | ±4 | 27,988 | OpenAI | Proprietary |
| 14 | Qwen3.7-Max-Preview | 1,475 | ±10 | 3,741 | Alibaba | Proprietary |
| 15 | | 1,475 | ±5 | 21,572 | xAI | Proprietary |
| 16 | | 1,474 | ±5 | 21,565 | xAI | Proprietary |
| 17 | Gemini 3.0 Flash | 1,473 | ±4 | 30,742 | Google DeepMind | Proprietary |
| 18 | ERNIE-5.1-Preview | 1,473 | ±7 | 9,004 | Baidu | Proprietary |
| 19 | Claude Opus 4 (thinking-32k) | 1,473 | ±4 | 37,130 | Anthropic | Proprietary |
| 20 | GLM 5.1 | 1,472 | ±6 | 12,295 | Zhipu AI | MIT |
| 21 | GPT-5.5 Instant | 1,472 | ±6 | 10,790 | OpenAI | Proprietary |
| 22 | Claude Sonnet 4.6 | 1,468 | ±5 | 20,839 | Anthropic | Proprietary |
| 23 | Claude Opus 4 | 1,468 | ±3 | 58,884 | Anthropic | Proprietary |
| 24 | GPT-5.4 | 1,467 | ±5 | 22,146 | OpenAI | Proprietary |
| 25 | | 1,467 | ±3 | 59,368 | xAI | Proprietary |
| 26 | MiMo V2.5 Pro | 1,465 | ±6 | 9,700 | Xiaomi | MIT |
| 27 | Qwen3.5 Max Preview | 1,464 | ±5 | 17,346 | Alibaba | Proprietary |
| 28 | Gemini 3.0 Flash (minimal) | 1,463 | ±4 | 45,395 | Google DeepMind | Proprietary |
| 29 | Kimi K2.6 | 1,462 | ±6 | 10,281 | Moonshot AI | Modified MIT |
| 30 | DeepSeek-V4-Pro (thinking) | 1,461 | ±6 | 9,970 | DeepSeek-AI | MIT |
| 31 | | 1,460 | ±3 | 63,263 | xAI | Proprietary |
| 32 | DeepSeek-V4-Pro | 1,459 | ±6 | 10,729 | DeepSeek-AI | MIT |
| 33 | Qwen3.6-Max-Preview | 1,457 | ±9 | 4,227 | Alibaba | Proprietary |
| 34 | GLM-5 | 1,457 | ±5 | 20,816 | Zhipu AI | MIT |
| 35 | DOLA Seed 2.0 Pro | 1,456 | ±4 | 30,543 | Bytedance | Proprietary |
| 36 | GPT-5.1 Pro (high) | 1,455 | ±4 | 40,848 | OpenAI | Proprietary |
| 37 | Claude Sonnet 4.5 (thinking-32k) | 1,454 | ±3 | 71,013 | Anthropic | Proprietary |
| 38 | Claude Sonnet 4.5 | 1,454 | ±3 | 69,196 | Anthropic | Proprietary |
| 39 | GPT-5.4 mini (high) | 1,454 | ±5 | 18,979 | OpenAI | Proprietary |
| 40 | Gemma 4 31B | 1,451 | ±8 | 5,840 | DeepMind | Apache 2.0 |
| 41 | | 1,451 | ±6 | 9,082 | xAI | Proprietary |
| 42 | ERNIE 5.0 | 1,450 | ±4 | 31,558 | Baidu | Proprietary |
| 43 | Kimi K2 Thinking | 1,449 | ±4 | 30,661 | Moonshot AI | Modified MIT |
| 44 | ERNIE 5.0 | 1,449 | ±7 | 9,754 | Baidu | Proprietary |
| 45 | GPT-5.3 | 1,449 | ±5 | 26,710 | OpenAI | Proprietary |
| 46 | Opus 4.1 (thinking-16k) | 1,449 | ±3 | 49,822 | Anthropic | Proprietary |
| 47 | MiMo V2 Pro | 1,447 | ±5 | 18,975 | Xiaomi | Proprietary |
| 48 | Opus 4.1 | 1,447 | ±3 | 77,378 | Anthropic | Proprietary |
| 49 | Gemini 2.5 Pro Experimental 03-25 | 1,446 | ±3 | 118,726 | Google DeepMind | Proprietary |
| 50 | Qwen3.5-397B-A17B | 1,445 | ±4 | 25,861 | Alibaba | Apache 2.0 |
| 51 | GPT-4.5 | 1,444 | ±6 | 14,547 | OpenAI | Proprietary |
| 52 | Qwen 3.6 Plus Preview | 1,444 | ±6 | 12,075 | Alibaba | Proprietary |
| 53 | GPT-4o (2025-03-27) | 1,443 | ±3 | 82,481 | OpenAI | Proprietary |
| 54 | GLM-4.7 | 1,443 | ±6 | 12,134 | Zhipu AI | MIT |
| 55 | DeepSeek-V4-Flash (thinking) | 1,440 | ±6 | 10,115 | DeepSeek-AI | MIT |
| 56 | GPT-5.2 Pro (high) | 1,439 | ±4 | 42,076 | OpenAI | Proprietary |
| 57 | GPT-5.1 Instant | 1,439 | ±4 | 43,497 | OpenAI | Proprietary |
| 58 | Gemma 4 26B A4B | 1,438 | ±8 | 5,782 | DeepMind | Apache 2.0 |
| 59 | gemini-3.1-flash-lite-preview | 1,436 | ±4 | 27,771 | Google | Proprietary |
| 60 | GPT-5.2 | 1,436 | ±4 | 39,304 | OpenAI | Proprietary |
| 61 | LongCat Flash Chat (2602) | 1,435 | ±5 | 17,017 | Meituan | Proprietary |
| 62 | Qwen3 Max (Preview) | 1,435 | ±5 | 27,731 | Alibaba | Proprietary |
| 63 | GPT-5-Pro (high) | 1,434 | ±5 | 31,951 | OpenAI | Proprietary |
| 64 | DeepSeek-V4-Flash | 1,433 | ±6 | 10,124 | DeepSeek-AI | MIT |
| 65 | Kimi K2.5 Instant | 1,432 | ±7 | 8,201 | Moonshot | Modified MIT |
| 66 | | 1,431 | ±3 | 52,836 | xAI | Proprietary |
| 67 | OpenAI o3 | 1,431 | ±4 | 59,771 | OpenAI | Proprietary |
| 68 | Kimi K2 Thinking (thinking-turbo) | 1,430 | ±3 | 56,632 | Moonshot AI | Modified MIT |
| 69 | MiMo V2.5 | 1,429 | ±6 | 9,729 | Xiaomi | MIT |
| 70 | amazon-nova-experimental-chat-26-02-10 | 1,427 | ±10 | 3,433 | Amazon | Proprietary |
| 71 | GPT-5 | 1,427 | ±4 | 31,598 | OpenAI | Proprietary |
| 72 | GLM-4.6 | 1,426 | ±4 | 35,672 | Zhipu AI | MIT |
| 73 | DeepSeek V3.2-Exp (thinking) | 1,425 | ±7 | 9,069 | DeepSeek-AI | MIT |
| 74 | qwen3-max-2025-09-23 | 1,424 | ±6 | 9,166 | Alibaba | Proprietary |
| 75 | Claude Opus 4 (thinking-16k) | 1,424 | ±4 | 36,920 | Anthropic | Proprietary |
| 76 | DeepSeek V3.2 | 1,424 | ±4 | 45,275 | DeepSeek-AI | MIT |
| 77 | Qwen3-235B-A22B-2507 | 1,423 | ±3 | 92,159 | Alibaba | Apache 2.0 |
| 78 | DeepSeek V3.2-Exp | 1,423 | ±6 | 11,936 | DeepSeek-AI | MIT |
| 79 | DeepSeek-R1-0528 | 1,422 | ±6 | 18,467 | DeepSeek-AI | MIT |
| 80 | DeepSeek V3.2 (thinking) | 1,422 | ±4 | 39,392 | DeepSeek-AI | MIT |
| 81 | | 1,421 | ±8 | 6,817 | xAI | Proprietary |
| 82 | ERNIE 5.0 | 1,419 | ±9 | 4,711 | Baidu | Proprietary |
| 83 | Kimi K2 0905 | 1,418 | ±6 | 11,795 | Moonshot AI | Modified MIT |
| 84 | DeepSeek-V3.1 | 1,418 | ±6 | 14,974 | DeepSeek-AI | MIT |
| 85 | Qwen3.5-122B-A10B | 1,418 | ±5 | 23,088 | Alibaba | Apache 2.0 |
| 86 | hunyuan-hy3-preview | 1,417 | ±8 | 5,184 | Tencent | tencent-hunyuan-community |
| 87 | DeepSeek-V3.1 Terminus (thinking) | 1,417 | ±10 | 3,471 | DeepSeek-AI | MIT |
| 88 | Kimi K2 | 1,417 | ±5 | 27,644 | Moonshot AI | Modified MIT |
| 89 | DeepSeek-V3.1 (thinking) | 1,417 | ±7 | 11,753 | DeepSeek-AI | MIT |
| 90 | DeepSeek-V3.1 Terminus | 1,416 | ±10 | 3,707 | DeepSeek-AI | MIT |
| 91 | amazon-nova-experimental-chat-26-01-10 | 1,416 | ±10 | 3,415 | Amazon | Proprietary |
| 92 | Qwen3-VL-235B-A22B-Instruct | 1,415 | ±6 | 11,518 | Alibaba | Apache 2.0 |
| 93 | Mistral Large 3 | 1,415 | ±4 | 41,713 | MistralAI | Apache 2.0 |
| 94 | GPT-4.1 | 1,413 | ±4 | 50,995 | OpenAI | Proprietary |
| 95 | Claude Opus 4 | 1,412 | ±4 | 44,235 | Anthropic | Proprietary |
| 96 | | 1,412 | ±4 | 32,914 | xAI | Proprietary |
| 97 | GLM-4.5 | 1,411 | ±5 | 24,324 | Zhipu AI | MIT |
| 98 | Gemini 2.5 Flash | 1,411 | ±3 | 118,454 | Google DeepMind | Proprietary |
| 99 | Haiku 4.5 | 1,410 | ±3 | 70,948 | Anthropic | Proprietary |
| 100 | | 1,410 | ±4 | 41,407 | xAI | Proprietary |
| 101 | Magistral-Medium-2506 | 1,410 | ±3 | 88,255 | MistralAI | Proprietary |
| 102 | | 1,409 | ±5 | 16,997 | MiniMaxAI | Modified MIT |
| 103 | Qwen3.5-27B | 1,409 | ±5 | 22,477 | Alibaba | Apache 2.0 |
| 104 | GPT-5.4 nano (high) | 1,406 | ±5 | 18,377 | OpenAI | Proprietary |
| 105 | Gemini 2.5 Flash-Preview-09-2025 | 1,405 | ±4 | 32,923 | Google DeepMind | Proprietary |
| 106 | | 1,404 | ±5 | 18,720 | xAI | Proprietary |
| 107 | qwen3-235b-a22b-no-thinking | 1,403 | ±5 | 38,234 | Alibaba | Apache 2.0 |
| 108 | Qwen3-Next | 1,402 | ±5 | 22,869 | Alibaba | Apache 2.0 |
| 109 | OpenAI o1 | 1,402 | ±4 | 27,807 | OpenAI | Proprietary |
| 110 | LongCat Flash Chat (2602) | 1,401 | ±6 | 11,402 | Meituan | MIT |
| 111 | qwen3-235b-a22b-thinking-2507 | 1,399 | ±7 | 8,999 | Alibaba | Apache 2.0 |
| 112 | Claude Sonnet 4 (thinking-32k) | 1,399 | ±4 | 35,123 | Anthropic | Proprietary |
| 113 | DeepSeek-R1 | 1,398 | ±5 | 18,524 | DeepSeek-AI | MIT |
| 114 | Qwen3.5-35B-A3B | 1,397 | ±5 | 23,500 | Alibaba | Apache 2.0 |
| 115 | hunyuan-vision-1.5-thinking | 1,396 | ±12 | 2,219 | Tencent | Proprietary |
| 116 | Step 3.5 Flash | 1,396 | ±5 | 23,305 | StepFunAI | Proprietary |
| 117 | Qwen3-VL-235B-A22B-Instruct (thinking) | 1,396 | ±7 | 7,941 | Alibaba | Apache 2.0 |
| 118 | DeepSeek-V3-0324 | 1,395 | ±4 | 45,520 | DeepSeek-AI | MIT |
| 119 | Step 3.5 Flash | 1,395 | ±4 | 28,558 | StepFunAI | Apache 2.0 |
| 120 | amazon-nova-experimental-chat-12-10 | 1,395 | ±10 | 3,683 | Amazon | Proprietary |
| 121 | | 1,394 | ±4 | 28,912 | MiniMaxAI | Modified MIT |
| 122 | mimo-v2-flash (non-thinking) | 1,393 | ±4 | 40,885 | Xiaomi | MIT |
| 123 | MAI Image 1 | 1,393 | ±5 | 17,890 | Microsoft AI | Proprietary |
| 124 | GPT-5-mini (high) | 1,390 | ±5 | 27,045 | OpenAI | Proprietary |
| 125 | OpenAI o4-mini | 1,390 | ±4 | 45,450 | OpenAI | Proprietary |
| 126 | Claude Sonnet 4 | 1,389 | ±4 | 40,333 | Anthropic | Proprietary |
| 127 | OpenAI o1 | 1,388 | ±5 | 31,122 | OpenAI | Proprietary |
| 128 | Hunyuan-T1 | 1,387 | ±9 | 4,714 | Tencent AI Lab | Proprietary |
| 129 | mimo-v2-flash (thinking) | 1,387 | ±6 | 10,973 | Xiaomi | MIT |
| 130 | Qwen3-Coder-480B-A35B | 1,387 | ±5 | 25,753 | Alibaba | Apache 2.0 |
| 131 | Claude Sonnet 3.7 (thinking-32k) | 1,387 | ±4 | 38,839 | Anthropic | Proprietary |
| 132 | mistral-medium-2505 | 1,387 | ±5 | 33,243 | Mistral | Proprietary |
| 133 | | 1,385 | ±5 | 17,156 | MiniMaxAI | MIT |
| 134 | Qwen3-30B-A3B-2507 | 1,383 | ±5 | 23,750 | Alibaba | Apache 2.0 |
| 135 | GPT-4.1 mini | 1,382 | ±4 | 39,354 | OpenAI | Proprietary |
| 136 | hunyuan-turbos-20250416 | 1,382 | ±6 | 10,723 | Tencent | Proprietary |
| 137 | Gemini 2.5 Flash-Lite-Preview-09-2025 (no-thinking) | 1,380 | ±3 | 47,249 | Google DeepMind | Proprietary |
| 138 | GLM-4.6V | 1,378 | ±11 | 2,806 | Zhipu AI | MIT |
| 139 | trinity-large-preview | 1,376 | ±5 | 24,901 | Arcee AI | Apache 2.0 |
| 140 | Qwen3-235B-A22B | 1,375 | ±5 | 26,278 | Alibaba | Apache 2.0 |
| 141 | Gemini 2.5 Flash-Lite (thinking) | 1,375 | ±5 | 32,934 | Google DeepMind | Proprietary |
| 142 | Qwen2.5-Max | 1,374 | ±4 | 32,624 | Alibaba | Proprietary |
| 143 | GLM-4.5-Air | 1,373 | ±4 | 31,099 | Zhipu AI | MIT |
| 144 | trinity-large-thinking | 1,373 | ±5 | 16,436 | Arcee AI | Apache 2.0 |
| 145 | Claude 3.5 Sonnet | 1,372 | ±3 | 88,356 | Anthropic | Proprietary |
| 146 | Claude Sonnet 3.7 | 1,371 | ±4 | 43,197 | Anthropic | Proprietary |
| 147 | Qwen3-Next (thinking) | 1,369 | ±6 | 13,706 | Alibaba | Apache 2.0 |
| 148 | GLM-4.7-Flash | 1,368 | ±6 | 11,750 | Zhipu AI | MIT |
| 149 | amazon-nova-experimental-chat-11-10 | 1,367 | ±4 | 25,416 | Amazon | Proprietary |
| 150 | Gemma 3 - 27B (IT) | 1,366 | ±4 | 47,559 | Google DeepMind | Gemma |
| 151 | minimax-m1 | 1,364 | ±4 | 35,221 | MiniMax | Apache 2.0 |
| 152 | OpenAI o3-mini (high) | 1,363 | ±5 | 18,589 | OpenAI | Proprietary |
| 153 | OpenAI o3-mini (high) | 1,362 | ±5 | 16,973 | OpenAI | Proprietary |
| 154 | nvidia-nemotron-3-super-120b-a12b | 1,361 | ±7 | 7,418 | Nvidia | NVIDIA Open Model |
| 155 | Gemini 2.0 Flash Experimental | 1,360 | ±4 | 43,765 | DeepMind | Proprietary |
| 156 | DeepSeek-V3 | 1,358 | ±5 | 21,770 | DeepSeek-AI | DeepSeek |
| 157 | Mistral-Small-3.2 | 1,357 | ±5 | 17,716 | MistralAI | Apache 2.0 |
| 158 | | 1,357 | ±5 | 22,729 | xAI | Proprietary |
| 159 | intellect-3 | 1,357 | ±8 | 5,328 | Prime Intellect | MIT |
| 160 | C4AI Command A (202503) | 1,354 | ±3 | 56,294 | CohereAI | CC-BY-NC-4.0 |
| 161 | GLM-4.5V | 1,353 | ±8 | 4,962 | Zhipu AI | MIT |
| 162 | Gemini 2.0 Flash-Lite | 1,353 | ±4 | 24,955 | DeepMind | Proprietary |
| 163 | GPT OSS 120B | 1,353 | ±4 | 30,646 | OpenAI | Apache 2.0 |
| 164 | Gemini 1.5 Pro | 1,351 | ±3 | 55,606 | Google DeepMind | Proprietary |
| 165 | amazon-nova-experimental-chat-10-20 | 1,350 | ±6 | 11,470 | Amazon | Proprietary |
| 166 | hunyuan-turbos-20250226 | 1,349 | ±12 | 2,220 | Tencent | Proprietary |
| 167 | Step3 | 1,348 | ±7 | 6,551 | StepFunAI | Apache 2.0 |
| 168 | amazon-nova-experimental-chat-10-09 | 1,348 | ±11 | 2,841 | Amazon | Proprietary |
| 169 | OpenAI o3-mini | 1,347 | ±4 | 57,349 | OpenAI | Proprietary |
| 170 | llama-3.1-nemotron-ultra-253b-v1 | 1,347 | ±12 | 2,549 | Nvidia | Nvidia Open Model |
| 171 | Qwen3-32B | 1,347 | ±9 | 3,926 | Alibaba | Apache 2.0 |
| 172 | mercury-2 | 1,347 | ±11 | 3,120 | Inception AI | Proprietary |
| 173 | ling-flash-2.0 | 1,346 | ±7 | 7,010 | InclusionAI | MIT |
| 174 | | 1,346 | ±8 | 6,868 | MiniMaxAI | Apache 2.0 |
| 175 | qwen-plus-0125 | 1,346 | ±8 | 5,819 | Alibaba | Proprietary |
| 176 | GPT-4o | 1,345 | ±3 | 112,881 | OpenAI | Proprietary |
| 177 | nvidia-llama-3.3-nemotron-super-49b-v1.5 | 1,343 | ±10 | 3,344 | Nvidia | Nvidia Open |
| 178 | glm-4-plus-0111 | 1,343 | ±8 | 5,760 | Zhipu | Proprietary |
| 179 | Claude 3.5 Sonnet | 1,342 | ±3 | 82,419 | Anthropic | Proprietary |
| 180 | Gemma 3 - 12B (IT) | 1,342 | ±10 | 3,829 | Google DeepMind | Gemma |
| 181 | hunyuan-turbo-0110 | 1,340 | ±12 | 2,290 | Tencent | Proprietary |
| 182 | GPT-5-Nano (high) | 1,337 | ±7 | 8,273 | OpenAI | Proprietary |
| 183 | Nova 2 Lite | 1,337 | ±6 | 12,242 | Amazon | Proprietary |
| 184 | OpenAI o1-mini | 1,337 | ±4 | 51,981 | OpenAI | Proprietary |
| 185 | QwQ-32B | 1,336 | ±4 | 25,403 | Alibaba | Apache 2.0 |
| 186 | | 1,335 | ±4 | 63,498 | xAI | Proprietary |
| 187 | gemini-advanced-0514 | 1,335 | ±5 | 50,148 | Google | Proprietary |
| 188 | GPT-4o | 1,335 | ±4 | 45,499 | OpenAI | Proprietary |
| 189 | llama-3.1-405b-instruct-bf16 | 1,334 | ±4 | 41,375 | Meta | Llama 3.1 Community |
| 190 | step-2-16k-exp-202412 | 1,334 | ±9 | 4,833 | StepFun | Proprietary |
| 191 | llama-3.1-405b-instruct-fp8 | 1,333 | ±4 | 59,656 | Meta | Llama 3.1 Community |
| 192 | olmo-3.1-32b-instruct | 1,330 | ±6 | 12,228 | Ai2 | Apache 2.0 |
| 193 | yi-lightning | 1,328 | ±5 | 27,332 | 01 AI | Proprietary |
| 194 | molmo-2-8b | 1,328 | ±21 | 805 | Ai2 | Apache 2.0 |
| 195 | llama-3.3-nemotron-49b-super-v1 | 1,328 | ±12 | 2,218 | Nvidia | Nvidia |
| 196 | Qwen3-30B-A3B | 1,327 | ±5 | 26,500 | Alibaba | Apache 2.0 |
| 197 | Llama 4 Maverick Instruct | 1,327 | ±4 | 39,993 | Facebook AI Research | Llama 4 |
| 198 | hunyuan-large-2025-02-10 | 1,326 | ±10 | 3,738 | Tencent | Proprietary |
| 199 | Runway Gen-4 Turbo | 1,324 | ±4 | 98,114 | OpenAI | Proprietary |
| 200 | deepseek-v2.5-1210 | 1,323 | ±8 | 6,795 | DeepSeek | DeepSeek |
| 201 | Gemini 1.5 Pro | 1,323 | ±4 | 79,138 | Google DeepMind | Proprietary |
| 202 | Claude 3.5 Haiku | 1,323 | ±3 | 70,008 | Anthropic | Proprietary |
| 203 | Llama 4 Scout Instruct | 1,322 | ±5 | 30,312 | Facebook AI Research | Llama |
| 204 | GPT-4.1 nano | 1,322 | ±8 | 6,103 | OpenAI | Proprietary |
| 205 | Claude3-Opus | 1,321 | ±3 | 194,909 | Anthropic | Proprietary |
| 206 | ring-flash-2.0 | 1,321 | ±7 | 7,153 | InclusionAI | MIT |
| 207 | step-1o-turbo-202506 | 1,320 | ±7 | 9,041 | StepFun | Proprietary |
| 208 | glm-4-plus | 1,319 | ±5 | 26,126 | Zhipu AI | Proprietary |
| 209 | Gemma-3n-E4B | 1,318 | ±5 | 22,606 | Google DeepMind | Gemma |
| 210 | Llama3.3-70B-Instruct | 1,318 | ±3 | 54,746 | Facebook AI Research | Llama-3.3 |
| 211 | qwen-max-0919 | 1,318 | ±6 | 16,478 | Alibaba | Qwen |
| 212 | GPT-4o mini | 1,317 | ±4 | 68,710 | OpenAI | Proprietary |
| 213 | GPT OSS 20B | 1,317 | ±6 | 10,633 | OpenAI | Apache 2.0 |
| 214 | nvidia-nemotron-3-nano-30b-a3b-bf16 | 1,317 | ±6 | 15,517 | Nvidia | NVIDIA Open Model |
| 215 | qwen2.5-plus-1127 | 1,315 | ±6 | 10,187 | Alibaba | Proprietary |
| 216 | athene-v2-chat | 1,314 | ±5 | 24,739 | NexusFlow | NexusFlow |
| 217 | mistral-large-2407 | 1,314 | ±4 | 45,459 | Mistral | Mistral Research |
| 218 | GPT-4 | 1,312 | ±4 | 93,439 | OpenAI | Proprietary |
| 219 | GPT-4 | 1,312 | ±4 | 100,105 | OpenAI | Proprietary |
| 220 | granite-4.1-8b | 1,311 | ±11 | 3,240 | IBM | Apache 2.0 |
| 221 | hunyuan-standard-2025-02-10 | 1,311 | ±10 | 3,904 | Tencent | Proprietary |
| 222 | gemini-1.5-flash-002 | 1,309 | ±4 | 34,902 | Google | Proprietary |
| 223 | | 1,308 | ±4 | 52,567 | xAI | Proprietary |
| 224 | DeepSeek V2.5 | 1,307 | ±5 | 24,572 | DeepSeek-AI | DeepSeek |
| 225 | mercury | 1,306 | ±14 | 1,954 | Inception AI | Proprietary |
| 226 | athene-70b-0725 | 1,306 | ±6 | 19,621 | NexusFlow | CC-BY-NC-4.0 |
| 227 | olmo-3-32b-think | 1,305 | ±8 | 5,953 | Ai2 | Apache 2.0 |
| 228 | mistral-large-2411 | 1,305 | ±4 | 28,073 | Mistral | MRL |
| 229 | Magistral-Medium-2506 | 1,304 | ±6 | 11,643 | MistralAI | Proprietary |
| 230 | Gemma 3 - 4B (IT) | 1,303 | ±9 | 4,171 | Google DeepMind | Gemma |
| 231 | Mistral-Small-3.1-24B-Instruct-2503 | 1,303 | ±5 | 33,235 | MistralAI | Apache 2.0 |
| 232 | Qwen2.5-VL-72B-Instruct | 1,303 | ±4 | 39,406 | Alibaba | Qwen |
| 233 | Llama3.1-70B-Instruct | 1,299 | ±8 | 7,140 | Facebook AI Research | Llama 3.1 |
| 234 | hunyuan-large-vision | 1,294 | ±9 | 5,371 | Tencent | Proprietary |
| 235 | Llama3.1-70B-Instruct | 1,293 | ±4 | 55,240 | Facebook AI Research | Llama 3.1 Community |
| 236 | amazon-nova-pro-v1.0 | 1,290 | ±5 | 24,745 | Amazon | Proprietary |
| 237 | jamba-1.5-large | 1,289 | ±7 | 8,662 | AI21 Labs | Jamba Open |
| 238 | gemma-2-27b-it | 1,288 | ±3 | 75,754 | Google | Gemma license |
| 239 | reka-core-20240904 | 1,287 | ±7 | 7,312 | Reka AI | Proprietary |
| 240 | ibm-granite-h-small | 1,287 | ±8 | 5,677 | IBM | Apache 2.0 |
| 241 | GPT-4 | 1,286 | ±5 | 54,173 | OpenAI | Proprietary |
| 242 | llama-3.1-tulu-3-70b | 1,286 | ±10 | 2,846 | Ai2 | Llama 3.1 |
| 243 | gemini-1.5-flash-001 | 1,286 | ±4 | 62,833 | Google | Proprietary |
| 244 | llama-3.1-nemotron-51b-instruct | 1,285 | ±10 | 3,749 | Nvidia | Llama 3.1 |
| 245 | olmo-3.1-32b-think | 1,285 | ±7 | 8,505 | Ai2 | Apache 2.0 |
| 246 | Claude3-Sonnet | 1,280 | ±4 | 109,284 | Anthropic | Proprietary |
| 247 | gemma-2-9b-it-simpo | 1,279 | ±7 | 10,072 | Princeton | MIT |
| 248 | nemotron-4-340b-instruct | 1,276 | ±5 | 19,659 | Nvidia | NVIDIA Open Model |
| 249 | command-r-plus-08-2024 | 1,276 | ±7 | 9,866 | Cohere | CC-BY-NC-4.0 |
| 250 | Llama3-70B-Instruct | 1,275 | ±4 | 156,876 | Facebook AI Research | Llama 3 Community |
| 251 | GPT-4 | 1,274 | ±4 | 88,723 | OpenAI | Proprietary |
| 252 | Mistral Small 24B Instruct 2501 | 1,274 | ±6 | 14,681 | MistralAI | Apache 2.0 |
| 253 | GLM4 | 1,273 | ±7 | 9,788 | Zhipu AI | Proprietary |
| 254 | reka-flash-20240904 | 1,271 | ±7 | 7,536 | Reka AI | Proprietary |
| 255 | Qwen2.5-Coder-32B-Instruct | 1,270 | ±8 | 5,432 | Alibaba | Apache 2.0 |
| 256 | C4AI Aya Vision 32B | 1,267 | ±5 | 27,124 | CohereAI | CC-BY-NC-4.0 |
| 257 | gemma-2-9b-it | 1,266 | ±4 | 54,611 | Google | Gemma license |
| 258 | deepseek-coder-v2 | 1,264 | ±6 | 15,147 | DeepSeek | DeepSeek License |
| 259 | Qwen2-72B-Instruct | 1,261 | ±5 | 37,325 | Alibaba | Qianwen LICENSE |
| 260 | C4AI Command R+ | 1,261 | ±4 | 77,554 | CohereAI | CC-BY-NC-4.0 |
| 261 | Claude3-Haiku | 1,260 | ±4 | 117,701 | Anthropic | Proprietary |
| 262 | amazon-nova-lite-v1.0 | 1,260 | ±5 | 19,372 | Amazon | Proprietary |
| 263 | gemini-1.5-flash-8b-001 | 1,258 | ±4 | 35,558 | Google | Proprietary |
| 264 | Phi 4 - 14B | 1,256 | ±5 | 24,126 | Microsoft Azure | MIT |
| 265 | olmo-2-0325-32b-instruct | 1,251 | ±11 | 3,334 | Ai2 | Apache-2.0 |
| 266 | command-r-08-2024 | 1,249 | ±7 | 10,140 | Cohere | CC-BY-NC-4.0 |
| 267 | mistral-large-2402 | 1,241 | ±5 | 62,436 | Mistral | Proprietary |
| 268 | amazon-nova-micro-v1.0 | 1,240 | ±5 | 19,364 | Amazon | Proprietary |
| 269 | jamba-1.5-mini | 1,239 | ±7 | 8,858 | AI21 Labs | Jamba Open |
| 270 | ministral-8b-2410 | 1,237 | ±9 | 4,781 | Mistral | MRL |
| 271 | gemini-pro-dev-api | 1,235 | ±7 | 18,354 | Google | Proprietary |
| 272 | Qwen1.5-110B-Chat | 1,233 | ±6 | 26,195 | Alibaba | Qianwen LICENSE |
| 273 | hunyuan-standard-256k | 1,233 | ±12 | 2,728 | Tencent | Proprietary |
| 274 | reka-flash-21b-20240226-online | 1,232 | ±7 | 15,450 | Reka AI | Proprietary |
| 275 | Qwen1.5-72B-Chat | 1,232 | ±5 | 39,302 | Alibaba | Qianwen LICENSE |
| 276 | Mixtral-8x22B-Instruct-v0.1 | 1,228 | ±5 | 51,416 | MistralAI | Apache 2.0 |
| 277 | command-r | 1,226 | ±5 | 54,036 | Cohere | CC-BY-NC-4.0 |
| 278 | reka-flash-21b-20240226 | 1,226 | ±6 | 24,806 | Reka AI | Proprietary |
| 279 | gpt-3.5-turbo-0125 | 1,223 | ±5 | 66,207 | OpenAI | Proprietary |
| 280 | C4AI Aya Vision 8B | 1,223 | ±7 | 9,818 | CohereAI | CC-BY-NC-4.0 |
| 281 | Llama3-8B-Instruct | 1,223 | ±4 | 104,642 | Facebook AI Research | Llama 3 Community |
| 282 | mistral-medium | 1,222 | ±5 | 34,550 | Mistral | Proprietary |
| 283 | Gemini-pro | 1,221 | ±12 | 6,390 | DeepMind | Proprietary |
| 284 | llama-3.1-tulu-3-8b | 1,220 | ±11 | 2,896 | Ai2 | Llama 3.1 |
| 285 | Yi-1.5-34B | 1,212 | ±5 | 24,146 | 01 AI | Apache-2.0 |
| 286 | zephyr-orpo-141b-A35b-v0.1 | 1,212 | ±11 | 4,652 | HuggingFace | Apache 2.0 |
| 287 | Llama3.1-8B-Instruct | 1,211 | ±4 | 49,605 | Facebook AI Research | Llama 3.1 Community |
| 288 | Llama3.1-8B-Instruct | 1,207 | ±11 | 3,090 | Facebook AI Research | Apache 2.0 |
| 289 | qwen1.5-32b-chat | 1,203 | ±6 | 21,741 | Alibaba | Qianwen LICENSE |
| 290 | gpt-3.5-turbo-1106 | 1,202 | ±9 | 16,619 | OpenAI | Proprietary |
| 291 | gemma-2-2b-it | 1,199 | ±4 | 46,616 | Google | Gemma license |
| 292 | Phi-3-medium 14B-preview | 1,197 | ±5 | 25,055 | Microsoft Azure | MIT |
| 293 | mixtral-8x7b-instruct-v0.1 | 1,196 | ±4 | 73,503 | Mistral | Apache 2.0 |
| 294 | DBRX Instruct | 1,194 | ±6 | 32,191 | databricks | DBRX LICENSE |
| 295 | InternLM2-Base-20B | 1,190 | ±7 | 9,901 | Shanghai AI Laboratory | Other |
| 296 | Qwen1.5-14B-Chat | 1,190 | ±7 | 17,839 | Alibaba | Qianwen LICENSE |
| 297 | WizardLM-70B-V1.0 | 1,184 | ±9 | 8,214 | WizardLM Team | Llama 2 Community |
| 298 | DeepSeek LLM 67B Chat | 1,183 | ±12 | 4,932 | DeepSeek-AI | DeepSeek License |
| 299 | Yi-34B | 1,183 | ±7 | 15,483 | 01 AI | Yi License |
| 300 | granite-3.0-8b-instruct | 1,181 | ±9 | 6,638 | IBM | Apache 2.0 |
| 301 | openchat-3.5 | 1,181 | ±10 | 7,968 | OpenChat | Apache-2.0 |
| 302 | openchat-3.5-0106 | 1,181 | ±8 | 12,637 | OpenChat | Apache-2.0 |
| 303 | Gemma 1.1-7B-IT | 1,180 | ±6 | 23,893 | Google Research | Gemma license |
| 304 | snowflake-arctic-instruct | 1,178 | ±6 | 32,832 | Snowflake | Apache 2.0 |
| 305 | granite-3.1-2b-instruct | 1,178 | ±11 | 3,188 | IBM | Apache 2.0 |
| 306 | tulu-2-dpo-70b | 1,177 | ±10 | 6,535 | AllenAI/UW | AI2 ImpACT Low-risk |
| 307 | openhermes-2.5-mistral-7b | 1,174 | ±10 | 5,006 | NousResearch | Apache-2.0 |
| 308 | Vicuna 33B | 1,172 | ±6 | 22,479 | LM-SYS | Non-commercial |
| 309 | starling-lm-7b-beta | 1,171 | ±7 | 16,056 | Nexusflow | Apache-2.0 |
| 310 | Phi-3-small 7B | 1,170 | ±6 | 17,766 | Microsoft Azure | MIT |
| 311 | llama-2-70b-chat | 1,170 | ±6 | 38,492 | Meta | Llama 2 Community |
| 312 | starling-lm-7b-alpha | 1,166 | ±8 | 10,224 | UC Berkeley | CC-BY-NC-4.0 |
| 313 | llama-3.2-3b-instruct | 1,166 | ±8 | 7,936 | Meta | Llama 3.2 |
| 314 | nous-hermes-2-mixtral-8x7b-dpo | 1,164 | ±12 | 3,777 | NousResearch | Apache-2.0 |
| 315 | QwQ-32B-Preview | 1,155 | ±11 | 3,231 | Alibaba | Apache 2.0 |
| 316 | Qwen3-VL-2B | 1,155 | ±8 | 6,837 | Alibaba | Apache 2.0 |
| 317 | llama2-70b-steerlm-chat | 1,154 | ±13 | 3,585 | Nvidia | Llama 2 Community |
| 318 | solar-10.7b-instruct-v1.0 | 1,151 | ±13 | 4,155 | Upstage AI | CC-BY-NC-4.0 |
| 319 | dolphin-2.2.1-mistral-7b | 1,151 | ±15 | 1,679 | Cognitive Computations | Apache-2.0 |
| 320 | MPT-30B-Chat | 1,149 | ±12 | 2,572 | MosaicML | CC-BY-NC-SA-4.0 |
| 321 | Mistral-7B-Instruct-v0.2 | 1,148 | ±7 | 19,402 | MistralAI | Apache-2.0 |
| 322 | wizardlm-13b | 1,148 | ±9 | 7,044 | Microsoft | Llama 2 Community |
| 323 | falcon-180b-chat | 1,146 | ±17 | 1,295 | TII | Falcon-180B TII License |
| 324 | Qwen1.5-7B-Chat | 1,143 | ±10 | 4,737 | Alibaba | Qianwen LICENSE |
| 325 | Phi-3-mini 3.8B | 1,142 | ±6 | 12,297 | Microsoft Azure | MIT |
| 326 | Baichuan2-13B-Chat | 1,140 | ±7 | 19,174 | Baichuan AI | Llama 2 Community |
| 327 | Vicuna 13B | 1,140 | ±7 | 19,367 | LM-SYS | Llama 2 Community |
| 328 | Qwen-14B-Chat | 1,137 | ±11 | 4,964 | Alibaba | Qianwen LICENSE |
| 329 | PaLM 2 | 1,137 | ±9 | 8,554 | Google Research | Proprietary |
| 330 | Gemma 7B - It | 1,136 | ±9 | 8,925 | Google Research | Gemma license |
| 331 | CodeLLaMA-34B | 1,135 | ±9 | 7,366 | Facebook AI Research | Llama 2 Community |
| 332 | zephyr-7b-beta | 1,130 | ±9 | 11,118 | HuggingFace | MIT |
| 333 | Phi-3-mini 3.8B | 1,128 | ±7 | 20,685 | Microsoft Azure | MIT |
| 334 | Phi-3-mini 3.8B | 1,127 | ±6 | 20,118 | Microsoft Azure | MIT |
| 335 | guanaco-33b | 1,126 | ±12 | 2,921 | UW | Non-commercial |
| 336 | zephyr-7b-alpha | 1,126 | ±16 | 1,785 | HuggingFace | MIT |
| 337 | stripedhyena-nous-7b | 1,120 | ±11 | 5,182 | Together AI | Apache 2.0 |
| 338 | CodeLlama-70B-Instruct | 1,118 | ±18 | 1,143 | Facebook AI Research | Llama 2 Community |
| 339 | Gemma 1.1-2B-IT | 1,114 | ±8 | 10,854 | Google Research | Gemma license |
| 340 | Vicuna 7B | 1,114 | ±9 | 6,923 | LM-SYS | Llama 2 Community |
| 341 | smollm2-1.7b-instruct | 1,113 | ±14 | 2,199 | HuggingFace | Apache 2.0 |
| 342 | llama-3.2-1b-instruct | 1,110 | ±8 | 8,045 | Meta | Llama 3.2 |
| 343 | Mistral 7B Instruct | 1,109 | ±9 | 8,977 | MistralAI | Apache 2.0 |
| 344 | Baichuan2-7B-Chat | 1,107 | ±7 | 14,148 | Baichuan AI | Llama 2 Community |
| 345 | Gemma 2B - It | 1,092 | ±12 | 4,780 | Google Research | Gemma license |
| 346 | Qwen1.5-4B-Chat | 1,089 | ±9 | 7,597 | Alibaba | Qianwen LICENSE |
| 347 | olmo-7b-instruct | 1,073 | ±11 | 6,328 | Ai2 | Apache-2.0 |
| 348 | Koala | 1,069 | ±10 | 6,965 | DAMO Academy | Non-commercial |
| 349 | alpaca-13b | 1,067 | ±11 | 5,745 | Stanford | Non-commercial |
| 350 | GPT4All 13B | 1,065 | ±15 | 1,743 | Nomic AI | Non-commercial |
| 351 | MPT-7B-Chat | 1,061 | ±12 | 3,924 | MosaicML | CC-BY-NC-SA-4.0 |
| 352 | ChatGLM3-6B | 1,055 | ±12 | 4,658 | Zhipu AI | Apache-2.0 |
| 353 | RWKV-4-Raven-14B | 1,040 | ±11 | 4,845 | RWKV | Apache 2.0 |
| 354 | ChatGLM2-6B | 1,023 | ±14 | 2,658 | Zhipu AI | Apache-2.0 |
| 355 | oasst-pythia-12b | 1,021 | ±11 | 6,310 | OpenAssistant | Apache 2.0 |
| 356 | ChatGLM-6B | 994 | ±13 | 4,914 | Zhipu AI | Non-commercial |
| 357 | fastchat-t5-3b | 990 | ±12 | 4,203 | LMSYS | Apache 2.0 |
| 358 | dolly-v2-12b | 979 | ±14 | 3,412 | Databricks | MIT |
| 359 | LLaMA 13B | 972 | ±16 | 2,391 | Facebook AI Research | Non-commercial |
| 360 | stablelm-tuned-alpha-7b | 952 | ±13 | 3,287 | Stability AI | CC-BY-NC-SA-4.0 |
The data are for reference only; the official source is authoritative.
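Frequently asked questions (FAQ)
What is the Text Generation Arena (LMArena)?
The Text Generation Arena (formerly LMSYS Chatbot Arena) is currently the most influential anonymous evaluation platform for large models. Users put a question to two models whose identities are unknown and vote on the quality of their answers; the system then aggregates millions of votes into a continuously updated leaderboard using an Elo-style rating method. The leaderboard is widely cited in academia and industry.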
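How are Arena Elo scores calculated?
The Elo algorithm comes from chess ratings. After each battle, the winner's rating rises and the loser's falls, and the size of the change depends on the rating gap before the battle: an upset win over a higher-rated model moves both ratings more than an expected win does. This incremental update is the classic Elo rule; the published scores are fitted with the Bradley-Terry model over all votes at once, which yields the same kind of scale without depending on the order of battles. The 95% confidence interval (CI) expresses the statistical uncertainty of a score and shrinks mainly as a model accumulates more votes: the narrower the CI, the more data behind the score and the more reliable the ranking. When two models' intervals overlap (for example, rank 1 at 1,502 ±4 and rank 2 at 1,500 ±6), the gap between them is within the margin of error and their relative order is not settled.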
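The per-battle update described above is the classic online Elo rule, and the CI comparison can be expressed as a simple interval check. A minimal sketch follows. The K-factor of 4 is an assumption for illustration (a small K keeps any single vote from moving a score much), and `intervals_overlap` is a hypothetical helper: non-overlapping intervals are a fairly safe sign of a real gap, while overlap alone does not prove two models are tied, so this is a rule of thumb rather than a formal significance test.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (win = 1, tie = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 4.0) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one battle; score_a is 1.0, 0.5, or 0.0."""
    if score_a not in (0.0, 0.5, 1.0):
        raise ValueError("score_a must be 1.0 (A wins), 0.5 (tie) or 0.0 (B wins)")
    # An upset (low expected score for the winner) produces a larger change.
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


def intervals_overlap(score_a: float, ci_a: float, score_b: float, ci_b: float) -> bool:
    """True if [score - ci, score + ci] ranges intersect."""
    return abs(score_a - score_b) <= ci_a + ci_b


print(elo_update(1000, 1000, 1.0, k=32))    # (1016.0, 984.0)
print(intervals_overlap(1502, 4, 1500, 6))  # True: ranks 1 and 2 in the table
print(intervals_overlap(1502, 4, 1488, 4))  # False: ranks 1 and 6 in the table
```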
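Why does the same model appear in both a "Thinking" version and a regular version?
Some models support an "Extended Thinking" mode, in which they reason internally in more depth before giving a final answer. This mode usually scores higher on logical reasoning, math, and programming tasks, but it also brings higher response latency and cost. The Arena evaluates the two modes separately so that users can choose according to their actual needs.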
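How do I use the leaderboard to choose a large language model that fits my needs?
Weigh several factors together: overall performance (the Elo score), cost (closed-source APIs are billed by usage, while open-source models can be self-hosted), Chinese-language support, degree of openness (the License column), and response speed.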
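As an illustration of combining the Elo score with the License column, the sketch below filters a handful of rows copied from the leaderboard table and keeps models under MIT or Apache 2.0 licenses above a score threshold, best first. Which licenses count as acceptable and the 1,455 threshold are assumptions for the example; cost, Chinese-language support, and latency are not in the table and would have to come from other sources. `Entry`, `ENTRIES`, and `shortlist` are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    score: int
    ci: int       # half-width of the 95% CI
    license: str


# A few rows copied from the leaderboard table.
ENTRIES = [
    Entry("Claude Opus 4.6 (thinking)", 1502, 4, "Proprietary"),
    Entry("GPT-5.5 (high)", 1481, 6, "Proprietary"),
    Entry("GLM 5.1", 1472, 6, "MIT"),
    Entry("MiMo V2.5 Pro", 1465, 6, "MIT"),
    Entry("Kimi K2.6", 1462, 6, "Modified MIT"),
    Entry("DeepSeek-V4-Pro (thinking)", 1461, 6, "MIT"),
    Entry("Gemma 4 31B", 1451, 8, "Apache 2.0"),
]

# Assumption: only these license strings are treated as acceptable for self-hosting.
ALLOWED_LICENSES = {"MIT", "Apache 2.0"}


def shortlist(entries, allowed_licenses, min_score):
    """Keep entries with an allowed license and score >= min_score, highest score first."""
    picked = [e for e in entries if e.license in allowed_licenses and e.score >= min_score]
    return sorted(picked, key=lambda e: e.score, reverse=True)


for e in shortlist(ENTRIES, ALLOWED_LICENSES, min_score=1455):
    print(f"{e.model}: {e.score} ±{e.ci} ({e.license})")
```

With these inputs the shortlist is GLM 5.1, MiMo V2.5 Pro, and DeepSeek-V4-Pro (thinking). Kimi K2.6 drops out only because its license string is "Modified MIT", so the allowed set matters as much as the score threshold; and since the three survivors' intervals overlap, their order among themselves is not decisive.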