Text Generation Arena Leaderboard
The latest AI text generation leaderboard, based on anonymous user voting on LMArena. It covers Elo scores, 95% confidence intervals, and vote counts for leading language models.
Top Model
Claude Fable 5
Top Score
1,508
Model Count
367
Data version
2026-06-16
Data source: LMArena
About This Leaderboard
This leaderboard ranks the strongest AI models for text generation. Data comes from LMArena (formerly LMSYS Chatbot Arena), the world's largest crowdsourced AI evaluation platform. Users chat with two anonymous models side by side and vote for the better response, so rankings are determined entirely by real user preferences, not lab benchmarks.
Methodology Overview
Blind testing: Users chat with two anonymous models and vote based on response quality, eliminating brand bias.
Elo scoring: A Bradley-Terry model, reported on a chess-style Elo scale, estimates each model's strength score from battle outcomes. Higher scores mean users prefer that model more often.
Broad scenario coverage: Testing spans coding, creative writing, math reasoning, Q&A, role-playing, and more.
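The Elo scoring step above can be illustrated with a small sketch. This is not LMArena's actual pipeline: it fits a plain Bradley-Terry model to a toy list of `(winner, loser)` battles with a simple iterative update, then maps strengths onto an Elo-like scale. The function names, the toy model names, the 400-point scale, and the 1,000-point anchor are all assumptions made for illustration, and the sketch assumes every model wins at least one battle.

```python
import math
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumes every model has at least one win; otherwise its strength
    would collapse to zero and the normalization below would fail.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in battles:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    models = sorted({m for battle in battles for m in battle})

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # Sum over opponents: battles played / combined strength.
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = wins[i] / denom
        # Normalize so the geometric mean of the strengths is 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {m: v / math.exp(log_mean) for m, v in updated.items()}
    return strength


def to_arena_scale(strength, anchor=1000.0, scale=400.0):
    """Map a strength to an Elo-like rating (anchor and scale are assumed)."""
    return anchor + scale * math.log10(strength)


# Toy battles between three hypothetical models.
toy_battles = (
    [("model-a", "model-b")] * 3 + [("model-b", "model-a")]
    + [("model-b", "model-c")] * 3 + [("model-c", "model-b")]
    + [("model-a", "model-c")] * 3 + [("model-c", "model-a")]
)
for name, value in sorted(fit_bradley_terry(toy_battles).items()):
    print(name, round(to_arena_scale(value)))
```

Under this model, the probability that one model is preferred over another is its strength divided by the sum of both strengths, which is why a larger score gap corresponds to a more lopsided expected vote split.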
DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 | 1,508 | +/-9 | 4,297 | Anthropic | Proprietary |
| 2 | Claude Opus 4.6 (thinking) | 1,504 | +/-4 | 46,410 | Anthropic | Proprietary |
| 3 | Opus 4.7 (thinking) | 1,502 | +/-5 | 32,629 | Anthropic | Proprietary |
| 4 | Claude Opus 4.6 | 1,499 | +/-4 | 49,596 | Anthropic | Proprietary |
| 5 | Opus 4.7 | 1,493 | +/-5 | 33,793 | Anthropic | Proprietary |
| 6 | Muse Spark | 1,487 | +/-6 | 13,607 | Facebook AI Research | Proprietary |
| 7 | Gemini 3.1 Pro Preview | 1,486 | +/-4 | 60,640 | Google DeepMind | Proprietary |
| 8 | Gemini 3.0 Pro (Preview 11-2025) | 1,486 | +/-4 | 41,314 | Google DeepMind | Proprietary |
| 9 | Claude Opus 4.8 (thinking) | 1,483 | +/-6 | 12,963 | Anthropic | Proprietary |
| 10 | GPT-5.5 (high) | 1,481 | +/-5 | 28,268 | OpenAI | Proprietary |
| 11 | GPT-5.4 (high) | 1,478 | +/-4 | 40,959 | OpenAI | Proprietary |
| 12 | Claude Opus 4.8 | 1,478 | +/-6 | 13,316 | Anthropic | Proprietary |
| 13 | Gemini 3.5 Flash | 1,476 | +/-7 | 10,171 | Google DeepMind | Proprietary |
| 14 | GPT-5.2 | 1,475 | +/-4 | 34,555 | OpenAI | Proprietary |
| 15 | GLM 5.1 | 1,475 | +/-6 | 16,101 | Zhipu AI | MIT |
| 16 | GPT-5.5 | 1,475 | +/-5 | 29,071 | OpenAI | Proprietary |
| 17 | Qwen3.7-Max-Preview | 1,475 | +/-10 | 3,740 | Alibaba | Proprietary |
| 18 | | 1,474 | +/-4 | 42,370 | xAI | Proprietary |
| 19 | | 1,474 | +/-5 | 26,964 | xAI | Proprietary |
| 20 | Gemini 3.0 Flash | 1,473 | +/-4 | 30,711 | Google DeepMind | Proprietary |
| 21 | Claude Opus 4 (thinking-32k) | 1,473 | +/-4 | 37,087 | Anthropic | Proprietary |
| 22 | GPT-5.5 Instant | 1,473 | +/-5 | 26,254 | OpenAI | Proprietary |
| 23 | | 1,472 | +/-4 | 41,384 | xAI | Proprietary |
| 24 | Claude Sonnet 4.6 | 1,472 | +/-4 | 39,561 | Anthropic | Proprietary |
| 25 | GLM-5.2 (max) | 1,471 | +/-10 | 3,357 | Zhipu AI | MIT |
| 26 | Claude Opus 4 | 1,469 | +/-3 | 71,167 | Anthropic | Proprietary |
| 27 | GPT-5.4 | 1,468 | +/-4 | 43,382 | OpenAI | Proprietary |
| 28 | ERNIE-5.1-Preview | 1,468 | +/-5 | 25,064 | Baidu | Proprietary |
| 29 | mimo-v2.5-pro | 1,466 | +/-5 | 26,563 | Xiaomi | MIT |
| 30 | | 1,466 | +/-3 | 65,623 | xAI | Proprietary |
| 31 | Qwen3.5 Max Preview | 1,465 | +/-5 | 21,564 | Alibaba | Proprietary |
| 32 | Qwen3.6-Max-Preview | 1,461 | +/-8 | 5,216 | Alibaba | Proprietary |
| 33 | Gemini 3.0 Flash (minimal) | 1,460 | +/-3 | 66,402 | Google DeepMind | Proprietary |
| 34 | Kimi K2.6 | 1,460 | +/-5 | 25,456 | Moonshot AI | Modified MIT |
| 35 | | 1,460 | +/-3 | 67,759 | xAI | Proprietary |
| 36 | DeepSeek-V4-Pro (thinking) | 1,458 | +/-5 | 26,928 | DeepSeek-AI | MIT |
| 37 | GLM-5 | 1,457 | +/-5 | 23,246 | Zhipu AI | MIT |
| 38 | DeepSeek-V4-Pro | 1,456 | +/-5 | 28,720 | DeepSeek-AI | MIT |
| 39 | Claude Sonnet 4.5 (thinking-32k) | 1,455 | +/-3 | 82,494 | Anthropic | Proprietary |
| 40 | Claude Sonnet 4.5 | 1,455 | +/-3 | 80,950 | Anthropic | Proprietary |
| 41 | DOLA Seed 2.0 Pro | 1,455 | +/-4 | 50,401 | ByteDance Seed | Proprietary |
| 42 | GPT-5.1 Pro (high) | 1,455 | +/-4 | 40,820 | OpenAI | Proprietary |
| 43 | Gemma 4 31B | 1,451 | +/-8 | 5,884 | DeepMind | Apache 2.0 |
| 44 | Kimi K2 Thinking | 1,450 | +/-4 | 47,780 | Moonshot AI | Modified MIT |
| 45 | ERNIE 5.0 | 1,449 | +/-7 | 9,748 | Baidu | Proprietary |
| 46 | Opus 4.1 (thinking-16k) | 1,449 | +/-3 | 49,802 | Anthropic | Proprietary |
| 47 | GPT-5.3 | 1,449 | +/-4 | 33,125 | OpenAI | Proprietary |
| 48 | mimo-v2-pro | 1,448 | +/-5 | 24,606 | Xiaomi | Proprietary |
| 49 | minimax-m3 | 1,448 | +/-7 | 11,264 | MiniMax | Proprietary |
| 50 | GPT-5.4 mini (high) | 1,448 | +/-4 | 39,525 | OpenAI | Proprietary |
| 51 | Opus 4.1 | 1,447 | +/-3 | 77,333 | Anthropic | Proprietary |
| 52 | ERNIE 5.0 | 1,447 | +/-4 | 35,299 | Baidu | Proprietary |
| 53 | Gemini 2.5 Pro Experimental 03-25 | 1,446 | +/-3 | 124,588 | Google DeepMind | Proprietary |
| 54 | GPT-4.5 | 1,445 | +/-6 | 14,547 | OpenAI | Proprietary |
| 55 | Qwen 3.6 Plus Preview | 1,444 | +/-5 | 28,997 | Alibaba | Proprietary |
| 56 | | 1,444 | +/-5 | 28,229 | xAI | Proprietary |
| 57 | Qwen3.5-397B-A17B | 1,444 | +/-4 | 43,048 | Alibaba | Apache 2.0 |
| 58 | GPT-4o(2025-03-27) | 1,443 | +/-3 | 82,447 | OpenAI | Proprietary |
| 59 | GLM-4.7 | 1,443 | +/-6 | 12,121 | Zhipu AI | MIT |
| 60 | GPT-5.1 Instant | 1,439 | +/-4 | 43,457 | OpenAI | Proprietary |
| 61 | Gemma 4 26B A4B | 1,438 | +/-8 | 5,813 | DeepMind | Apache 2.0 |
| 62 | GPT-5.2 Pro (high) | 1,438 | +/-4 | 48,063 | OpenAI | Proprietary |
| 63 | DeepSeek-V4-Flash (thinking) | 1,436 | +/-5 | 28,215 | DeepSeek-AI | MIT |
| 64 | longcat-flash-chat-2602-exp | 1,436 | +/-5 | 28,187 | Meituan | Proprietary |
| 65 | Qwen3 Max (Preview) | 1,435 | +/-5 | 27,716 | Alibaba | Proprietary |
| 66 | GPT-5.2 | 1,435 | +/-3 | 59,625 | OpenAI | Proprietary |
| 67 | DeepSeek-V4-Flash | 1,434 | +/-5 | 28,291 | DeepSeek-AI | MIT |
| 68 | GPT-5-Pro (high) | 1,434 | +/-5 | 31,928 | OpenAI | Proprietary |
| 69 | mimo-v2.5 | 1,433 | +/-5 | 27,111 | Xiaomi | MIT |
| 70 | gemini-3.1-flash-lite-preview | 1,432 | +/-4 | 48,525 | Google | Proprietary |
| 71 | mimo-v2-omni | 1,432 | +/-6 | 12,528 | Xiaomi | Proprietary |
| 72 | Kimi K2.5 Instant | 1,431 | +/-7 | 8,177 | Moonshot AI | Modified MIT |
| 73 | OpenAI o3 | 1,431 | +/-4 | 59,744 | OpenAI | Proprietary |
| 74 | | 1,431 | +/-3 | 56,873 | xAI | Proprietary |
| 75 | Kimi K2 Thinking (thinking-turbo) | 1,430 | +/-3 | 62,098 | Moonshot AI | Modified MIT |
| 76 | amazon-nova-experimental-chat-26-02-10 | 1,427 | +/-10 | 3,417 | Amazon | Proprietary |
| 77 | GPT-5 | 1,427 | +/-4 | 31,569 | OpenAI | Proprietary |
| 78 | mistral-medium-3.5 | 1,426 | +/-7 | 10,739 | Mistral | Modified MIT |
| 79 | GLM-4.6 | 1,425 | +/-4 | 35,640 | Zhipu AI | MIT |
| 80 | DeepSeek V3.2 | 1,425 | +/-4 | 47,303 | DeepSeek-AI | MIT |
| 81 | DeepSeek V3.2-Exp (thinking) | 1,425 | +/-7 | 9,069 | DeepSeek-AI | MIT |
| 82 | Claude Opus 4 (thinking-16k) | 1,424 | +/-4 | 36,887 | Anthropic | Proprietary |
| 83 | qwen3-max-2025-09-23 | 1,424 | +/-6 | 9,151 | Alibaba | Proprietary |
| 84 | Qwen3-235B-A22B-2507 | 1,423 | +/-3 | 97,241 | Alibaba | Apache 2.0 |
| 85 | DeepSeek V3.2-Exp | 1,423 | +/-6 | 11,922 | DeepSeek-AI | MIT |
| 86 | DeepSeek V3.2 (thinking) | 1,423 | +/-4 | 41,085 | DeepSeek-AI | MIT |
| 87 | DeepSeek-R1-0528 | 1,422 | +/-6 | 18,463 | DeepSeek-AI | MIT |
| 88 | | 1,421 | +/-8 | 6,809 | xAI | Proprietary |
| 89 | ERNIE 5.0 | 1,419 | +/-9 | 4,705 | Baidu | Proprietary |
| 90 | Kimi K2 0905 | 1,418 | +/-7 | 11,780 | Moonshot AI | Modified MIT |
| 91 | DeepSeek-V3.1 Terminus (thinking) | 1,418 | +/-10 | 3,462 | DeepSeek-AI | MIT |
| 92 | Kimi K2 | 1,417 | +/-5 | 27,637 | Moonshot AI | Modified MIT |
| 93 | DeepSeek-V3.1 | 1,417 | +/-6 | 14,958 | DeepSeek-AI | MIT |
| 94 | Qwen3.5-122B-A10B | 1,417 | +/-4 | 28,575 | Alibaba | Apache 2.0 |
| 95 | DeepSeek-V3.1 (thinking) | 1,417 | +/-7 | 11,737 | DeepSeek-AI | MIT |
| 96 | | 1,417 | +/-4 | 34,620 | MiniMaxAI | Modified MIT |
| 97 | nvidia-nemotron-3-ultra-550b-a55b-nvfp4 | 1,416 | +/-8 | 6,153 | Nvidia | OpenMDW-1.1 |
| 98 | DeepSeek-V3.1 Terminus | 1,416 | +/-10 | 3,702 | DeepSeek-AI | MIT |
| 99 | amazon-nova-experimental-chat-26-01-10 | 1,416 | +/-10 | 3,406 | Amazon | Proprietary |
| 100 | Mistral Large 3 | 1,416 | +/-4 | 44,094 | MistralAI | Apache 2.0 |
| 101 | Qwen3-VL-235B-A22B-Instruct | 1,415 | +/-6 | 11,512 | Alibaba | Apache 2.0 |
| 102 | GPT-4.1 | 1,414 | +/-4 | 50,981 | OpenAI | Proprietary |
| 103 | hunyuan-hy3-preview | 1,413 | +/-8 | 6,678 | Tencent | tencent-hunyuan-community |
| 104 | Claude Opus 4 | 1,412 | +/-4 | 44,208 | Anthropic | Proprietary |
| 105 | | 1,412 | +/-4 | 32,905 | xAI | Proprietary |
| 106 | Haiku 4.5 | 1,411 | +/-3 | 91,153 | Anthropic | Proprietary |
| 107 | GLM-4.5 | 1,411 | +/-5 | 24,310 | Zhipu AI | MIT |
| 108 | Gemini 2.5 Flash | 1,410 | +/-2 | 124,544 | Google DeepMind | Proprietary |
| 109 | | 1,410 | +/-4 | 41,385 | xAI | Proprietary |
| 110 | Magistral-Medium-2506 | 1,410 | +/-3 | 94,036 | MistralAI | Proprietary |
| 111 | Qwen3.5-27B | 1,409 | +/-4 | 27,421 | Alibaba | Apache 2.0 |
| 112 | Gemini 2.5 Flash-Preview-09-2025 | 1,404 | +/-4 | 32,910 | Google DeepMind | Proprietary |
| 113 | | 1,404 | +/-5 | 18,710 | xAI | Proprietary |
| 114 | qwen3-235b-a22b-no-thinking | 1,403 | +/-5 | 38,208 | Alibaba | Apache 2.0 |
| 115 | GPT-5.4 nano (high) | 1,403 | +/-4 | 38,610 | OpenAI | Proprietary |
| 116 | OpenAI o1 | 1,402 | +/-4 | 27,807 | OpenAI | Proprietary |
| 117 | Qwen3-Next | 1,402 | +/-5 | 22,873 | Alibaba | Apache 2.0 |
| 118 | longcat-flash-chat | 1,401 | +/-6 | 11,401 | Meituan | MIT |
| 119 | qwen3-235b-a22b-thinking-2507 | 1,399 | +/-7 | 8,994 | Alibaba | Apache 2.0 |
| 120 | Claude Sonnet 4 (thinking-32k) | 1,399 | +/-4 | 35,108 | Anthropic | Proprietary |
| 121 | DeepSeek-R1 | 1,398 | +/-5 | 18,524 | DeepSeek-AI | MIT |
| 122 | Step 3.5 Flash | 1,397 | +/-4 | 40,958 | StepFunAI | Proprietary |
| 123 | hunyuan-vision-1.5-thinking | 1,396 | +/-12 | 2,216 | Tencent | Proprietary |
| 124 | Qwen3.5-35B-A3B | 1,396 | +/-4 | 29,248 | Alibaba | Apache 2.0 |
| 125 | Qwen3-VL-235B-A22B-Instruct (thinking) | 1,396 | +/-7 | 7,944 | Alibaba | Apache 2.0 |
| 126 | DeepSeek-V3-0324 | 1,396 | +/-4 | 45,505 | DeepSeek-AI | MIT |
| 127 | Step 3.5 Flash | 1,395 | +/-4 | 44,826 | StepFunAI | Apache 2.0 |
| 128 | amazon-nova-experimental-chat-12-10 | 1,395 | +/-10 | 3,680 | Amazon | Proprietary |
| 129 | mimo-v2-flash (non-thinking) | 1,393 | +/-4 | 46,705 | Xiaomi | MIT |
| 130 | | 1,391 | +/-4 | 41,271 | MiniMaxAI | Modified MIT |
| 131 | GPT-5-mini (high) | 1,390 | +/-5 | 27,021 | OpenAI | Proprietary |
| 132 | OpenAI o4 - mini | 1,390 | +/-4 | 45,439 | OpenAI | Proprietary |
| 133 | Claude Sonnet 4 | 1,389 | +/-4 | 40,298 | Anthropic | Proprietary |
| 134 | OpenAI o1 | 1,388 | +/-5 | 31,122 | OpenAI | Proprietary |
| 135 | Qwen3-Coder-480B-A35B | 1,388 | +/-5 | 25,729 | Alibaba | Apache 2.0 |
| 136 | Claude Sonnet 3.7 (thinking-32k) | 1,387 | +/-4 | 38,819 | Anthropic | Proprietary |
| 137 | Hunyuan-T1 | 1,387 | +/-9 | 4,704 | Tencent AI Lab | Proprietary |
| 138 | mimo-v2-flash (thinking) | 1,387 | +/-6 | 10,956 | Xiaomi | MIT |
| 139 | mistral-medium-2505 | 1,387 | +/-5 | 33,224 | Mistral | Proprietary |
| 140 | | 1,384 | +/-5 | 17,128 | MiniMaxAI | MIT |
| 141 | Qwen3-30B-A3B-2507 | 1,383 | +/-5 | 23,728 | Alibaba | Apache 2.0 |
| 142 | GPT-4.1 mini | 1,383 | +/-4 | 39,329 | OpenAI | Proprietary |
| 143 | hunyuan-turbos-20250416 | 1,382 | +/-6 | 10,722 | Tencent | Proprietary |
| 144 | Gemini 2.5 Flash-Lite-Preview-09-2025 (no-thinking) | 1,380 | +/-3 | 47,228 | Google DeepMind | Proprietary |
| 145 | trinity-large-preview | 1,379 | +/-4 | 30,145 | Arcee AI | Apache 2.0 |
| 146 | GLM-4.6V | 1,377 | +/-11 | 2,805 | Zhipu AI | MIT |
| 147 | Qwen3-235B-A22B | 1,375 | +/-5 | 26,267 | Alibaba | Apache 2.0 |
| 148 | Gemini 2.5 Flash-Lite (thinking) | 1,374 | +/-5 | 32,899 | Google DeepMind | Proprietary |
| 149 | Qwen2.5-Max | 1,374 | +/-4 | 32,619 | Alibaba | Proprietary |
| 150 | GLM-4.5-Air | 1,373 | +/-4 | 31,077 | Zhipu AI | MIT |
| 151 | Claude 3.5 Sonnet | 1,373 | +/-3 | 88,337 | Anthropic | Proprietary |
| 152 | Claude Sonnet 3.7 | 1,371 | +/-4 | 43,185 | Anthropic | Proprietary |
| 153 | Qwen3-Next (thinking) | 1,370 | +/-6 | 13,693 | Alibaba | Apache 2.0 |
| 154 | trinity-large-thinking | 1,369 | +/-5 | 29,305 | Arcee AI | Apache 2.0 |
| 155 | GLM-4.7-Flash | 1,368 | +/-6 | 11,731 | Zhipu AI | MIT |
| 156 | amazon-nova-experimental-chat-11-10 | 1,367 | +/-4 | 25,383 | Amazon | Proprietary |
| 157 | Gemma 3 - 27B (IT) | 1,366 | +/-4 | 47,529 | Google DeepMind | Gemma |
| 158 | minimax-m1 | 1,364 | +/-4 | 35,208 | MiniMax | Apache 2.0 |
| 159 | OpenAI o3-mini (high) | 1,363 | +/-5 | 18,589 | OpenAI | Proprietary |
| 160 | OpenAI o3-mini (high) | 1,362 | +/-5 | 16,962 | OpenAI | Proprietary |
| 161 | nvidia-nemotron-3-super-120b-a12b | 1,362 | +/-7 | 7,544 | Nvidia | NVIDIA Open Model |
| 162 | Gemini 2.0 Flash Experimental | 1,360 | +/-4 | 43,748 | DeepMind | Proprietary |
| 163 | DeepSeek-V3 | 1,358 | +/-5 | 21,770 | DeepSeek-AI | DeepSeek |
| 164 | Mistral-Small-3.2 | 1,358 | +/-5 | 17,708 | MistralAI | Apache 2.0 |
| 165 | | 1,357 | +/-5 | 22,715 | xAI | Proprietary |
| 166 | intellect-3 | 1,357 | +/-8 | 5,331 | Prime Intellect | MIT |
| 167 | C4AI Command A (202503) | 1,354 | +/-3 | 56,266 | CohereAI | CC-BY-NC-4.0 |
| 168 | Gemini 2.0 Flash-Lite | 1,353 | +/-4 | 24,955 | DeepMind | Proprietary |
| 169 | GLM-4.5V | 1,353 | +/-8 | 4,959 | Zhipu AI | MIT |
| 170 | GPT OSS 120B | 1,353 | +/-4 | 30,635 | OpenAI | Apache 2.0 |
| 171 | Gemini 1.5 Pro | 1,351 | +/-3 | 55,606 | Google DeepMind | Proprietary |
| 172 | amazon-nova-experimental-chat-10-20 | 1,350 | +/-6 | 11,470 | Amazon | Proprietary |
| 173 | hunyuan-turbos-20250226 | 1,349 | +/-12 | 2,220 | Tencent | Proprietary |
| 174 | Step3 | 1,348 | +/-7 | 6,541 | StepFunAI | Apache 2.0 |
| 175 | amazon-nova-experimental-chat-10-09 | 1,348 | +/-11 | 2,838 | Amazon | Proprietary |
| 176 | OpenAI o3-mini | 1,348 | +/-4 | 57,336 | OpenAI | Proprietary |
| 177 | llama-3.1-nemotron-ultra-253b-v1 | 1,347 | +/-12 | 2,549 | Nvidia | Nvidia Open Model |
| 178 | Qwen3-32B | 1,347 | +/-9 | 3,926 | Alibaba | Apache 2.0 |
| 179 | mercury-2 | 1,346 | +/-11 | 3,124 | Inception AI | Proprietary |
| 180 | qwen-plus-0125 | 1,346 | +/-8 | 5,819 | Alibaba | Proprietary |
| 181 | ling-flash-2.0 | 1,346 | +/-7 | 7,006 | InclusionAI | MIT |
| 182 | | 1,346 | +/-8 | 6,868 | MiniMaxAI | Apache 2.0 |
| 183 | GPT-4o | 1,346 | +/-3 | 112,881 | OpenAI | Proprietary |
| 184 | nvidia-llama-3.3-nemotron-super-49b-v1.5 | 1,343 | +/-10 | 3,345 | Nvidia | Nvidia Open |
| 185 | glm-4-plus-0111 | 1,343 | +/-8 | 5,760 | Zhipu | Proprietary |
| 186 | Claude 3.5 Sonnet | 1,342 | +/-3 | 82,419 | Anthropic | Proprietary |
| 187 | Gemma 3 - 12B (IT) | 1,342 | +/-10 | 3,829 | Google DeepMind | Gemma |
| 188 | hunyuan-turbo-0110 | 1,341 | +/-12 | 2,290 | Tencent | Proprietary |
| 189 | Nova 2 Lite | 1,337 | +/-6 | 12,233 | Amazon | Proprietary |
| 190 | GPT-5-Nano (high) | 1,337 | +/-7 | 8,266 | OpenAI | Proprietary |
| 191 | OpenAI o1-mini | 1,337 | +/-4 | 51,981 | OpenAI | Proprietary |
| 192 | QwQ-32B | 1,336 | +/-4 | 25,393 | Alibaba | Apache 2.0 |
| 193 | | 1,336 | +/-4 | 63,498 | xAI | Proprietary |
| 194 | gemini-advanced-0514 | 1,335 | +/-5 | 50,148 | Google | Proprietary |
| 195 | GPT-4o | 1,335 | +/-4 | 45,499 | OpenAI | Proprietary |
| 196 | llama-3.1-405b-instruct-bf16 | 1,335 | +/-4 | 41,375 | Meta | Llama 3.1 Community |
| 197 | step-2-16k-exp-202412 | 1,334 | +/-9 | 4,833 | StepFun | Proprietary |
| 198 | llama-3.1-405b-instruct-fp8 | 1,333 | +/-4 | 59,656 | Meta | Llama 3.1 Community |
| 199 | olmo-3.1-32b-instruct | 1,330 | +/-6 | 12,220 | Ai2 | Apache 2.0 |
| 200 | yi-lightning | 1,328 | +/-5 | 27,332 | 01 AI | Proprietary |
| 201 | llama-3.3-nemotron-49b-super-v1 | 1,328 | +/-12 | 2,218 | Nvidia | Nvidia |
| 202 | molmo-2-8b | 1,327 | +/-21 | 804 | Ai2 | Apache 2.0 |
| 203 | Qwen3-30B-A3B | 1,327 | +/-5 | 26,486 | Alibaba | Apache 2.0 |
| 204 | Llama 4 Maverick Instruct | 1,327 | +/-4 | 39,982 | Facebook AI Research | Llama 4 |
| 205 | hunyuan-large-2025-02-10 | 1,326 | +/-10 | 3,738 | Tencent | Proprietary |
| 206 | gpt-4-turbo-2024-04-09 | 1,324 | +/-4 | 98,114 | OpenAI | Proprietary |
| 207 | Claude 3.5 Haiku | 1,324 | +/-3 | 69,979 | Anthropic | Proprietary |
| 208 | Gemini 1.5 Pro | 1,323 | +/-4 | 79,138 | Google DeepMind | Proprietary |
| 209 | deepseek-v2.5-1210 | 1,323 | +/-8 | 6,795 | DeepSeek | DeepSeek |
| 210 | Llama 4 Scout Instruct | 1,323 | +/-5 | 30,286 | Facebook AI Research | Llama |
| 211 | GPT-4.1 nano | 1,322 | +/-8 | 6,103 | OpenAI | Proprietary |
| 212 | Claude3-Opus | 1,321 | +/-3 | 194,909 | Anthropic | Proprietary |
| 213 | ring-flash-2.0 | 1,321 | +/-7 | 7,148 | InclusionAI | MIT |
| 214 | step-1o-turbo-202506 | 1,320 | +/-7 | 9,041 | StepFun | Proprietary |
| 215 | glm-4-plus | 1,319 | +/-5 | 26,126 | Zhipu AI | Proprietary |
| 216 | Gemma-3n-E4B | 1,318 | +/-5 | 22,594 | Google DeepMind | Gemma |
| 217 | Llama3.3-70B-Instruct | 1,318 | +/-3 | 54,734 | Facebook AI Research | Llama-3.3 |
| 218 | qwen-max-0919 | 1,318 | +/-6 | 16,478 | Alibaba | Qwen |
| 219 | GPT-4o mini | 1,318 | +/-4 | 68,709 | OpenAI | Proprietary |
| 220 | GPT OSS 20B | 1,318 | +/-6 | 10,627 | OpenAI | Apache 2.0 |
| 221 | nvidia-nemotron-3-nano-30b-a3b-bf16 | 1,316 | +/-6 | 15,506 | Nvidia | NVIDIA Open Model |
| 222 | qwen2.5-plus-1127 | 1,315 | +/-6 | 10,187 | Alibaba | Proprietary |
| 223 | athene-v2-chat | 1,314 | +/-5 | 24,739 | NexusFlow | NexusFlow |
| 224 | mistral-large-2407 | 1,314 | +/-4 | 45,459 | Mistral | Mistral Research |
| 225 | GPT-4 | 1,313 | +/-4 | 93,439 | OpenAI | Proprietary |
| 226 | GPT-4 | 1,312 | +/-4 | 100,105 | OpenAI | Proprietary |
| 227 | hunyuan-standard-2025-02-10 | 1,311 | +/-10 | 3,904 | Tencent | Proprietary |
| 228 | gemini-1.5-flash-002 | 1,309 | +/-4 | 34,902 | Google | Proprietary |
| 229 | | 1,308 | +/-4 | 52,567 | xAI | Proprietary |
| 230 | DeepSeek V2.5 | 1,307 | +/-5 | 24,572 | DeepSeek-AI | DeepSeek |
| 231 | granite-4.1-8b | 1,307 | +/-10 | 4,065 | IBM | Apache 2.0 |
| 232 | athene-70b-0725 | 1,306 | +/-6 | 19,621 | NexusFlow | CC-BY-NC-4.0 |
| 233 | mercury | 1,306 | +/-14 | 1,953 | Inception AI | Proprietary |
| 234 | olmo-3-32b-think | 1,305 | +/-8 | 5,946 | Ai2 | Apache 2.0 |
| 235 | mistral-large-2411 | 1,305 | +/-4 | 28,073 | Mistral | MRL |
| 236 | Magistral-Medium-2506 | 1,304 | +/-6 | 11,638 | MistralAI | Proprietary |
| 237 | Mistral-Small-3.1-24B-Instruct-2503 | 1,303 | +/-5 | 33,216 | MistralAI | Apache 2.0 |
| 238 | Gemma 3 - 4B (IT) | 1,303 | +/-9 | 4,171 | Google DeepMind | Gemma |
| 239 | Qwen2.5-VL-72B-Instruct | 1,303 | +/-4 | 39,406 | Alibaba | Qwen |
| 240 | Llama3.1-70B-Instruct | 1,299 | +/-8 | 7,140 | Facebook AI Research | Llama 3.1 |
| 241 | hunyuan-large-vision | 1,294 | +/-9 | 5,372 | Tencent | Proprietary |
| 242 | Llama3.1-70B-Instruct | 1,293 | +/-4 | 55,240 | Facebook AI Research | Llama 3.1 Community |
| 243 | amazon-nova-pro-v1.0 | 1,290 | +/-5 | 24,745 | Amazon | Proprietary |
| 244 | jamba-1.5-large | 1,289 | +/-7 | 8,662 | AI21 Labs | Jamba Open |
| 245 | gemma-2-27b-it | 1,289 | +/-3 | 75,754 | Google | Gemma license |
| 246 | reka-core-20240904 | 1,288 | +/-7 | 7,312 | Reka AI | Proprietary |
| 247 | ibm-granite-h-small | 1,287 | +/-8 | 5,684 | IBM | Apache 2.0 |
| 248 | GPT-4 | 1,287 | +/-5 | 54,173 | OpenAI | Proprietary |
| 249 | gemini-1.5-flash-001 | 1,286 | +/-4 | 62,833 | Google | Proprietary |
| 250 | llama-3.1-tulu-3-70b | 1,286 | +/-10 | 2,846 | Ai2 | Llama 3.1 |
| 251 | llama-3.1-nemotron-51b-instruct | 1,286 | +/-10 | 3,749 | Nvidia | Llama 3.1 |
| 252 | olmo-3.1-32b-think | 1,285 | +/-7 | 8,501 | Ai2 | Apache 2.0 |
| 253 | Claude3-Sonnet | 1,280 | +/-4 | 109,284 | Anthropic | Proprietary |
| 254 | gemma-2-9b-it-simpo | 1,280 | +/-7 | 10,072 | Princeton | MIT |
| 255 | nemotron-4-340b-instruct | 1,276 | +/-5 | 19,659 | Nvidia | NVIDIA Open Model |
| 256 | Llama3-70B-Instruct | 1,276 | +/-4 | 156,876 | Facebook AI Research | Llama 3 Community |
| 257 | command-r-plus-08-2024 | 1,276 | +/-7 | 9,866 | Cohere | CC-BY-NC-4.0 |
| 258 | GPT-4 | 1,275 | +/-4 | 88,723 | OpenAI | Proprietary |
| 259 | Mistral Small 24B Instruct 2501 | 1,274 | +/-6 | 14,681 | MistralAI | Apache 2.0 |
| 260 | GLM4 | 1,273 | +/-7 | 9,788 | Zhipu AI | Proprietary |
| 261 | reka-flash-20240904 | 1,272 | +/-7 | 7,536 | Reka AI | Proprietary |
| 262 | Qwen2.5-Coder-32B-Instruct | 1,270 | +/-8 | 5,432 | Alibaba | Apache 2.0 |
| 263 | C4AI Aya Vision 32B | 1,267 | +/-5 | 27,124 | CohereAI | CC-BY-NC-4.0 |
| 264 | gemma-2-9b-it | 1,266 | +/-4 | 54,611 | Google | Gemma license |
| 265 | deepseek-coder-v2 | 1,264 | +/-6 | 15,147 | DeepSeek | DeepSeek License |
| 266 | Qwen2-72B-Instruct | 1,261 | +/-5 | 37,325 | Alibaba | Qianwen LICENSE |
| 267 | C4AI Command R+ | 1,261 | +/-4 | 77,554 | CohereAI | CC-BY-NC-4.0 |
| 268 | Claude3-Haiku | 1,261 | +/-4 | 117,701 | Anthropic | Proprietary |
| 269 | amazon-nova-lite-v1.0 | 1,260 | +/-5 | 19,372 | Amazon | Proprietary |
| 270 | gemini-1.5-flash-8b-001 | 1,259 | +/-4 | 35,558 | Google | Proprietary |
| 271 | Phi 4 - 14B | 1,256 | +/-5 | 24,126 | Microsoft Azure | MIT |
| 272 | olmo-2-0325-32b-instruct | 1,251 | +/-11 | 3,334 | Ai2 | Apache-2.0 |
| 273 | command-r-08-2024 | 1,250 | +/-7 | 10,140 | Cohere | CC-BY-NC-4.0 |
| 274 | mistral-large-2402 | 1,242 | +/-5 | 62,436 | Mistral | Proprietary |
| 275 | amazon-nova-micro-v1.0 | 1,241 | +/-5 | 19,364 | Amazon | Proprietary |
| 276 | jamba-1.5-mini | 1,239 | +/-7 | 8,858 | AI21 Labs | Jamba Open |
| 277 | ministral-8b-2410 | 1,237 | +/-9 | 4,781 | Mistral | MRL |
| 278 | gemini-pro-dev-api | 1,235 | +/-7 | 18,354 | Google | Proprietary |
| 279 | Qwen1.5-110B-Chat | 1,233 | +/-6 | 26,195 | Alibaba | Qianwen LICENSE |
| 280 | hunyuan-standard-256k | 1,233 | +/-12 | 2,728 | Tencent | Proprietary |
| 281 | reka-flash-21b-20240226-online | 1,233 | +/-7 | 15,450 | Reka AI | Proprietary |
| 282 | Qwen1.5-72B-Chat | 1,233 | +/-5 | 39,302 | Alibaba | Qianwen LICENSE |
| 283 | Mixtral-8x22B-Instruct-v0.1 | 1,229 | +/-5 | 51,416 | MistralAI | Apache 2.0 |
| 284 | command-r | 1,226 | +/-5 | 54,036 | Cohere | CC-BY-NC-4.0 |
| 285 | reka-flash-21b-20240226 | 1,226 | +/-6 | 24,806 | Reka AI | Proprietary |
| 286 | gpt-3.5-turbo-0125 | 1,224 | +/-5 | 66,207 | OpenAI | Proprietary |
| 287 | Llama3-8B-Instruct | 1,223 | +/-4 | 104,642 | Facebook AI Research | Llama 3 Community |
| 288 | C4AI Aya Vision 8B | 1,223 | +/-7 | 9,818 | CohereAI | CC-BY-NC-4.0 |
| 289 | Gemini-pro | 1,222 | +/-12 | 6,390 | DeepMind | Proprietary |
| 290 | mistral-medium | 1,222 | +/-6 | 34,550 | Mistral | Proprietary |
| 291 | llama-3.1-tulu-3-8b | 1,220 | +/-11 | 2,896 | Ai2 | Llama 3.1 |
| 292 | Yi-1.5-34B | 1,212 | +/-5 | 24,146 | 01.AI | Apache-2.0 |
| 293 | zephyr-orpo-141b-A35b-v0.1 | 1,212 | +/-11 | 4,652 | HuggingFace | Apache 2.0 |
| 294 | Llama3.1-8B-Instruct | 1,211 | +/-4 | 49,605 | Facebook AI Research | Llama 3.1 Community |
| 295 | Llama3.1-8B-Instruct | 1,208 | +/-11 | 3,090 | Facebook AI Research | Apache 2.0 |
| 296 | qwen1.5-32b-chat | 1,203 | +/-6 | 21,741 | Alibaba | Qianwen LICENSE |
| 297 | gpt-3.5-turbo-1106 | 1,202 | +/-9 | 16,619 | OpenAI | Proprietary |
| 298 | gemma-2-2b-it | 1,200 | +/-4 | 46,616 | Google | Gemma license |
| 299 | Phi-3-medium 14B-preview | 1,197 | +/-5 | 25,055 | Microsoft Azure | MIT |
| 300 | mixtral-8x7b-instruct-v0.1 | 1,196 | +/-4 | 73,503 | Mistral | Apache 2.0 |
| 301 | DBRX Instruct | 1,194 | +/-6 | 32,191 | databricks | DBRX LICENSE |
| 302 | InternLM2-Base-20B | 1,191 | +/-7 | 9,901 | Shanghai AI Laboratory | Other |
| 303 | Qwen1.5-14B-Chat | 1,190 | +/-7 | 17,839 | Alibaba | Qianwen LICENSE |
| 304 | WizardLM-70B-V1.0 | 1,184 | +/-9 | 8,214 | WizardLM Team | Llama 2 Community |
| 305 | DeepSeek LLM 67B Chat | 1,184 | +/-11 | 4,932 | DeepSeek-AI | DeepSeek License |
| 306 | Yi-34B | 1,183 | +/-7 | 15,483 | 01.AI | Yi License |
| 307 | granite-3.0-8b-instruct | 1,182 | +/-9 | 6,638 | IBM | Apache 2.0 |
| 308 | openchat-3.5 | 1,182 | +/-10 | 7,968 | OpenChat | Apache-2.0 |
| 309 | openchat-3.5-0106 | 1,182 | +/-8 | 12,637 | OpenChat | Apache-2.0 |
| 310 | Gemma 1.1-7B-IT | 1,181 | +/-6 | 23,893 | Google Research | Gemma license |
| 311 | snowflake-arctic-instruct | 1,179 | +/-6 | 32,832 | Snowflake | Apache 2.0 |
| 312 | granite-3.1-2b-instruct | 1,178 | +/-11 | 3,188 | IBM | Apache 2.0 |
| 313 | tulu-2-dpo-70b | 1,177 | +/-10 | 6,535 | AllenAI/UW | AI2 ImpACT Low-risk |
| 314 | openhermes-2.5-mistral-7b | 1,175 | +/-10 | 5,006 | NousResearch | Apache-2.0 |
| 315 | Vicuna 33B | 1,172 | +/-6 | 22,479 | LM-SYS | Non-commercial |
| 316 | starling-lm-7b-beta | 1,171 | +/-7 | 16,056 | Nexusflow | Apache-2.0 |
| 317 | Phi-3-small 7B | 1,170 | +/-6 | 17,766 | Microsoft Azure | MIT |
| 318 | llama-2-70b-chat | 1,170 | +/-6 | 38,492 | Meta | Llama 2 Community |
| 319 | starling-lm-7b-alpha | 1,167 | +/-8 | 10,224 | UC Berkeley | CC-BY-NC-4.0 |
| 320 | llama-3.2-3b-instruct | 1,166 | +/-8 | 7,936 | Meta | Llama 3.2 |
| 321 | nous-hermes-2-mixtral-8x7b-dpo | 1,164 | +/-12 | 3,777 | NousResearch | Apache-2.0 |
| 322 | Qwen3-VL-2B | 1,156 | +/-8 | 6,837 | Alibaba | Apache 2.0 |
| 323 | QwQ-32B-Preview | 1,155 | +/-11 | 3,231 | Alibaba | Apache 2.0 |
| 324 | llama2-70b-steerlm-chat | 1,154 | +/-13 | 3,585 | Nvidia | Llama 2 Community |
| 325 | solar-10.7b-instruct-v1.0 | 1,151 | +/-13 | 4,155 | Upstage AI | CC-BY-NC-4.0 |
| 326 | dolphin-2.2.1-mistral-7b | 1,151 | +/-15 | 1,679 | Cognitive Computations | Apache-2.0 |
| 327 | MPT-30B-Chat | 1,150 | +/-12 | 2,572 | MosaicML | CC-BY-NC-SA-4.0 |
| 328 | Mistral-7B-Instruct-v0.2 | 1,149 | +/-7 | 19,402 | MistralAI | Apache-2.0 |
| 329 | wizardlm-13b | 1,148 | +/-9 | 7,044 | Microsoft | Llama 2 Community |
| 330 | falcon-180b-chat | 1,147 | +/-17 | 1,295 | TII | Falcon-180B TII License |
| 331 | Qwen1.5-7B-Chat | 1,143 | +/-10 | 4,737 | Alibaba | Qianwen LICENSE |
| 332 | Phi-3-mini 3.8B | 1,142 | +/-6 | 12,297 | Microsoft Azure | MIT |
| 333 | Baichuan2-13B-Chat | 1,141 | +/-7 | 19,174 | Baichuan AI | Llama 2 Community |
| 334 | Vicuna 13B | 1,140 | +/-7 | 19,367 | LM-SYS | Llama 2 Community |
| 335 | Qwen-14B-Chat | 1,138 | +/-11 | 4,964 | Alibaba | Qianwen LICENSE |
| 336 | PaLM 2 | 1,137 | +/-9 | 8,554 | Google Research | Proprietary |
| 337 | Gemma 7B - It | 1,137 | +/-9 | 8,925 | Google Research | Gemma license |
| 338 | CodeLLaMA-34B | 1,136 | +/-9 | 7,366 | Facebook AI Research | Llama 2 Community |
| 339 | zephyr-7b-beta | 1,130 | +/-9 | 11,118 | HuggingFace | MIT |
| 340 | Phi-3-mini 3.8B | 1,129 | +/-7 | 20,685 | Microsoft Azure | MIT |
| 341 | Phi-3-mini 3.8B | 1,127 | +/-6 | 20,118 | Microsoft Azure | MIT |
| 342 | guanaco-33b | 1,126 | +/-12 | 2,921 | UW | Non-commercial |
| 343 | zephyr-7b-alpha | 1,126 | +/-16 | 1,785 | HuggingFace | MIT |
| 344 | stripedhyena-nous-7b | 1,120 | +/-11 | 5,182 | Together AI | Apache 2.0 |
| 345 | CodeLlama-70B-Instruct | 1,118 | +/-18 | 1,143 | Facebook AI Research | Llama 2 Community |
| 346 | Gemma 1.1-2B-IT | 1,115 | +/-8 | 10,854 | Google Research | Gemma license |
| 347 | Vicuna 7B | 1,114 | +/-9 | 6,923 | LM-SYS | Llama 2 Community |
| 348 | smollm2-1.7b-instruct | 1,114 | +/-14 | 2,199 | HuggingFace | Apache 2.0 |
| 349 | llama-3.2-1b-instruct | 1,110 | +/-8 | 8,045 | Meta | Llama 3.2 |
| 350 | Mistral 7B Instruct | 1,109 | +/-9 | 8,977 | MistralAI | Apache 2.0 |
| 351 | Baichuan2-7B-Chat | 1,107 | +/-7 | 14,148 | Baichuan AI | Llama 2 Community |
| 352 | Gemma 2B - It | 1,092 | +/-11 | 4,780 | Google Research | Gemma license |
| 353 | Qwen1.5-4B-Chat | 1,090 | +/-9 | 7,597 | Alibaba | Qianwen LICENSE |
| 354 | olmo-7b-instruct | 1,073 | +/-11 | 6,328 | Ai2 | Apache-2.0 |
| 355 | Koala | 1,070 | +/-10 | 6,965 | DAMO Academy | Non-commercial |
| 356 | alpaca-13b | 1,068 | +/-11 | 5,745 | Stanford | Non-commercial |
| 357 | GPT4All 13B | 1,066 | +/-15 | 1,743 | Nomic AI | Non-commercial |
| 358 | MPT-7B-Chat | 1,062 | +/-12 | 3,924 | MosaicML | CC-BY-NC-SA-4.0 |
| 359 | ChatGLM3-6B | 1,055 | +/-12 | 4,658 | Zhipu AI | Apache-2.0 |
| 360 | RWKV-4-Raven-14B | 1,041 | +/-11 | 4,845 | RWKV | Apache 2.0 |
| 361 | ChatGLM2-6B | 1,024 | +/-14 | 2,658 | Zhipu AI | Apache-2.0 |
| 362 | oasst-pythia-12b | 1,022 | +/-11 | 6,310 | OpenAssistant | Apache 2.0 |
| 363 | ChatGLM-6B | 995 | +/-13 | 4,914 | Zhipu AI | Non-commercial |
| 364 | fastchat-t5-3b | 991 | +/-12 | 4,203 | LMSYS | Apache 2.0 |
| 365 | dolly-v2-12b | 980 | +/-14 | 3,412 | Databricks | MIT |
| 366 | LLaMA 13B | 973 | +/-16 | 2,391 | Facebook AI Research | Non-commercial |
| 367 | stablelm-tuned-alpha-7b | 952 | +/-13 | 3,287 | Stability AI | CC-BY-NC-SA-4.0 |
Data is for reference only. Official sources are authoritative. Click model names to view DataLearner model profiles.
FAQ
What is Text Generation Arena (LMArena)?
Text Generation Arena, formerly LMSYS Chatbot Arena, is one of the most widely followed anonymous LLM evaluation platforms. Users compare answers from two hidden models and vote for the better response; Elo-style scoring aggregates those votes into a dynamic leaderboard.
How is the Arena Elo score calculated?
Arena Elo is adapted from chess rating systems. After each head-to-head comparison, the preferred model gains rating points and the other model loses points. The size of the change depends on the rating gap: an upset win over a higher-rated model moves the ratings more than an expected win. The 95% confidence interval reflects how much comparison data supports the estimate, so models with fewer votes generally have wider intervals.
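The per-comparison update described above can be sketched as a classic online Elo step. This is a minimal illustration, not the leaderboard's actual computation: the function names, the K-factor of 32, and the 400-point scale are assumptions borrowed from common chess practice.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, a_won, k=32.0, scale=400.0):
    """Return updated (rating_a, rating_b) after one comparison.

    k and scale are assumed values, not taken from the leaderboard.
    """
    expected_a = expected_score(rating_a, rating_b, scale)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # The winner gains exactly what the loser gives up.
    return rating_a + delta, rating_b - delta


# Evenly matched models: the winner gains half of k.
print(elo_update(1500.0, 1500.0, a_won=True))

# An upset (the lower-rated model wins) produces a larger change.
print(elo_update(1400.0, 1600.0, a_won=True))
```

Because the change is proportional to the gap between the actual and expected outcome, a win that the ratings already predicted barely moves either score.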
Why do some models have both Thinking and regular versions?
Some models offer an extended-thinking mode that spends more inference time reasoning before producing the final answer. This can improve scores on reasoning, math, and coding tasks, but usually increases latency and cost, so Arena tracks these variants separately.
How should I choose an LLM from this leaderboard?
Consider overall Elo, cost, language coverage, open-source availability, and latency. The top-ranked model is not always the best fit for every workflow.
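When weighing overall Elo, it can help to check whether two models' 95% confidence intervals overlap before treating a rank difference as meaningful. The sketch below uses scores and interval half-widths from the ranking table; the helper names are hypothetical, and an overlap check is only a rough heuristic, not a formal significance test.

```python
def interval(score, half_width):
    """Return the (low, high) bounds for a score and its +/- half-width."""
    return score - half_width, score + half_width


def intervals_overlap(model_a, model_b):
    """True if the two (score, half_width) intervals share any point."""
    a_low, a_high = interval(*model_a)
    b_low, b_high = interval(*model_b)
    return a_low <= b_high and b_low <= a_high


# (score, 95% CI half-width) values copied from the ranking table.
claude_fable_5 = (1508, 9)
claude_opus_46_thinking = (1504, 4)
muse_spark = (1487, 6)

print(intervals_overlap(claude_fable_5, claude_opus_46_thinking))
print(intervals_overlap(claude_fable_5, muse_spark))
```

When the intervals overlap, the ordering of the two models could plausibly flip as more votes arrive, so factors such as cost and latency may matter more than the score gap.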















