LMArena Math Arena Leaderboard
The latest AI math reasoning leaderboard based on LMArena Math Arena anonymous user voting. Covers Elo scores, confidence intervals, and vote counts for Claude, GPT, Gemini, DeepSeek, Qwen, and more.
Top Model: Claude Fable 5
Top Score: 1517.00
Model Count: 356
Data version: June 16, 2026
Data source: LMArena
About This Leaderboard
This leaderboard ranks AI models by mathematical reasoning ability. Data comes from LMArena's Math sub-track, evaluated through anonymous blind testing by real users on math problem-solving tasks.
Methodology Overview
Blind testing: Users submit math problems, two anonymous models provide solutions, and users vote for the better answer. Hiding model identities removes brand bias.
Elo scoring: Elo-style scores are fitted from the pairwise votes with the Bradley-Terry model. A higher score means users prefer that model's math solutions more often.
Broad scenario coverage: Testing spans algebra, geometry, calculus, competition math, and other real-world math tasks.
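The scoring step above can be illustrated with a minimal sketch. It assumes the conventional Elo logistic scale (base 10, 400 points), which this page does not state, and it fits Bradley-Terry strengths with a simple iterative update on a toy win matrix. The function names and the toy data are hypothetical and this is not LMArena's actual pipeline.

```python
import math


def expected_win_prob(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the logistic Elo model.

    The base-10 / 400-point scale is an assumption, not taken from the page.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def fit_bradley_terry(wins, n_iter: int = 200):
    """Fit Bradley-Terry strengths from a win matrix.

    wins[i][j] is the number of times model i was preferred over model j.
    Toy implementation: every model needs at least one win, ties are ignored.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        p = [x / norm for x in updated]
    return p


def strengths_to_elo(p, scale: float = 400.0, anchor: float = 1000.0):
    """Map strengths to an Elo-like scale centered on a hypothetical anchor."""
    raw = [scale * math.log10(x) for x in p]
    mean = sum(raw) / len(raw)
    return [r - mean + anchor for r in raw]


# Toy data: model 0 beat model 1 three times and lost once.
toy_wins = [[0, 3], [1, 0]]
toy_elo = strengths_to_elo(fit_bradley_terry(toy_wins))
print(round(expected_win_prob(toy_elo[0], toy_elo[1]), 3))

# A 13-point gap, such as 1517 vs 1504, is close to a coin flip on this scale.
print(round(expected_win_prob(1517.0, 1504.0), 3))
```

Under these assumptions, small score gaps translate into win probabilities only slightly above 50%, which is why the confidence intervals matter when comparing neighboring ranks.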
DataLearner provides in-depth analysis on top of the raw data, linking leaderboard models to the DataLearner model database so you can quickly access model details, API pricing, benchmark scores, and more.
Ranking Table
| Rank | Model | Score | 95% CI | Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 | 1517.00 | +/-37 | 244 | Anthropic | Proprietary |
| 2 | Gemini 3.5 Flash | 1516.00 | +/-25 | 584 | Google DeepMind | Proprietary |
| 3 | Claude Opus 4.6 (thinking) | 1516.00 | +/-12 | 2,502 | Anthropic | Proprietary |
| 4 | Claude Opus 4.6 | 1504.00 | +/-12 | 2,867 | Anthropic | Proprietary |
| 5 | Opus 4.7 (thinking) | 1504.00 | +/-14 | 1,779 | Anthropic | Proprietary |
| 6 | GPT-5.4 (high) | 1503.00 | +/-13 | 2,285 | OpenAI | Proprietary |
| 7 | Opus 4.7 | 1498.00 | +/-14 | 1,836 | Anthropic | Proprietary |
| 8 | Claude Opus 4.8 (thinking) | 1496.00 | +/-22 | 648 | Anthropic | Proprietary |
| 9 | Claude Opus 4.8 | 1495.00 | +/-23 | 648 | Anthropic | Proprietary |
| 10 | Gemini 3.1 Pro Preview | 1495.00 | +/-11 | 3,429 | Google DeepMind | Proprietary |
| 11 | GPT-5.5 (high) | 1494.00 | +/-15 | 1,569 | OpenAI | Proprietary |
| 12 | Qwen3.7-Max-Preview | 1492.00 | +/-40 | 219 | Alibaba | Proprietary |
| 13 | GPT-5.5 | 1490.00 | +/-15 | 1,574 | OpenAI | Proprietary |
| 14 | mimo-v2.5-pro | 1485.00 | +/-16 | 1,384 | Xiaomi | MIT |
| 15 | Kimi K2.6 | 1483.00 | +/-16 | 1,372 | Moonshot AI | Modified MIT |
| 16 | ERNIE-5.1-Preview | 1481.00 | +/-16 | 1,346 | Baidu | Proprietary |
| 17 | Gemini 3.0 Pro (Preview 11-2025) | 1478.00 | +/-11 | 2,652 | Google DeepMind | Proprietary |
| 18 | DeepSeek-V4-Pro (thinking) | 1477.00 | +/-16 | 1,391 | DeepSeek-AI | MIT |
| 19 | Qwen3.6-Max-Preview | 1476.00 | +/-30 | 358 | Alibaba | Proprietary |
| 20 | Gemini 3.0 Flash | 1476.00 | +/-13 | 2,002 | Google DeepMind | Proprietary |
| 21 | GLM 5.1 | 1474.00 | +/-19 | 966 | Zhipu AI | MIT |
| 22 | Kimi K2 Thinking | 1472.00 | +/-11 | 2,818 | Moonshot AI | Modified MIT |
| 23 | | 1470.00 | +/-13 | 2,399 | xAI | Proprietary |
| 24 | Claude Opus 4 (thinking-32k) | 1470.00 | +/-12 | 2,266 | Anthropic | Proprietary |
| 25 | Qwen3.5 Max Preview | 1469.00 | +/-16 | 1,352 | Alibaba | Proprietary |
| 26 | Gemma 4 31B | 1469.00 | +/-28 | 399 | DeepMind | Apache 2.0 |
| 27 | Gemma 4 26B A4B | 1467.00 | +/-28 | 372 | DeepMind | Apache 2.0 |
| 28 | Claude Opus 4 | 1466.00 | +/-9 | 4,343 | Anthropic | Proprietary |
| 29 | GPT-5.5 Instant | 1463.00 | +/-16 | 1,472 | OpenAI | Proprietary |
| 30 | Muse Spark | 1463.00 | +/-20 | 862 | Facebook AI Research | Proprietary |
| 31 | minimax-m3 | 1461.00 | +/-26 | 556 | MiniMax | Proprietary |
| 32 | GPT-5.4 | 1460.00 | +/-12 | 2,433 | OpenAI | Proprietary |
| 33 | Claude Sonnet 4.6 | 1459.00 | +/-13 | 2,296 | Anthropic | Proprietary |
| 34 | GPT-5.2 Pro (high) | 1458.00 | +/-11 | 2,990 | OpenAI | Proprietary |
| 35 | Claude Sonnet 4.5 (thinking-32k) | 1455.00 | +/-9 | 4,913 | Anthropic | Proprietary |
| 36 | Gemini 3.0 Flash (minimal) | 1455.00 | +/-10 | 3,814 | Google DeepMind | Proprietary |
| 37 | GPT-5.1 Pro (high) | 1455.00 | +/-12 | 2,500 | OpenAI | Proprietary |
| 38 | GPT-5.2 | 1453.00 | +/-13 | 2,084 | OpenAI | Proprietary |
| 39 | Qwen 3.6 Plus Preview | 1453.00 | +/-14 | 1,720 | Alibaba | Proprietary |
| 40 | mimo-v2-pro | 1452.00 | +/-15 | 1,632 | Xiaomi | Proprietary |
| 41 | | 1451.00 | +/-13 | 2,367 | xAI | Proprietary |
| 42 | | 1451.00 | +/-15 | 1,609 | xAI | Proprietary |
| 43 | DOLA Seed 2.0 Pro | 1449.00 | +/-11 | 2,913 | ByteDance Seed | Proprietary |
| 44 | mimo-v2.5 | 1448.00 | +/-15 | 1,467 | Xiaomi | MIT |
| 45 | OpenAI o3 | 1447.00 | +/-10 | 3,728 | OpenAI | Proprietary |
| 46 | Qwen3.5-397B-A17B | 1447.00 | +/-12 | 2,614 | Alibaba | Apache 2.0 |
| 47 | nvidia-nemotron-3-ultra-550b-a55b-nvfp4 | 1445.00 | +/-31 | 347 | Nvidia | OpenMDW-1.1 |
| 48 | Opus 4.1 (thinking-16k) | 1444.00 | +/-11 | 3,025 | Anthropic | Proprietary |
| 49 | mimo-v2-omni | 1443.00 | +/-25 | 598 | Xiaomi | Proprietary |
| 50 | | 1443.00 | +/-10 | 3,833 | xAI | Proprietary |
| 51 | Kimi K2.5 Instant | 1442.00 | +/-25 | 513 | Moonshot AI | Modified MIT |
| 52 | Gemini 2.5 Pro Experimental 03-25 | 1442.00 | +/-7 | 7,644 | Google DeepMind | Proprietary |
| 53 | gemini-3.1-flash-lite-preview | 1442.00 | +/-11 | 2,855 | Google | Proprietary |
| 54 | GPT-5.4 mini (high) | 1441.00 | +/-13 | 2,233 | OpenAI | Proprietary |
| 55 | GLM-5 | 1440.00 | +/-15 | 1,406 | Zhipu AI | MIT |
| 56 | Qwen3 Max (Preview) | 1439.00 | +/-15 | 1,525 | Alibaba | Proprietary |
| 57 | Kimi K2 Thinking (thinking-turbo) | 1438.00 | +/-10 | 3,785 | Moonshot AI | Modified MIT |
| 58 | DeepSeek-V4-Pro | 1437.00 | +/-15 | 1,651 | DeepSeek-AI | MIT |
| 59 | ERNIE 5.0 | 1437.00 | +/-13 | 2,150 | Baidu | Proprietary |
| 60 | DeepSeek-V4-Flash (thinking) | 1436.00 | +/-16 | 1,511 | DeepSeek-AI | MIT |
| 61 | longcat-flash-chat-2602-exp | 1436.00 | +/-14 | 1,753 | Meituan | Proprietary |
| 62 | GPT-5-Pro (high) | 1434.00 | +/-14 | 1,887 | OpenAI | Proprietary |
| 63 | GPT-5.2 | 1433.00 | +/-10 | 3,461 | OpenAI | Proprietary |
| 64 | Opus 4.1 | 1433.00 | +/-9 | 4,724 | Anthropic | Proprietary |
| 65 | mistral-medium-3.5 | 1433.00 | +/-25 | 519 | Mistral | Modified MIT |
| 66 | GPT-5.4 nano (high) | 1432.00 | +/-13 | 2,079 | OpenAI | Proprietary |
| 67 | qwen3-max-2025-09-23 | 1430.00 | +/-24 | 582 | Alibaba | Proprietary |
| 68 | DeepSeek V3.2 | 1430.00 | +/-11 | 3,004 | DeepSeek-AI | MIT |
| 69 | Qwen3.5-27B | 1429.00 | +/-15 | 1,653 | Alibaba | Apache 2.0 |
| 70 | | 1429.00 | +/-9 | 4,235 | xAI | Proprietary |
| 71 | hunyuan-hy3-preview | 1429.00 | +/-28 | 405 | Tencent | tencent-hunyuan-community |
| 72 | GLM-4.7 | 1428.00 | +/-21 | 710 | Zhipu AI | MIT |
| 73 | Claude Sonnet 4.5 | 1428.00 | +/-9 | 4,913 | Anthropic | Proprietary |
| 74 | | 1428.00 | +/-12 | 2,263 | xAI | Proprietary |
| 75 | DeepSeek V3.2-Exp (thinking) | 1428.00 | +/-26 | 481 | DeepSeek-AI | MIT |
| 76 | amazon-nova-experimental-chat-26-02-10 | 1428.00 | +/-39 | 207 | Amazon | Proprietary |
| 77 | DeepSeek-V4-Flash | 1427.00 | +/-15 | 1,523 | DeepSeek-AI | MIT |
| 78 | DeepSeek V3.2 (thinking) | 1426.00 | +/-12 | 2,506 | DeepSeek-AI | MIT |
| 79 | GPT-5.3 | 1425.00 | +/-13 | 2,046 | OpenAI | Proprietary |
| 80 | Qwen3.5-122B-A10B | 1424.00 | +/-14 | 1,779 | Alibaba | Apache 2.0 |
| 81 | GPT-5.1 Instant | 1424.00 | +/-11 | 2,866 | OpenAI | Proprietary |
| 82 | | 1423.00 | +/-29 | 398 | xAI | Proprietary |
| 83 | GLM-4.6 | 1421.00 | +/-13 | 2,107 | Zhipu AI | MIT |
| 84 | Claude Opus 4 (thinking-16k) | 1420.00 | +/-12 | 2,239 | Anthropic | Proprietary |
| 85 | Qwen3-235B-A22B-2507 | 1420.00 | +/-8 | 5,924 | Alibaba | Apache 2.0 |
| 86 | Qwen3-Next | 1419.00 | +/-17 | 1,211 | Alibaba | Apache 2.0 |
| 87 | | 1418.00 | +/-16 | 1,454 | xAI | Proprietary |
| 88 | DeepSeek V3.2-Exp | 1418.00 | +/-21 | 775 | DeepSeek-AI | MIT |
| 89 | | 1417.00 | +/-10 | 3,500 | xAI | Proprietary |
| 90 | longcat-flash-chat | 1417.00 | +/-22 | 689 | Meituan | MIT |
| 91 | Kimi K2 0905 | 1416.00 | +/-21 | 759 | Moonshot AI | Modified MIT |
| 92 | OpenAI o4-mini | 1415.00 | +/-11 | 2,939 | OpenAI | Proprietary |
| 93 | DeepSeek-V3.1 | 1415.00 | +/-18 | 992 | DeepSeek-AI | MIT |
| 94 | | 1415.00 | +/-14 | 1,953 | MiniMaxAI | Modified MIT |
| 95 | DeepSeek-V3.1 (thinking) | 1415.00 | +/-22 | 663 | DeepSeek-AI | MIT |
| 96 | GLM-4.5 | 1413.00 | +/-15 | 1,425 | Zhipu AI | MIT |
| 97 | GPT-5 | 1413.00 | +/-14 | 1,785 | OpenAI | Proprietary |
| 98 | Gemini 2.5 Flash-Preview-09-2025 | 1412.00 | +/-13 | 1,944 | Google DeepMind | Proprietary |
| 99 | | 1412.00 | +/-18 | 1,084 | xAI | Proprietary |
| 100 | DeepSeek-R1 | 1411.00 | +/-14 | 1,606 | DeepSeek-AI | MIT |
| 101 | Qwen3-VL-235B-A22B-Instruct | 1411.00 | +/-23 | 704 | Alibaba | Apache 2.0 |
| 102 | amazon-nova-experimental-chat-26-01-10 | 1409.00 | +/-33 | 263 | Amazon | Proprietary |
| 103 | GPT-4.5 | 1409.00 | +/-15 | 1,393 | OpenAI | Proprietary |
| 104 | OpenAI o1 | 1409.00 | +/-11 | 2,986 | OpenAI | Proprietary |
| 105 | Step 3.5 Flash | 1408.00 | +/-12 | 2,641 | StepFunAI | Apache 2.0 |
| 106 | ERNIE 5.0 | 1408.00 | +/-23 | 618 | Baidu | Proprietary |
| 107 | DeepSeek-V3.1 Terminus (thinking) | 1407.00 | +/-41 | 197 | DeepSeek-AI | MIT |
| 108 | Gemini 2.5 Flash | 1406.00 | +/-7 | 7,879 | Google DeepMind | Proprietary |
| 109 | OpenAI o3-mini (high) | 1406.00 | +/-13 | 1,909 | OpenAI | Proprietary |
| 110 | GPT-5-mini (high) | 1405.00 | +/-15 | 1,459 | OpenAI | Proprietary |
| 111 | Qwen3-VL-235B-A22B-Instruct (thinking) | 1405.00 | +/-28 | 427 | Alibaba | Apache 2.0 |
| 112 | GPT-4o (2025-03-27) | 1404.00 | +/-8 | 5,721 | OpenAI | Proprietary |
| 113 | Claude Opus 4 | 1403.00 | +/-11 | 2,768 | Anthropic | Proprietary |
| 114 | Claude Sonnet 4 (thinking-32k) | 1403.00 | +/-13 | 2,022 | Anthropic | Proprietary |
| 115 | Step 3.5 Flash | 1403.00 | +/-12 | 2,404 | StepFunAI | Proprietary |
| 116 | Mistral Large 3 | 1402.00 | +/-11 | 2,809 | MistralAI | Apache 2.0 |
| 117 | Hunyuan-T1 | 1401.00 | +/-38 | 236 | Tencent AI Lab | Proprietary |
| 118 | amazon-nova-experimental-chat-12-10 | 1400.00 | +/-37 | 234 | Amazon | Proprietary |
| 119 | Qwen3.5-35B-A3B | 1400.00 | +/-14 | 1,764 | Alibaba | Apache 2.0 |
| 120 | ERNIE 5.0 | 1400.00 | +/-34 | 268 | Baidu | Proprietary |
| 121 | Qwen3-32B | 1399.00 | +/-30 | 316 | Alibaba | Apache 2.0 |
| 122 | Magistral-Medium-2506 | 1399.00 | +/-8 | 5,827 | MistralAI | Proprietary |
| 123 | amazon-nova-experimental-chat-11-10 | 1398.00 | +/-15 | 1,584 | Amazon | Proprietary |
| 124 | qwen3-235b-a22b-thinking-2507 | 1398.00 | +/-24 | 489 | Alibaba | Apache 2.0 |
| 125 | Haiku 4.5 | 1398.00 | +/-9 | 5,407 | Anthropic | Proprietary |
| 126 | | 1397.00 | +/-12 | 2,436 | MiniMaxAI | Modified MIT |
| 127 | DeepSeek-R1-0528 | 1396.00 | +/-20 | 869 | DeepSeek-AI | MIT |
| 128 | DeepSeek-V3.1 Terminus | 1395.00 | +/-39 | 218 | DeepSeek-AI | MIT |
| 129 | amazon-nova-experimental-chat-10-20 | 1395.00 | +/-20 | 806 | Amazon | Proprietary |
| 130 | qwen3-235b-a22b-no-thinking | 1394.00 | +/-12 | 2,392 | Alibaba | Apache 2.0 |
| 131 | Qwen3-235B-A22B | 1393.00 | +/-14 | 1,604 | Alibaba | Apache 2.0 |
| 132 | | 1392.00 | +/-18 | 1,010 | MiniMaxAI | MIT |
| 133 | GLM-4.5-Air | 1390.00 | +/-15 | 1,540 | Zhipu AI | MIT |
| 134 | nvidia-llama-3.3-nemotron-super-49b-v1.5 | 1390.00 | +/-39 | 194 | Nvidia | Nvidia Open |
| 135 | Qwen3-Next (thinking) | 1390.00 | +/-20 | 828 | Alibaba | Apache 2.0 |
| 136 | Kimi K2 | 1389.00 | +/-14 | 1,695 | Moonshot AI | Modified MIT |
| 137 | OpenAI o3-mini (high) | 1388.00 | +/-18 | 977 | OpenAI | Proprietary |
| 138 | Claude Sonnet 4 | 1388.00 | +/-12 | 2,472 | Anthropic | Proprietary |
| 139 | OpenAI o1 | 1386.00 | +/-10 | 4,569 | OpenAI | Proprietary |
| 140 | Claude Sonnet 3.7 (thinking-32k) | 1384.00 | +/-11 | 2,793 | Anthropic | Proprietary |
| 141 | trinity-large-thinking | 1384.00 | +/-15 | 1,617 | Arcee AI | Apache 2.0 |
| 142 | intellect-3 | 1383.00 | +/-31 | 334 | Prime Intellect | MIT |
| 143 | GPT OSS 120B | 1382.00 | +/-14 | 1,792 | OpenAI | Apache 2.0 |
| 144 | OpenAI o3-mini | 1382.00 | +/-8 | 4,721 | OpenAI | Proprietary |
| 145 | Qwen3-30B-A3B-2507 | 1381.00 | +/-15 | 1,426 | Alibaba | Apache 2.0 |
| 146 | llama-3.1-nemotron-ultra-253b-v1 | 1380.00 | +/-37 | 209 | Nvidia | Nvidia Open Model |
| 147 | mimo-v2-flash (non-thinking) | 1379.00 | +/-11 | 2,844 | Xiaomi | MIT |
| 148 | Qwen3-Coder-480B-A35B | 1377.00 | +/-15 | 1,626 | Alibaba | Apache 2.0 |
| 149 | nvidia-nemotron-3-super-120b-a12b | 1375.00 | +/-25 | 515 | Nvidia | NVIDIA Open Model |
| 150 | | 1374.00 | +/-11 | 2,677 | xAI | Proprietary |
| 151 | GPT-4.1 | 1373.00 | +/-10 | 3,226 | OpenAI | Proprietary |
| 152 | mimo-v2-flash (thinking) | 1373.00 | +/-22 | 632 | Xiaomi | MIT |
| 153 | minimax-m1 | 1372.00 | +/-13 | 1,801 | MiniMax | Apache 2.0 |
| 154 | DeepSeek-V3-0324 | 1370.00 | +/-10 | 3,190 | DeepSeek-AI | MIT |
| 155 | | 1369.00 | +/-14 | 1,529 | xAI | Proprietary |
| 156 | GLM-4.7-Flash | 1366.00 | +/-21 | 716 | Zhipu AI | MIT |
| 157 | Gemini 2.5 Flash-Lite (thinking) | 1365.00 | +/-12 | 2,094 | Google DeepMind | Proprietary |
| 158 | Gemini 2.5 Flash-Lite-Preview-09-2025 (no-thinking) | 1364.00 | +/-11 | 2,878 | Google DeepMind | Proprietary |
| 159 | Qwen2.5-Max | 1364.00 | +/-10 | 3,305 | Alibaba | Proprietary |
| 160 | QwQ-32B | 1364.00 | +/-14 | 1,720 | Alibaba | Apache 2.0 |
| 161 | Step3 | 1364.00 | +/-31 | 351 | StepFunAI | Apache 2.0 |
| 162 | Claude Sonnet 3.7 | 1362.00 | +/-10 | 3,358 | Anthropic | Proprietary |
| 163 | OpenAI o1-mini | 1362.00 | +/-8 | 7,499 | OpenAI | Proprietary |
| 164 | trinity-large-preview | 1361.00 | +/-14 | 1,891 | Arcee AI | Apache 2.0 |
| 165 | GLM-4.5V | 1357.00 | +/-34 | 277 | Zhipu AI | MIT |
| 166 | Gemini 2.0 Flash Experimental | 1356.00 | +/-9 | 4,065 | DeepMind | Proprietary |
| 167 | | 1356.00 | +/-33 | 319 | MiniMaxAI | Apache 2.0 |
| 168 | GPT-4.1 mini | 1355.00 | +/-11 | 2,693 | OpenAI | Proprietary |
| 169 | ling-flash-2.0 | 1354.00 | +/-27 | 460 | Ant Group | MIT |
| 170 | nvidia-nemotron-3-nano-30b-a3b-bf16 | 1353.00 | +/-19 | 987 | Nvidia | NVIDIA Open Model |
| 171 | Qwen3-30B-A3B | 1353.00 | +/-14 | 1,707 | Alibaba | Apache 2.0 |
| 172 | Claude 3.5 Sonnet | 1351.00 | +/-7 | 10,017 | Anthropic | Proprietary |
| 173 | mistral-medium-2505 | 1349.00 | +/-12 | 2,229 | Mistral | Proprietary |
| 174 | hunyuan-turbos-20250416 | 1348.00 | +/-20 | 845 | Tencent | Proprietary |
| 175 | GPT-5-Nano (high) | 1344.00 | +/-27 | 493 | OpenAI | Proprietary |
| 176 | Claude 3.5 Sonnet | 1342.00 | +/-7 | 11,359 | Anthropic | Proprietary |
| 177 | ring-flash-2.0 | 1339.00 | +/-27 | 453 | Ant Group | MIT |
| 178 | Mistral-Small-3.2 | 1339.00 | +/-18 | 1,042 | MistralAI | Apache 2.0 |
| 179 | Gemini 1.5 Pro | 1339.00 | +/-7 | 7,610 | Google DeepMind | Proprietary |
| 180 | GPT OSS 20B | 1336.00 | +/-22 | 680 | OpenAI | Apache 2.0 |
| 181 | Nova 2 Lite | 1335.00 | +/-20 | 826 | Amazon | Proprietary |
| 182 | Gemini 2.0 Flash-Lite | 1326.00 | +/-10 | 2,814 | DeepMind | Proprietary |
| 183 | qwen-plus-0125 | 1324.00 | +/-19 | 732 | Alibaba | Proprietary |
| 184 | Gemma 3 - 27B (IT) | 1322.00 | +/-9 | 3,581 | Google DeepMind | Gemma |
| 185 | granite-4.1-8b | 1320.00 | +/-39 | 236 | IBM | Apache 2.0 |
| 186 | llama-3.1-405b-instruct-fp8 | 1319.00 | +/-8 | 8,482 | Meta | Llama 3.1 Community |
| 187 | Llama 4 Maverick Instruct | 1318.00 | +/-11 | 2,838 | Facebook AI Research | Llama 4 |
| 188 | Gemma 3 - 12B (IT) | 1317.00 | +/-27 | 389 | Google DeepMind | Gemma |
| 189 | llama-3.1-405b-instruct-bf16 | 1315.00 | +/-8 | 5,215 | Meta | Llama 3.1 Community |
| 190 | step-2-16k-exp-202412 | 1313.00 | +/-20 | 642 | StepFun | Proprietary |
| 191 | athene-v2-chat | 1312.00 | +/-9 | 3,412 | NexusFlow | NexusFlow |
| 192 | Claude3-Opus | 1312.00 | +/-6 | 25,769 | Anthropic | Proprietary |
| 193 | olmo-3-32b-think | 1311.00 | +/-32 | 314 | Ai2 | Apache 2.0 |
| 194 | DeepSeek-V3 | 1311.00 | +/-11 | 2,721 | DeepSeek-AI | DeepSeek |
| 195 | C4AI Command A (202503) | 1309.00 | +/-9 | 3,994 | CohereAI | CC-BY-NC-4.0 |
| 196 | Llama 4 Scout Instruct | 1309.00 | +/-13 | 1,945 | Facebook AI Research | Llama |
| 197 | GPT-4o | 1309.00 | +/-8 | 6,826 | OpenAI | Proprietary |
| 198 | yi-lightning | 1306.00 | +/-10 | 3,921 | 01 AI | Proprietary |
| 199 | olmo-3.1-32b-instruct | 1306.00 | +/-23 | 696 | Ai2 | Apache 2.0 |
| 200 | gemini-advanced-0514 | 1305.00 | +/-10 | 6,395 | Google | Proprietary |
| 201 | GPT-4o | 1305.00 | +/-7 | 15,103 | OpenAI | Proprietary |
| 202 | qwen2.5-plus-1127 | 1304.00 | +/-14 | 1,404 | Alibaba | Proprietary |
| 203 | GPT-4 | 1303.00 | +/-8 | 13,306 | OpenAI | Proprietary |
| 204 | hunyuan-turbos-20250226 | 1302.00 | +/-31 | 238 | Tencent | Proprietary |
| 205 | GPT-4 | 1299.00 | +/-8 | 12,374 | OpenAI | Proprietary |
| 206 | step-1o-turbo-202506 | 1299.00 | +/-24 | 565 | StepFun | Proprietary |
| 207 | glm-4-plus-0111 | 1298.00 | +/-19 | 721 | Zhipu | Proprietary |
| 208 | Gemini 1.5 Pro | 1298.00 | +/-8 | 10,492 | Google DeepMind | Proprietary |
| 209 | Qwen2.5-VL-72B-Instruct | 1297.00 | +/-8 | 5,415 | Alibaba | Qwen |
| 210 | olmo-3.1-32b-think | 1297.00 | +/-26 | 473 | Ai2 | Apache 2.0 |
| 211 | gpt-4-turbo-2024-04-09 | 1296.00 | +/-8 | 13,217 | OpenAI | Proprietary |
| 212 | Llama3.3-70B-Instruct | 1296.00 | +/-8 | 5,777 | Facebook AI Research | Llama-3.3 |
| 213 | | 1294.00 | +/-7 | 8,950 | xAI | Proprietary |
| 214 | hunyuan-large-2025-02-10 | 1294.00 | +/-24 | 497 | Tencent | Proprietary |
| 215 | deepseek-v2.5-1210 | 1293.00 | +/-17 | 1,031 | DeepSeek | DeepSeek |
| 216 | qwen-max-0919 | 1292.00 | +/-12 | 2,249 | Alibaba | Qwen |
| 217 | hunyuan-standard-2025-02-10 | 1290.00 | +/-24 | 499 | Tencent | Proprietary |
| 218 | gemini-1.5-flash-002 | 1288.00 | +/-9 | 4,789 | Google | Proprietary |
| 219 | mistral-large-2407 | 1288.00 | +/-8 | 6,664 | Mistral | Mistral Research |
| 220 | DeepSeek V2.5 | 1288.00 | +/-10 | 3,649 | DeepSeek-AI | DeepSeek |
| 221 | glm-4-plus | 1287.00 | +/-10 | 3,599 | Zhipu AI | Proprietary |
| 222 | Claude 3.5 Haiku | 1286.00 | +/-7 | 6,365 | Anthropic | Proprietary |
| 223 | Magistral-Medium-2506 | 1286.00 | +/-26 | 554 | MistralAI | Proprietary |
| 224 | GPT-4 | 1283.00 | +/-10 | 7,052 | OpenAI | Proprietary |
| 225 | mistral-large-2411 | 1282.00 | +/-9 | 3,574 | Mistral | MRL |
| 226 | hunyuan-large-vision | 1280.00 | +/-30 | 351 | Tencent | Proprietary |
| 227 | hunyuan-turbo-0110 | 1279.00 | +/-31 | 243 | Tencent | Proprietary |
| 228 | ibm-granite-h-small | 1279.00 | +/-32 | 358 | IBM | Apache 2.0 |
| 229 | Llama3.1-70B-Instruct | 1279.00 | +/-17 | 1,041 | Facebook AI Research | Llama 3.1 |
| 230 | Mistral-Small-3.1-24B-Instruct-2503 | 1278.00 | +/-13 | 2,129 | MistralAI | Apache 2.0 |
| 231 | GPT-4o mini | 1276.00 | +/-7 | 9,322 | OpenAI | Proprietary |
| 232 | GPT-4 | 1275.00 | +/-8 | 11,181 | OpenAI | Proprietary |
| 233 | GPT-4.1 nano | 1274.00 | +/-23 | 582 | OpenAI | Proprietary |
| 234 | Qwen2-72B-Instruct | 1273.00 | +/-9 | 4,835 | Alibaba | Qianwen LICENSE |
| 235 | | 1273.00 | +/-8 | 7,261 | xAI | Proprietary |
| 236 | deepseek-coder-v2 | 1271.00 | +/-13 | 1,858 | DeepSeek | DeepSeek License |
| 237 | llama-3.1-nemotron-51b-instruct | 1271.00 | +/-22 | 507 | Nvidia | Llama 3.1 |
| 238 | Qwen2.5-Coder-32B-Instruct | 1270.00 | +/-19 | 725 | Alibaba | Apache 2.0 |
| 239 | amazon-nova-pro-v1.0 | 1269.00 | +/-10 | 2,978 | Amazon | Proprietary |
| 240 | Llama3.1-70B-Instruct | 1269.00 | +/-8 | 7,677 | Facebook AI Research | Llama 3.1 Community |
| 241 | Phi 4 - 14B | 1265.00 | +/-10 | 2,764 | Microsoft Azure | MIT |
| 242 | llama-3.1-tulu-3-70b | 1264.00 | +/-25 | 397 | Ai2 | Llama 3.1 |
| 243 | Mistral Small 24B Instruct 2501 | 1262.00 | +/-13 | 1,683 | MistralAI | Apache 2.0 |
| 244 | athene-70b-0725 | 1261.00 | +/-10 | 2,921 | NexusFlow | CC-BY-NC-4.0 |
| 245 | Gemma-3n-E4B | 1260.00 | +/-15 | 1,572 | Google DeepMind | Gemma |
| 246 | Llama3-70B-Instruct | 1257.00 | +/-7 | 20,941 | Facebook AI Research | Llama 3 Community |
| 247 | gemini-1.5-flash-001 | 1257.00 | +/-8 | 8,392 | Google | Proprietary |
| 248 | Gemma 3 - 4B (IT) | 1254.00 | +/-28 | 423 | Google DeepMind | Gemma |
| 249 | Claude3-Sonnet | 1253.00 | +/-8 | 13,766 | Anthropic | Proprietary |
| 250 | nemotron-4-340b-instruct | 1252.00 | +/-12 | 2,352 | Nvidia | NVIDIA Open Model |
| 251 | hunyuan-standard-256k | 1250.00 | +/-29 | 361 | Tencent | Proprietary |
| 252 | GLM4 | 1247.00 | +/-16 | 1,191 | Zhipu AI | Proprietary |
| 253 | reka-core-20240904 | 1246.00 | +/-14 | 1,207 | Reka AI | Proprietary |
| 254 | gemma-2-27b-it | 1246.00 | +/-7 | 10,170 | Google | Gemma license |
| 255 | jamba-1.5-large | 1245.00 | +/-15 | 1,147 | AI21 Labs | Jamba Open |
| 256 | amazon-nova-lite-v1.0 | 1244.00 | +/-11 | 2,511 | Amazon | Proprietary |
| 257 | mistral-large-2402 | 1244.00 | +/-9 | 7,987 | Mistral | Proprietary |
| 258 | C4AI Aya Vision 32B | 1232.00 | +/-10 | 3,854 | CohereAI | CC-BY-NC-4.0 |
| 259 | reka-flash-20240904 | 1232.00 | +/-14 | 1,284 | Reka AI | Proprietary |
| 260 | Claude3-Haiku | 1231.00 | +/-7 | 14,983 | Anthropic | Proprietary |
| 261 | command-r-plus-08-2024 | 1231.00 | +/-14 | 1,467 | Cohere | CC-BY-NC-4.0 |
| 262 | gemini-1.5-flash-8b-001 | 1229.00 | +/-8 | 5,036 | Google | Proprietary |
| 263 | Mixtral-8x22B-Instruct-v0.1 | 1228.00 | +/-9 | 6,778 | MistralAI | Apache 2.0 |
| 264 | olmo-2-0325-32b-instruct | 1227.00 | +/-28 | 375 | Ai2 | Apache-2.0 |
| 265 | amazon-nova-micro-v1.0 | 1224.00 | +/-11 | 2,455 | Amazon | Proprietary |
| 266 | Qwen1.5-110B-Chat | 1221.00 | +/-11 | 3,188 | Alibaba | Qianwen LICENSE |
| 267 | mistral-medium | 1220.00 | +/-11 | 4,406 | Mistral | Proprietary |
| 268 | gemma-2-9b-it | 1218.00 | +/-8 | 7,110 | Google | Gemma license |
| 269 | Phi-3-medium 14B-preview | 1215.00 | +/-11 | 3,238 | Microsoft Azure | MIT |
| 270 | ministral-8b-2410 | 1214.00 | +/-20 | 683 | Mistral | MRL |
| 271 | C4AI Command R+ | 1213.00 | +/-8 | 9,769 | CohereAI | CC-BY-NC-4.0 |
| 272 | Yi-1.5-34B | 1213.00 | +/-11 | 2,985 | 01.AI | Apache-2.0 |
| 273 | QwQ-32B-Preview | 1212.00 | +/-24 | 480 | Alibaba | Apache 2.0 |
| 274 | reka-flash-21b-20240226-online | 1211.00 | +/-14 | 2,028 | Reka AI | Proprietary |
| 275 | Qwen1.5-72B-Chat | 1208.00 | +/-10 | 5,327 | Alibaba | Qianwen LICENSE |
| 276 | InternLM2-Base-20B | 1207.00 | +/-15 | 1,387 | Shanghai AI Laboratory | Other |
| 277 | llama-3.1-tulu-3-8b | 1206.00 | +/-26 | 363 | Ai2 | Llama 3.1 |
| 278 | command-r-08-2024 | 1206.00 | +/-14 | 1,601 | Cohere | CC-BY-NC-4.0 |
| 279 | gemma-2-9b-it-simpo | 1205.00 | +/-15 | 1,285 | Princeton | MIT |
| 280 | gpt-3.5-turbo-1106 | 1203.00 | +/-15 | 2,134 | OpenAI | Proprietary |
| 281 | qwen1.5-32b-chat | 1200.00 | +/-12 | 2,649 | Alibaba | Qianwen LICENSE |
| 282 | C4AI Aya Vision 8B | 1200.00 | +/-15 | 1,307 | CohereAI | CC-BY-NC-4.0 |
| 283 | gpt-3.5-turbo-0125 | 1200.00 | +/-8 | 8,626 | OpenAI | Proprietary |
| 284 | Gemini-pro | 1199.00 | +/-19 | 993 | DeepMind | Proprietary |
| 285 | reka-flash-21b-20240226 | 1199.00 | +/-11 | 3,363 | Reka AI | Proprietary |
| 286 | granite-3.1-2b-instruct | 1197.00 | +/-26 | 391 | IBM | Apache 2.0 |
| 287 | granite-3.0-8b-instruct | 1197.00 | +/-19 | 873 | IBM | Apache 2.0 |
| 288 | zephyr-orpo-141b-A35b-v0.1 | 1196.00 | +/-22 | 589 | HuggingFace | Apache 2.0 |
| 289 | gemini-pro-dev-api | 1196.00 | +/-14 | 2,274 | Google | Proprietary |
| 290 | DBRX Instruct | 1196.00 | +/-11 | 4,001 | databricks | DBRX LICENSE |
| 291 | Phi-3-mini 3.8B | 1193.00 | +/-14 | 1,568 | Microsoft Azure | MIT |
| 292 | Phi-3-small 7B | 1193.00 | +/-13 | 2,092 | Microsoft Azure | MIT |
| 293 | Llama3-8B-Instruct | 1192.00 | +/-8 | 14,252 | Facebook AI Research | Llama 3 Community |
| 294 | mixtral-8x7b-instruct-v0.1 | 1191.00 | +/-9 | 9,663 | Mistral | Apache 2.0 |
| 295 | Llama3.1-8B-Instruct | 1190.00 | +/-28 | 382 | Facebook AI Research | Apache 2.0 |
| 296 | Llama3.1-8B-Instruct | 1189.00 | +/-8 | 7,135 | Facebook AI Research | Llama 3.1 Community |
| 297 | jamba-1.5-mini | 1186.00 | +/-16 | 1,094 | AI21 Labs | Jamba Open |
| 298 | command-r | 1176.00 | +/-9 | 6,682 | Cohere | CC-BY-NC-4.0 |
| 299 | Qwen3-VL-2B | 1168.00 | +/-19 | 908 | Alibaba | Apache 2.0 |
| 300 | Qwen1.5-14B-Chat | 1167.00 | +/-14 | 2,184 | Alibaba | Qianwen LICENSE |
| 301 | llama-3.2-3b-instruct | 1165.00 | +/-16 | 1,136 | Meta | Llama 3.2 |
| 302 | gemma-2-2b-it | 1163.00 | +/-8 | 6,599 | Google | Gemma license |
| 303 | snowflake-arctic-instruct | 1162.00 | +/-11 | 4,793 | Snowflake | Apache 2.0 |
| 304 | Gemma 1.1-7B-IT | 1160.00 | +/-11 | 3,039 | Google Research | Gemma license |
| 305 | openchat-3.5-0106 | 1158.00 | +/-14 | 1,726 | OpenChat | Apache-2.0 |
| 306 | starling-lm-7b-beta | 1158.00 | +/-14 | 1,973 | Nexusflow | Apache-2.0 |
| 307 | WizardLM-70B-V1.0 | 1157.00 | +/-19 | 903 | WizardLM Team | Llama 2 Community |
| 308 | DeepSeek LLM 67B Chat | 1155.00 | +/-23 | 576 | DeepSeek-AI | DeepSeek License |
| 309 | smollm2-1.7b-instruct | 1152.00 | +/-33 | 271 | HuggingFace | Apache 2.0 |
| 310 | openhermes-2.5-mistral-7b | 1151.00 | +/-20 | 697 | NousResearch | Apache-2.0 |
| 311 | Yi-34B | 1151.00 | +/-13 | 2,043 | 01.AI | Yi License |
| 312 | Phi-3-mini 3.8B | 1150.00 | +/-12 | 2,564 | Microsoft Azure | MIT |
| 313 | tulu-2-dpo-70b | 1145.00 | +/-19 | 888 | AllenAI/UW | AI2 ImpACT Low-risk |
| 314 | Phi-3-mini 3.8B | 1139.00 | +/-13 | 2,813 | Microsoft Azure | MIT |
| 315 | llama-2-70b-chat | 1136.00 | +/-10 | 4,740 | Meta | Llama 2 Community |
| 316 | Mistral-7B-Instruct-v0.2 | 1127.00 | +/-12 | 2,605 | MistralAI | Apache-2.0 |
| 317 | starling-lm-7b-alpha | 1126.00 | +/-16 | 1,300 | UC Berkeley | CC-BY-NC-4.0 |
| 318 | Qwen-14B-Chat | 1125.00 | +/-24 | 534 | Alibaba | Qianwen LICENSE |
| 319 | dolphin-2.2.1-mistral-7b | 1125.00 | +/-32 | 219 | Cognitive Computations | Apache-2.0 |
| 320 | openchat-3.5 | 1125.00 | +/-18 | 945 | OpenChat | Apache-2.0 |
| 321 | llama-3.2-1b-instruct | 1124.00 | +/-16 | 1,162 | Meta | Llama 3.2 |
| 322 | Qwen1.5-7B-Chat | 1120.00 | +/-20 | 690 | Alibaba | Qianwen LICENSE |
| 323 | Gemma 7B - It | 1118.00 | +/-16 | 1,120 | Google Research | Gemma license |
| 324 | Vicuna 33B | 1115.00 | +/-13 | 2,663 | LM-SYS | Non-commercial |
| 325 | PaLM 2 | 1115.00 | +/-19 | 901 | Google Research | Proprietary |
| 326 | llama2-70b-steerlm-chat | 1114.00 | +/-27 | 440 | Nvidia | Llama 2 Community |
| 327 | Baichuan2-13B-Chat | 1110.00 | +/-13 | 2,218 | Baichuan AI | Llama 2 Community |
| 328 | CodeLLaMA-34B | 1109.00 | +/-19 | 770 | Facebook AI Research | Llama 2 Community |
| 329 | solar-10.7b-instruct-v1.0 | 1109.00 | +/-22 | 604 | Upstage AI | CC-BY-NC-4.0 |
| 330 | Gemma 1.1-2B-IT | 1108.00 | +/-16 | 1,355 | Google Research | Gemma license |
| 331 | MPT-30B-Chat | 1095.00 | +/-34 | 242 | MosaicML | CC-BY-NC-SA-4.0 |
| 332 | nous-hermes-2-mixtral-8x7b-dpo | 1093.00 | +/-21 | 628 | NousResearch | Apache-2.0 |
| 333 | Baichuan2-7B-Chat | 1086.00 | +/-14 | 1,656 | Baichuan AI | Llama 2 Community |
| 334 | Qwen1.5-4B-Chat | 1086.00 | +/-18 | 988 | Alibaba | Qianwen LICENSE |
| 335 | stripedhyena-nous-7b | 1084.00 | +/-20 | 676 | Together AI | Apache 2.0 |
| 336 | Vicuna 13B | 1083.00 | +/-14 | 2,146 | LM-SYS | Llama 2 Community |
| 337 | zephyr-7b-beta | 1082.00 | +/-17 | 1,250 | HuggingFace | MIT |
| 338 | Mistral 7B Instruct | 1082.00 | +/-19 | 974 | MistralAI | Apache 2.0 |
| 339 | guanaco-33b | 1080.00 | +/-32 | 280 | UW | Non-commercial |
| 340 | Gemma 2B - It | 1070.00 | +/-22 | 597 | Google Research | Gemma license |
| 341 | wizardlm-13b | 1064.00 | +/-21 | 669 | Microsoft | Llama 2 Community |
| 342 | olmo-7b-instruct | 1054.00 | +/-19 | 848 | Ai2 | Apache-2.0 |
| 343 | Vicuna 7B | 1047.00 | +/-22 | 658 | LM-SYS | Llama 2 Community |
| 344 | ChatGLM3-6B | 1042.00 | +/-23 | 576 | Zhipu AI | Apache-2.0 |
| 345 | GPT4All 13B | 998.00 | +/-37 | 211 | Nomic AI | Non-commercial |
| 346 | alpaca-13b | 992.00 | +/-23 | 652 | Stanford | Non-commercial |
| 347 | MPT-7B-Chat | 985.00 | +/-25 | 471 | MosaicML | CC-BY-NC-SA-4.0 |
| 348 | RWKV-4-Raven-14B | 983.00 | +/-24 | 544 | RWKV | Apache 2.0 |
| 349 | Koala | 980.00 | +/-21 | 751 | DAMO Academy | Non-commercial |
| 350 | ChatGLM-6B | 976.00 | +/-26 | 525 | Zhipu AI | Non-commercial |
| 351 | ChatGLM2-6B | 971.00 | +/-35 | 227 | Zhipu AI | Apache-2.0 |
| 352 | oasst-pythia-12b | 960.00 | +/-22 | 687 | OpenAssistant | Apache 2.0 |
| 353 | dolly-v2-12b | 950.00 | +/-29 | 370 | Databricks | MIT |
| 354 | fastchat-t5-3b | 919.00 | +/-26 | 462 | LMSYS | Apache 2.0 |
| 355 | LLaMA 13B | 919.00 | +/-33 | 252 | Facebook AI Research | Non-commercial |
| 356 | stablelm-tuned-alpha-7b | 890.00 | +/-29 | 353 | Stability AI | CC-BY-NC-SA-4.0 |
Data is for reference only. Official sources are authoritative. Click model names to view DataLearner model profiles.
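When reading the table, it helps to compare the 95% CI column alongside the score, because many neighboring ranks have overlapping intervals. The sketch below uses three rows copied from the table; the helper names are hypothetical. It treats interval overlap as a rough heuristic only and is not a formal significance test.

```python
def interval(score: float, half_width: float) -> tuple:
    """Return the (low, high) bounds for a score reported as score +/- half_width."""
    return (score - half_width, score + half_width)


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the ranking table above.
fable_5 = interval(1517.0, 37)           # rank 1, 244 votes
opus_46_thinking = interval(1516.0, 12)  # rank 3, 2,502 votes
opus_4 = interval(1466.0, 9)             # rank 28, 4,343 votes

# Rank 1 vs rank 3: the intervals overlap, so the order is not clearly settled.
print(intervals_overlap(fable_5, opus_46_thinking))

# Rank 1 vs rank 28: the intervals are disjoint.
print(intervals_overlap(fable_5, opus_4))
```

Rows with few votes, such as the rank 1 entry with 244 votes, tend to have wide intervals, so their position may shift as more votes arrive.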
FAQ
What is LMArena Math Arena?
LMArena Math Arena is an anonymous evaluation track focused on mathematical reasoning. Users submit real math questions, compare hidden model solutions side by side, and vote for the better answer; the leaderboard is then calculated with Elo-style scoring.
How is Math Arena different from MATH-500 or AIME?
Static benchmarks such as MATH-500 and AIME use fixed problem sets and automated grading. Math Arena uses open-ended user questions and human preference voting, making it a useful complement for measuring how models handle varied real-world math tasks.
Do thinking models perform better in Math Arena?
Models with extended reasoning or chain-of-thought style capabilities often rank higher on math tasks because they spend more time decomposing and checking solutions. That benefit can come with higher latency and cost.
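As a rough check of this claim against the table above, the following sketch compares scores for model pairs that appear in both a thinking and a non-thinking variant. The scores are copied from the ranking table; the choice of pairs is an illustrative selection rather than an exhaustive one, and the variable names are hypothetical.

```python
# (thinking score, non-thinking score), copied from the ranking table.
PAIRS = {
    "Claude Opus 4.6": (1516.0, 1504.0),
    "Opus 4.7": (1504.0, 1498.0),
    "Claude Opus 4.8": (1496.0, 1495.0),
    "DeepSeek-V4-Pro": (1477.0, 1437.0),
    "DeepSeek-V4-Flash": (1436.0, 1427.0),
    "DeepSeek V3.2": (1426.0, 1430.0),
    "Qwen3-Next": (1390.0, 1419.0),
}

# Positive delta: the thinking variant scored higher.
deltas = {name: thinking - base for name, (thinking, base) in PAIRS.items()}
positive = [name for name, delta in deltas.items() if delta > 0]
mean_delta = sum(deltas.values()) / len(deltas)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.0f}")
print(f"thinking variant ahead in {len(positive)} of {len(deltas)} pairs")
print(f"mean delta: {mean_delta:+.1f}")
```

In this selection the thinking variant is ahead in most pairs but not all, and several gaps are smaller than the reported confidence intervals, which fits the "often" in the answer above.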
How do China-developed models perform in math?
DeepSeek, Qwen, GLM, and related models have become competitive in math reasoning leaderboards. Open licenses and Chinese-language support can make them especially useful for local deployment and education scenarios.















