
DataLearner AI

A knowledge platform focused on LLM evaluation, data resources, and hands-on tutorials, continuously updating a practical map of AI capabilities.


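© 2026 DataLearner AI. DataLearner continuously integrates industry data and case studies to provide researchers, enterprises, and developers with reliable LLM intelligence and practical guides.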


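LLM Benchmarks and Performance Comparison

Compare how large language models perform on benchmarks such as MMLU Pro, HLE, and SWE-Bench, and select a benchmark to view its ranking.

For a detailed introduction to each benchmark, see: LLM Benchmark List and Introduction

Data last updated: 2025/11/08 22:10:24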

Benchmarks shown in the table below: MMLU Pro, GPQA Diamond, SWE-bench Verified, MATH-500, AIME 2024, LiveCodeBench


LLM Benchmark Results

Data source: DataLearnerAI
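| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (×100M) | Open-source status |
|---:|---|---:|---:|---:|---:|---:|---:|---:|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed |
| 2 | Gemini 3.0 Pro (Preview 11-2025) | 90.00 | 91.90 | 76.20 | n/a | n/a | 92.00 | — | Closed |
| 3 | Claude Opus 4.5 | 90.00 | 87.00 | 80.90 | n/a | n/a | n/a | — | Closed |
| 4 | Claude Opus 4.1 | 88.00 | 81.00 | 74.50 | n/a | n/a | n/a | — | Closed |
| 5 | M2.1 | 88.00 | 81.00 | 74.80 | n/a | n/a | n/a | 2300 | Free commercial use |
| 6 | Claude Sonnet 4.5 | 88.00 | 83.40 | n/a | n/a | n/a | 71.00 | — | Closed |
| 7 | Qwen3.5-397B-A17B | 87.80 | 88.40 | 76.40 | n/a | n/a | n/a | 397 | Free commercial use |
| 8 | Qwen3.5-397B-A17B | 87.80 | 88.40 | n/a | n/a | n/a | 83.60 | 397 | Free commercial use |
| 9 | Hunyuan-T1 | 87.20 | 69.30 | n/a | 96.20 | 78.20 | 64.90 | — | Closed |
| 10 | Grok 4 | 87.00 | 87.00 | 58.60 | n/a | n/a | 82.00 | — | Closed |
| 11 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | — | Closed |
| 12 | Qwen3.5-27B | 86.10 | 85.50 | 72.40 | n/a | n/a | n/a | 270 | Free commercial use |
| 13 | Gemini 2.5-Pro | 86.00 | n/a | n/a | 98.80 | 92.00 | 77.10 | — | Closed |
| 14 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | n/a | n/a | 85.90 | 10000 | Closed |
| 15 | OpenAI o3 | 85.60 | n/a | n/a | 98.10 | 91.60 | 75.80 | — | Closed |
| 16 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710 | Free commercial use |
| 17 | Grok 4.1 Fast | 85.00 | 85.00 | n/a | n/a | n/a | 82.00 | — | Closed |
| 18 | DeepSeek V3.2-Exp | 85.00 | 79.90 | n/a | n/a | n/a | 74.10 | 6710 | Free commercial use |
| 19 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | n/a | n/a | 74.90 | 6710 | Free commercial use |
| 20 | DeepSeek-V3.1 Terminus | 85.00 | 79.00 | n/a | n/a | n/a | 80.00 | 6710 | Free commercial use |
| 21 | DeepSeek-V3.1 | 85.00 | 80.10 | n/a | n/a | 93.10 | 74.80 | 6710 | Free commercial use |
| 22 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed |
| 23 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550 | Free commercial use |
| 24 | Kimi K2 Thinking | 84.60 | 84.50 | n/a | n/a | n/a | 83.10 | 10400 | Free commercial use |
| 25 | Qwen3-235B-A22B-Thinking | 84.40 | 81.10 | n/a | n/a | n/a | 74.10 | 305 | Free commercial use |
| 26 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | n/a | n/a | n/a | 74.10 | 2350 | Free commercial use |
| 27 | GLM-4.7 | 84.30 | 85.70 | n/a | n/a | n/a | 84.90 | 3580 | Free commercial use |
| 28 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710 | Free commercial use |
| 29 | Claude Sonnet 4 | 84.00 | 75.40 | n/a | n/a | n/a | 66.00 | — | Closed |
| 30 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | n/a | n/a | 57.50 | — | Closed |
| 31 | DeepSeek V3.2-Exp | 84.00 | 74.00 | n/a | n/a | n/a | 55.00 | 6710 | Free commercial use |
| 32 | DeepSeek-V3.1 | 83.70 | 74.90 | 66.00 | n/a | 66.30 | 56.40 | 6710 | Free commercial use |
| 33 | Intern-S1 | 83.50 | 77.30 | n/a | n/a | n/a | n/a | 2410 | Free commercial use |
| 34 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | n/a | n/a | n/a | 51.80 | 2350 | Free commercial use |
| 35 | GLM-4.6 | 83.00 | 81.00 | n/a | n/a | n/a | 82.80 | 3550 | Free commercial use |
| 36 | Pangu Pro MoE | 82.60 | 73.70 | n/a | 96.80 | 79.20 | 59.60 | 719 | Free commercial use |
| 37 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | n/a | 95.00 | n/a | 49.40 | 20000 | Free commercial use |
| 38 | MiniMax M2 | 82.00 | 78.00 | n/a | n/a | n/a | 83.00 | 2300 | Free commercial use |
| 39 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060 | Free commercial use |
| 40 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 6710 | Free commercial use |
| 41 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560 | Free commercial use |
| 42 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 10000 | Free commercial use |
| 43 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | n/a | 93.40 | n/a | — | Closed |
| 44 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560 | Free commercial use |
| 45 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | — | Closed |
| 46 | Llama 4 Maverick Instruct | 80.50 | 69.80 | n/a | n/a | n/a | 43.40 | 4000 | Free commercial use |
| 47 | OpenAI o1-mini | 80.30 | 60.00 | n/a | 90.00 | 63.60 | 52.00 | — | Closed |
| 48 | Haiku 4.5 | 80.00 | 73.30 | n/a | n/a | n/a | 62.00 | — | Closed |
| 49 | GPT-4o(2025-03-27) | 79.80 | 66.90 | n/a | n/a | n/a | 35.80 | — | Closed |
| 50 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | n/a | n/a | 36.00 | n/a | — | Closed |
| 51 | Hunyuan-TurboS | 79.00 | 57.50 | n/a | n/a | n/a | 32.00 | — | Closed |
| 52 | Pangu Embedded | 79.00 | n/a | n/a | 92.40 | 81.90 | 67.10 | 70 | Free commercial use |
| 53 | GPT OSS 120B | 79.00 | 80.10 | 60.10 | n/a | n/a | n/a | 117 | Free commercial use |
| 54 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | n/a | n/a | 85.00 | 10000 | Free commercial use |
| 55 | ERNIE-4.5-300B-A47B | 78.40 | n/a | n/a | 96.40 | 54.80 | 38.80 | 3000 | Free commercial use |
| 56 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | n/a | n/a | n/a | 43.20 | 305 | Free commercial use |
| 57 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 | — | Closed |
| 58 | GLM-4.6 | 78.00 | 63.00 | 68.00 | n/a | n/a | 56.00 | 3550 | Free commercial use |
| 59 | GPT-5-mini | 78.00 | 69.00 | n/a | n/a | n/a | 55.00 | — | Closed |
| 60 | GPT-4o | 77.90 | 70.10 | 31.00 | 75.90 | 9.30 | 35.10 | — | Closed |
| 61 | GPT-4o(2024-11-20) | 77.90 | n/a | n/a | n/a | n/a | n/a | — | Closed |
| 62 | Claude 3.5 Sonnet | 77.64 | 59.40 | n/a | n/a | n/a | n/a | — | Closed |
| 63 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | n/a | n/a | 29.10 | — | Closed |
| 64 | Gemini 1.5 Pro | 76.10 | 53.50 | n/a | n/a | n/a | n/a | — | Closed |
| 65 | Qwen2.5-Max | 76.10 | n/a | n/a | n/a | n/a | n/a | — | Closed |
| 66 | Haiku 4.5 | 76.00 | 60.50 | n/a | n/a | n/a | 51.00 | — | Closed |
| 67 | QwQ-32B | 76.00 | 58.00 | n/a | 91.00 | 79.50 | n/a | 325 | Free commercial use |
| 68 | DeepSeek-V3 | 75.90 | 59.10 | n/a | 87.80 | 39.00 | 34.60 | 6810 | Free commercial use |
| 69 | Grok 2 | 75.50 | 56.00 | n/a | n/a | n/a | n/a | 2690 | Free commercial use |
| 70 | Llama 4 Scout Instruct | 74.30 | 57.20 | n/a | n/a | n/a | 32.80 | 1090 | Free commercial use |
| 71 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | n/a | n/a | n/a | 210 | Free commercial use |
| 72 | Llama3.1-405B Instruct | 73.40 | 49.00 | n/a | n/a | n/a | 30.20 | 4050 | Free commercial use |
| 73 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 96.20 | 85.70 | 70.70 | 2350 | Free commercial use |
| 74 | Qwen3-8B | 72.50 | 39.30 | n/a | 87.40 | 79.40 | 61.80 | 80 | Free commercial use |
| 75 | GLM-4-9B-Chat | 72.40 | n/a | n/a | n/a | 76.40 | 51.80 | 90 | Free commercial use |
| 76 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | n/a | n/a | n/a | 28.90 | — | Closed |
| 77 | QwQ-32B-Preview | 70.97 | n/a | n/a | 90.60 | 50.00 | n/a | 320 | Free commercial use |
| 78 | Phi 4 - 14B | 70.40 | n/a | n/a | n/a | n/a | n/a | 140 | Non-commercial |
| 79 | Qwen2.5-32B | 69.23 | n/a | n/a | n/a | n/a | 51.20 | 320 | Free commercial use |
| 80 | Qwen3-30B-A3B | 69.10 | 54.80 | n/a | n/a | n/a | 29.00 | 305 | Free commercial use |
| 81 | Mistral-Small-3.2 | 69.06 | 46.13 | n/a | n/a | n/a | n/a | 240 | Free commercial use |
| 82 | Llama3.3-70B-Instruct | 68.90 | 50.50 | n/a | n/a | n/a | 33.30 | 700 | Free commercial use |
| 83 | Claude3-Opus | 68.45 | 50.40 | n/a | n/a | n/a | n/a | — | Closed |
| 84 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | n/a | n/a | 25.30 | 29.70 | 270 | Free commercial use |
| 85 | Hunyuan-A13B-Instruct | 67.23 | 71.20 | n/a | n/a | 87.30 | 63.90 | 800 | Free commercial use |
| 86 | Mistral-Small-3.1-24B-Instruct-2503 | 66.76 | 45.96 | n/a | n/a | n/a | n/a | 240 | Free commercial use |
| 87 | Llama3.1-70B-Instruct | 66.40 | 48.00 | n/a | n/a | n/a | 33.30 | 700 | Free commercial use |
| 88 | Qwen3-Next | 66.05 | n/a | n/a | n/a | n/a | 56.60 | 800 | Free commercial use |
| 89 | Claude 3.5 Haiku | 65.00 | 41.60 | n/a | n/a | n/a | n/a | — | Closed |
| 90 | Qwen2.5-14B | 63.69 | n/a | n/a | n/a | n/a | n/a | 140 | Free commercial use |
| 91 | Llama 4 Maverick | 62.90 | n/a | n/a | n/a | n/a | n/a | 4000 | Free commercial use |
| 92 | GPT-4o mini | 61.70 | 41.10 | n/a | n/a | n/a | n/a | — | Closed |
| 93 | Llama3.1-405B | 61.60 | n/a | n/a | n/a | n/a | n/a | 4050 | Free commercial use |
| 94 | Gemma 3 - 12B (IT) | 60.60 | 40.90 | n/a | n/a | n/a | 24.60 | 120 | Free commercial use |
| 95 | Llama 4 Scout | 58.20 | n/a | n/a | n/a | n/a | n/a | 1090 | Free commercial use |
| 96 | Qwen2.5-72B | 58.10 | 45.90 | n/a | n/a | n/a | n/a | 727 | Free commercial use |
| 97 | Claude3-Sonnet | 56.80 | n/a | n/a | n/a | n/a | n/a | — | Closed |
| 98 | Gemma2-27B | 56.54 | n/a | n/a | n/a | n/a | n/a | 270 | Free commercial use |
| 99 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | n/a | n/a | n/a | n/a | n/a | 1410 | Free commercial use |
| 100 | Llama3-70B-Instruct | 56.20 | n/a | n/a | n/a | n/a | n/a | 700 | Free commercial use |
| 101 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | n/a | 71.80 | 10.00 | n/a | 38 | Free commercial use |
| 102 | Llama3-70B | 52.78 | n/a | n/a | n/a | n/a | n/a | 700 | Free commercial use |
| 103 | Llama3.1-70B | 52.47 | n/a | n/a | n/a | n/a | n/a | 700 | Free commercial use |
| 104 | Grok-1.5 | 51.00 | 35.90 | n/a | n/a | n/a | n/a | — | Closed |
| 105 | C4AI Aya Vision 32B | 47.16 | 33.84 | n/a | n/a | n/a | n/a | 320 | Non-commercial |
| 106 | Qwen2.5-7B | 45.00 | 36.40 | n/a | n/a | n/a | n/a | 70 | Free commercial use |
| 107 | Gemma 2 - 9B | 44.70 | 32.80 | n/a | n/a | n/a | n/a | 90 | Free commercial use |
| 108 | Llama3.1-8B-Instruct | 44.00 | 26.30 | n/a | n/a | n/a | n/a | 80 | Free commercial use |
| 109 | Moonlight-16B-A3B-Instruct | 42.40 | n/a | n/a | n/a | n/a | n/a | 160 | Free commercial use |
| 110 | Llama3.1-8B | 35.40 | 25.80 | n/a | n/a | n/a | n/a | 80 | Free commercial use |
| 111 | Qwen2.5-3B | 34.60 | 24.30 | n/a | n/a | n/a | n/a | 30 | Free commercial use |
| 112 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | n/a | n/a | n/a | n/a | 70 | Free commercial use |
| 113 | Llama-3.2-3B | 25.00 | 26.60 | n/a | n/a | n/a | n/a | 32 | Free commercial use |
| 114 | GPT-5.1-Codex-Max | n/a | n/a | 76.80 | n/a | n/a | n/a | — | Closed |
| 115 | Qwen3.5-397B-A17B | n/a | n/a | 76.40 | n/a | n/a | n/a | 397 | Free commercial use |
| 116 | GPT-5.1 | n/a | n/a | 76.30 | n/a | n/a | n/a | — | Closed |
| 117 | o3-pro | n/a | n/a | 75.00 | n/a | n/a | n/a | — | Closed |
| 118 | GPT-5 Codex | n/a | n/a | 74.50 | n/a | n/a | n/a | — | Closed |
| 119 | Step 3.5 Flash | n/a | n/a | 74.40 | n/a | n/a | 86.40 | 1960 | Free commercial use |
| 120 | GLM-4.7 | n/a | n/a | 73.80 | n/a | n/a | n/a | 3580 | Free commercial use |
| 121 | Grok 4 Heavy | n/a | n/a | 73.50 | n/a | n/a | n/a | — | Closed |
| 122 | Haiku 4.5 | n/a | n/a | 73.30 | n/a | n/a | n/a | — | Closed |
| 123 | DeepSeek V3.2 | n/a | n/a | 73.10 | n/a | n/a | n/a | 6710 | Free commercial use |
| 124 | Claude Sonnet 4 | n/a | n/a | 72.70 | n/a | n/a | n/a | — | Closed |
| 125 | Grok 4 Code | n/a | n/a | 72.00 | n/a | n/a | n/a | — | Closed |
| 126 | Kimi K2 Thinking | n/a | n/a | 71.30 | n/a | n/a | n/a | 10400 | Free commercial use |
| 127 | Hunyuan-7B | n/a | 60.10 | n/a | 93.70 | 81.10 | 57.00 | 70 | Free commercial use |
| 128 | Grok Code Fast 1 | n/a | n/a | 70.80 | n/a | n/a | n/a | — | Closed |
| 129 | Claude Sonnet 4.5 | n/a | n/a | 77.20 | n/a | n/a | n/a | — | Closed |
| 130 | Claude Opus 4.1 | n/a | n/a | 79.40 | n/a | n/a | n/a | — | Closed |
| 131 | GPT-5.2 | n/a | n/a | 80.00 | n/a | n/a | n/a | — | Closed |
| 132 | MiniMax M2.5 | n/a | n/a | 80.20 | n/a | n/a | n/a | 2290 | Free commercial use |
| 133 | Claude Sonnet 4 | n/a | n/a | 80.20 | n/a | n/a | n/a | — | Closed |
| 134 | Gemini 3.1 Pro Preview | n/a | n/a | 80.60 | n/a | n/a | 2887.00 | — | Closed |
| 135 | Claude Opus 4.6 | n/a | n/a | 80.84 | n/a | n/a | n/a | — | Closed |
| 136 | Claude Sonnet 5 | n/a | n/a | 82.00 | n/a | n/a | n/a | — | Closed |
| 137 | Claude Sonnet 4.5 | n/a | n/a | 82.00 | n/a | n/a | n/a | — | Closed |
| 138 | GPT-5-mini | n/a | n/a | n/a | n/a | n/a | n/a | — | Closed |
| 139 | Grok 3.5 | n/a | n/a | n/a | n/a | n/a | n/a | — | Closed |
| 140 | Phi-4-instruct (reasoning-trained) | n/a | 49.00 | n/a | 90.40 | 50.00 | n/a | 38 | Closed |
| 141 | DeepSeek-R1-Distill-Qwen-7B | n/a | 49.50 | n/a | 91.40 | 53.30 | n/a | 70 | Free commercial use |
| 142 | GPT-4.1 nano | n/a | 50.30 | n/a | n/a | 29.40 | n/a | — | Closed |
| 143 | Qwen3-32B | n/a | 53.30 | n/a | n/a | 81.40 | 65.70 | 320 | Free commercial use |
| 144 | Qwen3-30B-A3B-2507 | n/a | n/a | 22.00 | n/a | n/a | n/a | 305 | Free commercial use |
| 145 | Codestral | n/a | n/a | n/a | n/a | n/a | 31.50 | 220 | Non-commercial |
| 146 | Codestral 25.01 | n/a | n/a | n/a | n/a | n/a | 37.90 | — | Closed |
| 147 | QwQ-Max-Preview | n/a | n/a | n/a | n/a | n/a | 65.60 | — | Free commercial use |
| 148 | Kimi-k1.6-IOI | n/a | n/a | n/a | n/a | n/a | 65.90 | — | Closed |
| 149 | OpenAI o3-mini (medium) | n/a | n/a | n/a | n/a | n/a | 67.40 | — | Closed |
| 150 | Kimi-k1.6-IOI-high | n/a | n/a | n/a | n/a | n/a | 73.80 | — | Closed |
| 151 | Gemini 2.5 Pro Deep Think | n/a | n/a | n/a | n/a | n/a | 80.40 | — | Closed |
| 152 | Qwen3.5-397B-A17B | n/a | n/a | n/a | n/a | n/a | 83.60 | 397 | Free commercial use |
| 153 | Claude Opus 4.5 | n/a | n/a | n/a | n/a | n/a | 87.00 | — | Closed |
| 154 | Gemini 2.5 Deep Think | n/a | n/a | n/a | n/a | n/a | 87.60 | — | Closed |
| 155 | GPT OSS 20B | n/a | n/a | n/a | n/a | 96.00 | n/a | 210 | Free commercial use |
| 156 | GPT OSS 120B | n/a | n/a | n/a | n/a | 96.60 | n/a | 117 | Free commercial use |
| 157 | OpenAI o4 - mini | n/a | n/a | n/a | n/a | 98.70 | n/a | — | Closed |
| 158 | Kimi k1.5 (Short-CoT) | n/a | n/a | n/a | 94.60 | n/a | n/a | — | Closed |
| 159 | Kimi k1.5 (Long-CoT) | n/a | n/a | n/a | 96.20 | n/a | n/a | — | Closed |
| 160 | Qwen3-Coder-Next | n/a | n/a | 70.60 | n/a | n/a | n/a | 80 | Free commercial use |
| 161 | Devstral Small 1.0 | n/a | n/a | 46.80 | n/a | n/a | n/a | 240 | Free commercial use |
| 162 | Qwen3-Coder-Flash | n/a | n/a | 51.60 | n/a | n/a | n/a | 305 | Free commercial use |
| 163 | Devstral Small 1.1 | n/a | n/a | 53.60 | n/a | n/a | n/a | 240 | Free commercial use |
| 164 | Gemini 2.5 Flash-Preview-09-2025 | n/a | n/a | 54.00 | n/a | n/a | n/a | — | Closed |
| 165 | Haiku 4.5 | n/a | n/a | 60.60 | n/a | n/a | n/a | — | Closed |
| 166 | Devstral Medium | n/a | n/a | 61.60 | n/a | n/a | n/a | — | Closed |
| 167 | Claude Sonnet 3.7 | n/a | n/a | 62.30 | n/a | n/a | n/a | — | Closed |
| 168 | Qwen3-Coder-480B-A35B | n/a | n/a | 67.00 | n/a | n/a | n/a | 4800 | Free commercial use |
| 169 | DeepSeek V3.2-Exp | n/a | n/a | 67.80 | n/a | n/a | n/a | 6710 | Free commercial use |
| 170 | Kimi K2 0905 | n/a | n/a | 69.20 | n/a | n/a | n/a | 10000 | Free commercial use |
| 171 | Kimi K2 0905 | n/a | n/a | 69.20 | n/a | n/a | n/a | 10000 | Free commercial use |
| 172 | MiniMax M2 | n/a | n/a | 69.40 | n/a | n/a | n/a | 2300 | Free commercial use |
| 173 | Claude Sonnet 3.7 | n/a | n/a | 70.30 | n/a | n/a | n/a | — | Closed |
| 174 | GPT-5.1 Codex | n/a | n/a | 70.40 | n/a | n/a | 85.50 | — | Closed |
| 175 | GPT-5-Pro | n/a | 88.40 | n/a | n/a | n/a | n/a | — | Closed |
| 176 | o3-pro | n/a | 84.00 | n/a | n/a | 93.00 | n/a | — | Closed |
| 177 | Gemini 2.5 Pro Experimental 03-25 | n/a | 84.00 | 63.80 | n/a | 92.00 | 70.40 | — | Closed |
| 178 | Grok-3 mini - Reasoning | n/a | 84.00 | n/a | n/a | 96.00 | n/a | — | Closed |
| 179 | Grok-3 - Reasoning Beta | n/a | 84.60 | n/a | n/a | 93.30 | 79.40 | — | Closed |
| 180 | Claude Sonnet 3.7-64K Extended Thinking | n/a | 84.80 | n/a | 96.20 | 80.00 | n/a | — | Closed |
| 181 | MiniMax M2.5 | n/a | 85.20 | n/a | n/a | n/a | n/a | 2290 | Free commercial use |
| 182 | Grok 4 Fast | n/a | 85.70 | n/a | n/a | n/a | 80.00 | — | Closed |
| 183 | GPT-5 | n/a | 85.70 | 72.80 | n/a | n/a | n/a | — | Closed |
| 184 | GLM-5 | n/a | 86.00 | 77.80 | n/a | n/a | n/a | 7440 | Free commercial use |
| 185 | Gemini 2.5-Pro | n/a | 86.40 | 67.20 | n/a | n/a | n/a | — | Closed |
| 186 | GPT-5 | n/a | 87.30 | n/a | n/a | n/a | n/a | — | Closed |
| 187 | GPT-5.4 mini | n/a | 88.00 | n/a | n/a | n/a | n/a | — | Closed |
| 188 | GPT-5.1 | n/a | 88.10 | n/a | n/a | n/a | n/a | — | Closed |
| 189 | GPT-5.1 | n/a | 88.10 | 76.30 | n/a | n/a | n/a | — | Closed |
| 190 | GPT-5.1 | n/a | 88.10 | n/a | n/a | n/a | n/a | — | Closed |
| 191 | Claude Sonnet 4 | n/a | 83.80 | n/a | n/a | n/a | n/a | — | Closed |
| 192 | Grok 4 Heavy | n/a | 88.90 | n/a | n/a | n/a | n/a | — | Closed |
| 193 | GPT-5-Pro | n/a | 89.40 | n/a | n/a | n/a | n/a | — | Closed |
| 194 | Claude Sonnet 4.6 | n/a | 89.90 | 79.60 | n/a | n/a | n/a | — | Closed |
| 195 | Gemini 3.0 Flash | n/a | 90.40 | 68.70 | n/a | n/a | n/a | — | Closed |
| 196 | Gemini 3.0 Pro (Preview 11-2025) | n/a | 91.00 | n/a | n/a | n/a | n/a | — | Closed |
| 197 | Claude Opus 4.6 | n/a | 91.31 | n/a | 97.60 | n/a | 76.00 | — | Closed |
| 198 | GPT-5.2 | n/a | 92.40 | n/a | n/a | n/a | n/a | — | Closed |
| 199 | GPT-5.4 | n/a | 92.80 | n/a | n/a | n/a | n/a | — | Closed |
| 200 | GPT-5.2 Pro | n/a | 93.20 | n/a | n/a | n/a | n/a | — | Closed |
| 201 | GPT-5.2 | n/a | 93.20 | n/a | n/a | n/a | n/a | — | Closed |
| 202 | Gemini 3.0 Pro (Preview 11-2025) | n/a | 93.80 | n/a | n/a | n/a | n/a | — | Closed |
| 203 | Gemini 3.1 Pro Preview | n/a | 94.30 | n/a | n/a | n/a | n/a | — | Closed |
| 204 | GPT-5.4 Pro | n/a | 94.40 | n/a | n/a | n/a | n/a | — | Closed |
| 205 | Amazon Nova Pro | n/a | n/a | n/a | n/a | n/a | n/a | — | Closed |
| 206 | Claude Sonnet 4.5 | n/a | 73.70 | n/a | n/a | n/a | 59.00 | — | Closed |
| 207 | Qwen3-8B | n/a | 62.00 | n/a | 97.40 | 76.00 | 57.50 | 80 | Free commercial use |
| 208 | GPT-4.1 mini | n/a | 65.00 | 23.60 | n/a | 49.60 | n/a | — | Closed |
| 209 | Grok 3 mini | n/a | 65.00 | n/a | n/a | 40.00 | n/a | — | Closed |
| 210 | DeepSeek-R1-Distill-Llama-70B | n/a | 65.20 | n/a | 94.50 | n/a | n/a | 700 | Free commercial use |
| 211 | Qwen3-4B-Thinking-2507 | n/a | 65.80 | n/a | n/a | n/a | 55.20 | 40 | Free commercial use |
| 212 | GLM-4.7-Flash | n/a | 66.00 | n/a | n/a | n/a | n/a | 310 | Free commercial use |
| 213 | Gemini 2.5 Flash-Lite | n/a | 66.70 | 27.60 | n/a | n/a | 34.30 | — | Closed |
| 214 | Claude Sonnet 4 | n/a | 68.00 | n/a | n/a | 43.40 | 48.50 | — | Closed |
| 215 | Claude Sonnet 3.7 | n/a | 68.00 | n/a | 82.20 | 23.30 | n/a | — | Closed |
| 216 | Magistral-Small-2506 | n/a | 68.18 | n/a | n/a | 70.68 | 55.84 | 240 | Free commercial use |
| 217 | Qwen3-32B | n/a | 68.40 | n/a | 97.20 | 81.40 | n/a | 320 | Free commercial use |
| 218 | OpenAI o3-mini | n/a | 70.60 | 40.80 | 95.80 | 60.00 | n/a | — | Closed |
| 219 | Magistral-Medium-2506 | n/a | 70.83 | n/a | n/a | 73.59 | 59.36 | — | Closed |
| 220 | Qwen3-235B-A22B | n/a | 71.10 | n/a | 98.00 | 85.70 | 70.70 | 2350 | Free commercial use |
| 221 | Step3 | n/a | 73.00 | n/a | n/a | n/a | 67.10 | 3210 | Free commercial use |
| 222 | Qwen3-4B-2507 | n/a | 62.00 | n/a | n/a | n/a | 35.10 | 40 | Free commercial use |
| 223 | GLM-4.7-Flash | n/a | 75.20 | 59.20 | n/a | n/a | n/a | 310 | Free commercial use |
| 224 | ERNIE-4.5-VL-424B-A47B-Base | n/a | 76.80 | n/a | n/a | n/a | 38.80 | 4240 | Free commercial use |
| 225 | Claude Sonnet 3.7 | n/a | 77.00 | n/a | n/a | n/a | n/a | — | Closed |
| 226 | GPT-5 | n/a | 77.80 | n/a | n/a | n/a | n/a | — | Closed |
| 227 | Gemini 2.5 Flash | n/a | 78.30 | 50.00 | n/a | 88.00 | 41.10 | — | Closed |
| 228 | OpenAI o3-mini (high) | n/a | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed |
| 229 | Grok 3 | n/a | 80.40 | n/a | n/a | 84.20 | 70.60 | — | Closed |
| 230 | Claude Opus 4.1 | n/a | 80.90 | 74.50 | n/a | n/a | 65.00 | — | Closed |
| 231 | DeepSeek V3.2 | n/a | 82.40 | 70.20 | n/a | n/a | 83.30 | 6710 | Free commercial use |
| 232 | GPT-5.4 nano | n/a | 82.80 | n/a | n/a | n/a | n/a | — | Closed |
| 233 | Gemini 2.5 Flash | n/a | 82.80 | 48.90 | n/a | n/a | 55.40 | — | Closed |
| 234 | GLM-4.6 | n/a | 82.90 | 68.00 | n/a | n/a | 84.50 | 3550 | Free commercial use |
| 235 | Gemini-2.5-Pro-Preview-05-06 | n/a | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed |
| 236 | OpenAI o3 | n/a | 83.30 | 69.10 | n/a | n/a | n/a | — | Closed |

n/a: no score recorded for that benchmark (the site displays these cells as 0.00). "—" in the Params column: parameter count not disclosed.

Params are in units of 100 million (亿), following the site's column header; for example, 6710 corresponds to 671 billion parameters.

The LiveCodeBench value of 2887.00 for Gemini 3.1 Pro Preview (rank 134) lies far outside the 0 to 100 range of the other scores, so it is likely a rating on a different scale rather than a pass rate.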
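To re-rank models on a single benchmark yourself (for example, after copying rows out of the table), the missing results have to be dropped first; otherwise placeholder cells either sink to the bottom or get treated as genuine scores. The sketch below assumes rows have already been read into dictionaries keyed by column name. The helpers `parse_score` and `rank_by` are hypothetical names for illustration, and treating a literal `0.00` as "no score" is an assumption based on how the site displays gaps.

```python
from typing import Optional

# Placeholders meaning "no score recorded". Assumption: a literal 0.00 never
# appears as a genuine result on these benchmarks.
MISSING = {"", "—", "n/a", "0.00"}


def parse_score(cell: str) -> Optional[float]:
    """Return the cell as a float, or None if it is a missing-score placeholder."""
    cell = cell.strip()
    return None if cell in MISSING else float(cell)


def rank_by(rows: list[dict[str, str]], benchmark: str) -> list[tuple[int, str, float]]:
    """Rank models on one benchmark, skipping rows that have no score for it."""
    scored = []
    for row in rows:
        score = parse_score(row.get(benchmark, ""))
        if score is not None:
            scored.append((row["model"], score))
    # Highest score first; list.sort is stable, so ties keep their table order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, model, score) for rank, (model, score) in enumerate(scored, start=1)]


# Four GPQA Diamond cells taken from the table (Gemini 2.5-Pro has no score there).
rows = [
    {"model": "OpenAI o1", "GPQA Diamond": "77.30"},
    {"model": "Gemini 3.0 Pro (Preview 11-2025)", "GPQA Diamond": "91.90"},
    {"model": "Gemini 2.5-Pro", "GPQA Diamond": "0.00"},
    {"model": "Claude Opus 4.5", "GPQA Diamond": "87.00"},
]

for rank, model, score in rank_by(rows, "GPQA Diamond"):
    print(f"{rank}. {model}: {score:.2f}")
# 1. Gemini 3.0 Pro (Preview 11-2025): 91.90
# 2. Claude Opus 4.5: 87.00
# 3. OpenAI o1: 77.30
```

Because rows without a score are dropped rather than counted as zero, the resulting positions can differ from the Rank column, which in this table follows MMLU Pro for the rows that report it.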
1
OpenAI o1
MMLU Pro91.04
GPQA Diamond77.30
SWE-bench Verified48.90
MATH-50096.40
AIME 202479.20
LiveCodeBench71.00
不开源
2
Gemini 3.0 Pro (Preview 11-2025)
MMLU Pro90.00
GPQA Diamond91.90
SWE-bench Verified76.20
MATH-5000.00
AIME 20240.00
LiveCodeBench92.00
不开源
3
Claude Opus 4.5
MMLU Pro90.00
GPQA Diamond87.00
SWE-bench Verified80.90
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
4
Claude Opus 4.1
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
5
M2.1
2300B
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
6
Claude Sonnet 4.5
MMLU Pro88.00
GPQA Diamond83.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench71.00
不开源
7
Qwen3.5-397B-A17B
397B
MMLU Pro87.80
GPQA Diamond88.40
SWE-bench Verified76.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
8
Qwen3.5-397B-A17B
397B
MMLU Pro87.80
GPQA Diamond88.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.60
免费商用
9
Hunyuan-T1
MMLU Pro87.20
GPQA Diamond69.30
SWE-bench Verified0.00
MATH-50096.20
AIME 202478.20
LiveCodeBench64.90
不开源
10
Grok 4
MMLU Pro87.00
GPQA Diamond87.00
SWE-bench Verified58.60
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
不开源
11
GPT-4.5
MMLU Pro86.10
GPQA Diamond71.40
SWE-bench Verified38.00
MATH-50090.70
AIME 202436.70
LiveCodeBench46.40
不开源
12
Qwen3.5-27B
270B
MMLU Pro86.10
GPQA Diamond85.50
SWE-bench Verified72.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
13
Gemini 2.5-Pro
MMLU Pro86.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.80
AIME 202492.00
LiveCodeBench77.10
不开源
14
Qwen3-Max-Thinking
10000B
MMLU Pro85.70
GPQA Diamond87.40
SWE-bench Verified75.30
MATH-5000.00
AIME 20240.00
LiveCodeBench85.90
不开源
15
OpenAI o3
MMLU Pro85.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.10
AIME 202491.60
LiveCodeBench75.80
不开源
16
DeepSeek-R1-0528
6710B
MMLU Pro85.00
GPQA Diamond81.00
SWE-bench Verified57.60
MATH-50098.00
AIME 202491.40
LiveCodeBench73.30
免费商用
17
Grok 4.1 Fast
MMLU Pro85.00
GPQA Diamond85.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
不开源
18
DeepSeek V3.2-Exp
6710B
MMLU Pro85.00
GPQA Diamond79.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
免费商用
19
DeepSeek-V3.1 Terminus
6710B
MMLU Pro85.00
GPQA Diamond80.70
SWE-bench Verified68.40
MATH-5000.00
AIME 20240.00
LiveCodeBench74.90
免费商用
20
DeepSeek-V3.1 Terminus
6710B
MMLU Pro85.00
GPQA Diamond79.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench80.00
免费商用
21
DeepSeek-V3.1
6710B
MMLU Pro85.00
GPQA Diamond80.10
SWE-bench Verified0.00
MATH-5000.00
AIME 202493.10
LiveCodeBench74.80
免费商用
22
Claude Opus 4
MMLU Pro85.00
GPQA Diamond79.60
SWE-bench Verified72.50
MATH-50098.20
AIME 202476.00
LiveCodeBench56.60
不开源
23
GLM-4.5
3550B
MMLU Pro84.60
GPQA Diamond79.10
SWE-bench Verified64.20
MATH-50098.20
AIME 202491.00
LiveCodeBench72.90
免费商用
24
Kimi K2 Thinking
10400B
MMLU Pro84.60
GPQA Diamond84.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.10
免费商用
25
Qwen3-235B-A22B-Thinking
305B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
免费商用
26
Qwen3-235B-A22B-Thinking-2507
2350B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
免费商用
27
GLM-4.7
3580B
MMLU Pro84.30
GPQA Diamond85.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench84.90
免费商用
28
DeepSeek-R1
6710B
MMLU Pro84.00
GPQA Diamond71.50
SWE-bench Verified49.20
MATH-50097.30
AIME 202479.80
LiveCodeBench65.90
免费商用
29
Claude Sonnet 4
MMLU Pro84.00
GPQA Diamond75.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench66.00
不开源
30
Qwen3 Max (Preview)
MMLU Pro84.00
GPQA Diamond76.00
SWE-bench Verified69.60
MATH-5000.00
AIME 20240.00
LiveCodeBench57.50
不开源
31
DeepSeek V3.2-Exp
6710B
MMLU Pro84.00
GPQA Diamond74.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
免费商用
32
DeepSeek-V3.1
6710B
MMLU Pro83.70
GPQA Diamond74.90
SWE-bench Verified66.00
MATH-5000.00
AIME 202466.30
LiveCodeBench56.40
免费商用
33
Intern-S1
2410B
MMLU Pro83.50
GPQA Diamond77.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
34
Qwen3-235B-A22B-2507
2350B
MMLU Pro83.00
GPQA Diamond77.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.80
免费商用
35
GLM-4.6
3550B
MMLU Pro83.00
GPQA Diamond81.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.80
免费商用
36
Pangu Pro MoE
719B
MMLU Pro82.60
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50096.80
AIME 202479.20
LiveCodeBench59.60
免费商用
37
Llama 4 Behemoth Instruct
20000B
MMLU Pro82.20
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50095.00
AIME 20240.00
LiveCodeBench49.40
免费商用
38
MiniMax M2
2300B
MMLU Pro82.00
GPQA Diamond78.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.00
免费商用
39
GLM-4.5-Air
1060B
MMLU Pro81.40
GPQA Diamond75.00
SWE-bench Verified57.60
MATH-50098.10
AIME 202489.40
LiveCodeBench70.70
免费商用
40
DeepSeek-V3-0324
6710B
MMLU Pro81.20
GPQA Diamond68.40
SWE-bench Verified38.80
MATH-50094.00
AIME 202459.40
LiveCodeBench49.20
免费商用
41
MiniMax-M1-80k
4560B
MMLU Pro81.10
GPQA Diamond70.00
SWE-bench Verified56.00
MATH-50096.80
AIME 202486.00
LiveCodeBench65.00
免费商用
42
Kimi K2
10000B
MMLU Pro81.10
GPQA Diamond75.10
SWE-bench Verified51.80
MATH-50097.40
AIME 202469.60
LiveCodeBench53.70
免费商用
43
OpenAI o4 - mini
MMLU Pro80.60
GPQA Diamond81.40
SWE-bench Verified68.10
MATH-5000.00
AIME 202493.40
LiveCodeBench0.00
不开源
44
MiniMax-M1-40k
4560B
MMLU Pro80.60
GPQA Diamond69.20
SWE-bench Verified55.60
MATH-50096.00
AIME 202483.30
LiveCodeBench62.30
免费商用
45
GPT-4.1
MMLU Pro80.50
GPQA Diamond66.30
SWE-bench Verified54.60
MATH-50092.80
AIME 202448.10
LiveCodeBench40.50
不开源
46
Llama 4 Maverick Instruct
4000B
MMLU Pro80.50
GPQA Diamond69.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.40
免费商用
47
OpenAI o1-mini
MMLU Pro80.30
GPQA Diamond60.00
SWE-bench Verified0.00
MATH-50090.00
AIME 202463.60
LiveCodeBench52.00
不开源
48
Haiku 4.5
MMLU Pro80.00
GPQA Diamond73.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench62.00
不开源
49
GPT-4o(2025-03-27)
MMLU Pro79.80
GPQA Diamond66.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench35.80
不开源
50
Gemini 2.0 Pro Experimental
MMLU Pro79.10
GPQA Diamond64.70
SWE-bench Verified0.00
MATH-5000.00
AIME 202436.00
LiveCodeBench0.00
不开源
51
Hunyuan-TurboS
MMLU Pro79.00
GPQA Diamond57.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.00
不开源
52
Pangu Embedded
70B
MMLU Pro79.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50092.40
AIME 202481.90
LiveCodeBench67.10
免费商用
53
GPT OSS 120B
117B
MMLU Pro79.00
GPQA Diamond80.10
SWE-bench Verified60.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
54
Kimi K2.5
10000B
MMLU Pro78.50
GPQA Diamond87.60
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench85.00
免费商用
55
ERNIE-4.5-300B-A47B
3000B
MMLU Pro78.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50096.40
AIME 202454.80
LiveCodeBench38.80
免费商用
56
Qwen3-30B-A3B-2507
305B
MMLU Pro78.40
GPQA Diamond70.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.20
免费商用
57
Claude 3.5 Sonnet New
MMLU Pro78.00
GPQA Diamond65.00
SWE-bench Verified49.00
MATH-50078.00
AIME 202416.00
LiveCodeBench38.70
不开源
58
GLM-4.6
3550B
MMLU Pro78.00
GPQA Diamond63.00
SWE-bench Verified68.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.00
免费商用
59
GPT-5-mini
MMLU Pro78.00
GPQA Diamond69.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
不开源
60
GPT-4o
MMLU Pro77.90
GPQA Diamond70.10
SWE-bench Verified31.00
MATH-50075.90
AIME 20249.30
LiveCodeBench35.10
不开源
61
GPT-4o(2024-11-20)
MMLU Pro77.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
62
Claude 3.5 Sonnet
MMLU Pro77.64
GPQA Diamond59.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
63
Gemini 2.0 Flash Experimental
MMLU Pro76.24
GPQA Diamond65.20
SWE-bench Verified21.40
MATH-5000.00
AIME 20240.00
LiveCodeBench29.10
不开源
64
Gemini 1.5 Pro
MMLU Pro76.10
GPQA Diamond53.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
65
Qwen2.5-Max
MMLU Pro76.10
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
66
Haiku 4.5
MMLU Pro76.00
GPQA Diamond60.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.00
不开源
67
QwQ-32B
325B
MMLU Pro76.00
GPQA Diamond58.00
SWE-bench Verified0.00
MATH-50091.00
AIME 202479.50
LiveCodeBench0.00
免费商用
68
DeepSeek-V3
6810B
MMLU Pro75.90
GPQA Diamond59.10
SWE-bench Verified0.00
MATH-50087.80
AIME 202439.00
LiveCodeBench34.60
免费商用
69
Grok 2
2690B
MMLU Pro75.50
GPQA Diamond56.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
70
Llama 4 Scout Instruct
1090B
MMLU Pro74.30
GPQA Diamond57.20
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.80
免费商用
71
GPT OSS 20B
210B
MMLU Pro74.00
GPQA Diamond71.50
SWE-bench Verified34.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
72
Llama3.1-405B Instruct
4050B
MMLU Pro73.40
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench30.20
免费商用
73
Qwen3-235B-A22B
2350B
MMLU Pro72.90
GPQA Diamond71.10
SWE-bench Verified34.40
MATH-50096.20
AIME 202485.70
LiveCodeBench70.70
免费商用
74
Qwen3-8B
80B
MMLU Pro72.50
GPQA Diamond39.30
SWE-bench Verified0.00
MATH-50087.40
AIME 202479.40
LiveCodeBench61.80
免费商用
75
GLM-4-9B-Chat
90B
MMLU Pro72.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202476.40
LiveCodeBench51.80
免费商用
76
Gemini 2.0 Flash-Lite
MMLU Pro71.60
GPQA Diamond51.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench28.90
不开源
77
QwQ-32B-Preview
320B
MMLU Pro70.97
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50090.60
AIME 202450.00
LiveCodeBench0.00
免费商用
78
Phi 4 - 14B
140B
MMLU Pro70.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不可商用
79
Qwen2.5-32B
320B
MMLU Pro69.23
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.20
免费商用
80
Qwen3-30B-A3B
305B
MMLU Pro69.10
GPQA Diamond54.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench29.00
免费商用
81
Mistral-Small-3.2
240B
MMLU Pro69.06
GPQA Diamond46.13
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
82
Llama3.3-70B-Instruct
700B
MMLU Pro68.90
GPQA Diamond50.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
免费商用
83
Claude3-Opus
MMLU Pro68.45
GPQA Diamond50.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
84
Gemma 3 - 27B (IT)
270B
MMLU Pro67.50
GPQA Diamond42.40
SWE-bench Verified0.00
MATH-5000.00
AIME 202425.30
LiveCodeBench29.70
免费商用
85
Hunyuan-A13B-Instruct
800B
MMLU Pro67.23
GPQA Diamond71.20
SWE-bench Verified0.00
MATH-5000.00
AIME 202487.30
LiveCodeBench63.90
免费商用
86
Mistral-Small-3.1-24B-Instruct-2503
240B
MMLU Pro66.76
GPQA Diamond45.96
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
87
Llama3.1-70B-Instruct
700B
MMLU Pro66.40
GPQA Diamond48.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
免费商用
88
Qwen3-Next
800B
MMLU Pro66.05
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.60
免费商用
89
Claude 3.5 Haiku
MMLU Pro65.00
GPQA Diamond41.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
90
Qwen2.5-14B
140B
MMLU Pro63.69
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
91
Llama 4 Maverick
4000B
MMLU Pro62.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
92
GPT-4o mini
MMLU Pro61.70
GPQA Diamond41.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
93
Llama3.1-405B
4050B
MMLU Pro61.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
94
Gemma 3 - 12B (IT)
120B
MMLU Pro60.60
GPQA Diamond40.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench24.60
免费商用
95
Llama 4 Scout
1090B
MMLU Pro58.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
96
Qwen2.5-72B
727B
MMLU Pro58.10
GPQA Diamond45.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
97
Claude3-Sonnet
MMLU Pro56.80
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
98
Gemma2-27B
270B
MMLU Pro56.54
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
99
Mixtral-8x22B-Instruct-v0.1
1410B
MMLU Pro56.33
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
100
Llama3-70B-Instruct
700B
MMLU Pro56.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
101
Phi-4-mini-instruct (3.8B)
38B
MMLU Pro52.80
GPQA Diamond36.00
SWE-bench Verified0.00
MATH-50071.80
AIME 202410.00
LiveCodeBench0.00
免费商用
102
Llama3-70B
700B
MMLU Pro52.78
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
103
Llama3.1-70B
700B
MMLU Pro52.47
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
104
Grok-1.5
MMLU Pro51.00
GPQA Diamond35.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
105
C4AI Aya Vision 32B
320B
MMLU Pro47.16
GPQA Diamond33.84
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不可商用
106
Qwen2.5-7B
70B
MMLU Pro45.00
GPQA Diamond36.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
107
Gemma 2 - 9B
90B
MMLU Pro44.70
GPQA Diamond32.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
108
Llama3.1-8B-Instruct
80B
MMLU Pro44.00
GPQA Diamond26.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
109
Moonlight-16B-A3B-Instruct
160B
MMLU Pro42.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
110
Llama3.1-8B
80B
MMLU Pro35.40
GPQA Diamond25.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
111
Qwen2.5-3B
30B
MMLU Pro34.60
GPQA Diamond24.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
112
Mistral-7B-Instruct-v0.3
70B
MMLU Pro30.90
GPQA Diamond24.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
113
Llama-3.2-3B
32B
MMLU Pro25.00
GPQA Diamond26.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
114
GPT-5.1-Codex-Max
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
115
Qwen3.5-397B-A17B
397B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
116
GPT-5.1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
117
o3-pro
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified75.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
118
GPT-5 Codex
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
119
Step 3.5 Flash
1960B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.40
MATH-5000.00
AIME 20240.00
LiveCodeBench86.40
免费商用
120
GLM-4.7
3580B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
121
Grok 4 Heavy
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
122
Haiku 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
123
DeepSeek V3.2
6710B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
124
Claude Sonnet 4
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.70
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
125
Grok 4 Code
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
126
Kimi K2 Thinking
10400B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified71.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
127
Hunyuan-7B
70B
MMLU Pro0.00
GPQA Diamond60.10
SWE-bench Verified0.00
MATH-50093.70
AIME 202481.10
LiveCodeBench57.00
免费商用
128
Grok Code Fast 1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified70.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
129
Claude Sonnet 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified77.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
130
Claude Opus 4.1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified79.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
131
GPT-5.2
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
132
MiniMax M2.5
2290B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
133
Claude Sonnet 4
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
134
Gemini 3.1 Pro Preview
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.60
MATH-5000.00
AIME 20240.00
LiveCodeBench2887.00
不开源
135
Claude Opus 4.6
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.84
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
136
Claude Sonnet 5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified82.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
137
Claude Sonnet 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified82.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
138
GPT-5-mini
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
139
Grok 3.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
140
Phi-4-instruct (reasoning-trained)
38B
MMLU Pro0.00
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-50090.40
AIME 202450.00
LiveCodeBench0.00
不开源
141
DeepSeek-R1-Distill-Qwen-7B
70B
MMLU Pro0.00
GPQA Diamond49.50
SWE-bench Verified0.00
MATH-50091.40
AIME 202453.30
LiveCodeBench0.00
免费商用
142
GPT-4.1 nano
MMLU Pro0.00
GPQA Diamond50.30
SWE-bench Verified0.00
MATH-5000.00
AIME 202429.40
LiveCodeBench0.00
不开源
143
Qwen3-32B
320B
MMLU Pro0.00
GPQA Diamond53.30
SWE-bench Verified0.00
MATH-5000.00
AIME 202481.40
LiveCodeBench65.70
免费商用
144
Qwen3-30B-A3B-2507
305B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified22.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
免费商用
145
Codestral
220B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench31.50
不可商用
146
Codestral 25.01
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench37.90
不开源
147
QwQ-Max-Preview
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench65.60
免费商用
148
Kimi-k1.6-IOI
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench65.90
不开源
149
OpenAI o3-mini (medium)
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench67.40
不开源
150
Kimi-k1.6-IOI-high
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench73.80
不开源
151 | Gemini 2.5 Pro Deep Think | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 80.40 | Closed source
152 | Qwen3.5-397B-A17B | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 83.60 | Parameters 397B | Free for commercial use
153 | Claude Opus 4.5 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 87.00 | Closed source
154 | Gemini 2.5 Deep Think | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 87.60 | Closed source
155 | GPT OSS 20B | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 96.00 | LiveCodeBench 0.00 | Parameters 210B | Free for commercial use
156 | GPT OSS 120B | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 96.60 | LiveCodeBench 0.00 | Parameters 117B | Free for commercial use
157 | OpenAI o4-mini | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 98.70 | LiveCodeBench 0.00 | Closed source
158 | Kimi k1.5 (Short-CoT) | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 94.60 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
159 | Kimi k1.5 (Long-CoT) | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 96.20 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
160 | Qwen3-Coder-Next | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 70.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 80B | Free for commercial use
161 | Devstral Small 1.0 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 46.80 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 240B | Free for commercial use
162 | Qwen3-Coder-Flash | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 51.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 305B | Free for commercial use
163 | Devstral Small 1.1 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 53.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 240B | Free for commercial use
164 | Gemini 2.5 Flash-Preview-09-2025 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 54.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
165 | Haiku 4.5 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 60.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
166 | Devstral Medium | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 61.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
167 | Claude Sonnet 3.7 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 62.30 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
168 | Qwen3-Coder-480B-A35B | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 67.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 4800B | Free for commercial use
169 | DeepSeek V3.2-Exp | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 67.80 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 6710B | Free for commercial use
170 | Kimi K2 0905 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 69.20 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 10000B | Free for commercial use
171 | Kimi K2 0905 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 69.20 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 10000B | Free for commercial use
172 | MiniMax M2 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 69.40 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 2300B | Free for commercial use
173 | Claude Sonnet 3.7 | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 70.30 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
174 | GPT-5.1 Codex | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 70.40 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 85.50 | Closed source
175 | GPT-5-Pro | MMLU Pro 0.00 | GPQA Diamond 88.40 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
176 | o3-pro | MMLU Pro 0.00 | GPQA Diamond 84.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 93.00 | LiveCodeBench 0.00 | Closed source
177 | Gemini 2.5 Pro Experimental 03-25 | MMLU Pro 0.00 | GPQA Diamond 84.00 | SWE-bench Verified 63.80 | MATH-500 0.00 | AIME 2024 92.00 | LiveCodeBench 70.40 | Closed source
178 | Grok-3 mini - Reasoning | MMLU Pro 0.00 | GPQA Diamond 84.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 96.00 | LiveCodeBench 0.00 | Closed source
179 | Grok-3 - Reasoning Beta | MMLU Pro 0.00 | GPQA Diamond 84.60 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 93.30 | LiveCodeBench 79.40 | Closed source
180 | Claude Sonnet 3.7-64K Extended Thinking | MMLU Pro 0.00 | GPQA Diamond 84.80 | SWE-bench Verified 0.00 | MATH-500 96.20 | AIME 2024 80.00 | LiveCodeBench 0.00 | Closed source
181 | MiniMax M2.5 | MMLU Pro 0.00 | GPQA Diamond 85.20 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 2290B | Free for commercial use
182 | Grok 4 Fast | MMLU Pro 0.00 | GPQA Diamond 85.70 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 80.00 | Closed source
183 | GPT-5 | MMLU Pro 0.00 | GPQA Diamond 85.70 | SWE-bench Verified 72.80 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
184 | GLM-5 | MMLU Pro 0.00 | GPQA Diamond 86.00 | SWE-bench Verified 77.80 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 7440B | Free for commercial use
185 | Gemini 2.5-Pro | MMLU Pro 0.00 | GPQA Diamond 86.40 | SWE-bench Verified 67.20 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
186 | GPT-5 | MMLU Pro 0.00 | GPQA Diamond 87.30 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
187 | GPT-5.4 mini | MMLU Pro 0.00 | GPQA Diamond 88.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
188 | GPT-5.1 | MMLU Pro 0.00 | GPQA Diamond 88.10 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
189 | GPT-5.1 | MMLU Pro 0.00 | GPQA Diamond 88.10 | SWE-bench Verified 76.30 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
190 | GPT-5.1 | MMLU Pro 0.00 | GPQA Diamond 88.10 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
191 | Claude Sonnet 4 | MMLU Pro 0.00 | GPQA Diamond 83.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
192 | Grok 4 Heavy | MMLU Pro 0.00 | GPQA Diamond 88.90 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
193 | GPT-5-Pro | MMLU Pro 0.00 | GPQA Diamond 89.40 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
194 | Claude Sonnet 4.6 | MMLU Pro 0.00 | GPQA Diamond 89.90 | SWE-bench Verified 79.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
195 | Gemini 3.0 Flash | MMLU Pro 0.00 | GPQA Diamond 90.40 | SWE-bench Verified 68.70 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
196 | Gemini 3.0 Pro (Preview 11-2025) | MMLU Pro 0.00 | GPQA Diamond 91.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
197 | Claude Opus 4.6 | MMLU Pro 0.00 | GPQA Diamond 91.31 | SWE-bench Verified 0.00 | MATH-500 97.60 | AIME 2024 0.00 | LiveCodeBench 76.00 | Closed source
198 | GPT-5.2 | MMLU Pro 0.00 | GPQA Diamond 92.40 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
199 | GPT-5.4 | MMLU Pro 0.00 | GPQA Diamond 92.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
200 | GPT-5.2 Pro | MMLU Pro 0.00 | GPQA Diamond 93.20 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
201 | GPT-5.2 | MMLU Pro 0.00 | GPQA Diamond 93.20 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
202 | Gemini 3.0 Pro (Preview 11-2025) | MMLU Pro 0.00 | GPQA Diamond 93.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
203 | Gemini 3.1 Pro Preview | MMLU Pro 0.00 | GPQA Diamond 94.30 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
204 | GPT-5.4 Pro | MMLU Pro 0.00 | GPQA Diamond 94.40 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
205 | Amazon Nova Pro | MMLU Pro 0.00 | GPQA Diamond 0.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
206 | Claude Sonnet 4.5 | MMLU Pro 0.00 | GPQA Diamond 73.70 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 59.00 | Closed source
207 | Qwen3-8B | MMLU Pro 0.00 | GPQA Diamond 62.00 | SWE-bench Verified 0.00 | MATH-500 97.40 | AIME 2024 76.00 | LiveCodeBench 57.50 | Parameters 80B | Free for commercial use
208 | GPT-4.1 mini | MMLU Pro 0.00 | GPQA Diamond 65.00 | SWE-bench Verified 23.60 | MATH-500 0.00 | AIME 2024 49.60 | LiveCodeBench 0.00 | Closed source
209 | Grok 3 mini | MMLU Pro 0.00 | GPQA Diamond 65.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 40.00 | LiveCodeBench 0.00 | Closed source
210 | DeepSeek-R1-Distill-Llama-70B | MMLU Pro 0.00 | GPQA Diamond 65.20 | SWE-bench Verified 0.00 | MATH-500 94.50 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 700B | Free for commercial use
211 | Qwen3-4B-Thinking-2507 | MMLU Pro 0.00 | GPQA Diamond 65.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 55.20 | Parameters 40B | Free for commercial use
212 | GLM-4.7-Flash | MMLU Pro 0.00 | GPQA Diamond 66.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 310B | Free for commercial use
213 | Gemini 2.5 Flash-Lite | MMLU Pro 0.00 | GPQA Diamond 66.70 | SWE-bench Verified 27.60 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 34.30 | Closed source
214 | Claude Sonnet 4 | MMLU Pro 0.00 | GPQA Diamond 68.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 43.40 | LiveCodeBench 48.50 | Closed source
215 | Claude Sonnet 3.7 | MMLU Pro 0.00 | GPQA Diamond 68.00 | SWE-bench Verified 0.00 | MATH-500 82.20 | AIME 2024 23.30 | LiveCodeBench 0.00 | Closed source
216 | Magistral-Small-2506 | MMLU Pro 0.00 | GPQA Diamond 68.18 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 70.68 | LiveCodeBench 55.84 | Parameters 240B | Free for commercial use
217 | Qwen3-32B | MMLU Pro 0.00 | GPQA Diamond 68.40 | SWE-bench Verified 0.00 | MATH-500 97.20 | AIME 2024 81.40 | LiveCodeBench 0.00 | Parameters 320B | Free for commercial use
218 | OpenAI o3-mini | MMLU Pro 0.00 | GPQA Diamond 70.60 | SWE-bench Verified 40.80 | MATH-500 95.80 | AIME 2024 60.00 | LiveCodeBench 0.00 | Closed source
219 | Magistral-Medium-2506 | MMLU Pro 0.00 | GPQA Diamond 70.83 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 73.59 | LiveCodeBench 59.36 | Closed source
220 | Qwen3-235B-A22B | MMLU Pro 0.00 | GPQA Diamond 71.10 | SWE-bench Verified 0.00 | MATH-500 98.00 | AIME 2024 85.70 | LiveCodeBench 70.70 | Parameters 2350B | Free for commercial use
221 | Step3 | MMLU Pro 0.00 | GPQA Diamond 73.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 67.10 | Parameters 3210B | Free for commercial use
222 | Qwen3-4B-2507 | MMLU Pro 0.00 | GPQA Diamond 62.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 35.10 | Parameters 40B | Free for commercial use
223 | GLM-4.7-Flash | MMLU Pro 0.00 | GPQA Diamond 75.20 | SWE-bench Verified 59.20 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Parameters 310B | Free for commercial use
224 | ERNIE-4.5-VL-424B-A47B-Base | MMLU Pro 0.00 | GPQA Diamond 76.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 38.80 | Parameters 4240B | Free for commercial use
225 | Claude Sonnet 3.7 | MMLU Pro 0.00 | GPQA Diamond 77.00 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
226 | GPT-5 | MMLU Pro 0.00 | GPQA Diamond 77.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
227 | Gemini 2.5 Flash | MMLU Pro 0.00 | GPQA Diamond 78.30 | SWE-bench Verified 50.00 | MATH-500 0.00 | AIME 2024 88.00 | LiveCodeBench 41.10 | Closed source
228 | OpenAI o3-mini (high) | MMLU Pro 0.00 | GPQA Diamond 79.70 | SWE-bench Verified 49.30 | MATH-500 97.90 | AIME 2024 87.00 | LiveCodeBench 69.50 | Closed source
229 | Grok 3 | MMLU Pro 0.00 | GPQA Diamond 80.40 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 84.20 | LiveCodeBench 70.60 | Closed source
230 | Claude Opus 4.1 | MMLU Pro 0.00 | GPQA Diamond 80.90 | SWE-bench Verified 74.50 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 65.00 | Closed source
231 | DeepSeek V3.2 | MMLU Pro 0.00 | GPQA Diamond 82.40 | SWE-bench Verified 70.20 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 83.30 | Parameters 6710B | Free for commercial use
232 | GPT-5.4 nano | MMLU Pro 0.00 | GPQA Diamond 82.80 | SWE-bench Verified 0.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source
233 | Gemini 2.5 Flash | MMLU Pro 0.00 | GPQA Diamond 82.80 | SWE-bench Verified 48.90 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 55.40 | Closed source
234 | GLM-4.6 | MMLU Pro 0.00 | GPQA Diamond 82.90 | SWE-bench Verified 68.00 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 84.50 | Parameters 3550B | Free for commercial use
235 | Gemini-2.5-Pro-Preview-05-06 | MMLU Pro 0.00 | GPQA Diamond 83.00 | SWE-bench Verified 63.20 | MATH-500 98.80 | AIME 2024 92.00 | LiveCodeBench 77.10 | Closed source
236 | OpenAI o3 | MMLU Pro 0.00 | GPQA Diamond 83.30 | SWE-bench Verified 69.10 | MATH-500 0.00 | AIME 2024 0.00 | LiveCodeBench 0.00 | Closed source