
DataLearner AI

A knowledge platform focused on large-model evaluation, data resources, and hands-on teaching, continuously updating a practical map of AI capabilities.


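LLM Benchmarks and Performance Comparison

Quickly see how large models perform on evaluation benchmarks, including standard datasets such as MMLU Pro, HLE, and SWE-Bench, to help developers and users understand how different models compare in general knowledge, coding ability, and reasoning ability. Users can select their own set of models and benchmarks to compare and quickly see each model's strengths and weaknesses in practical use.

For a detailed introduction to each benchmark, see: LLM Benchmark List and Introductions

Data updated: 2025/11/08 22:10:24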


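| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Parameters (×100M) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed source |
| 2 | Gemini 3.0 Pro (Preview 11-2025) (thinking) | 90.00 | 91.90 | 76.20 | 0.00 | 0.00 | 92.00 | — | Closed source |
| 3 | Claude Opus 4.5 (thinking) | 90.00 | 87.00 | 80.90 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 4 | Claude Opus 4.1 (thinking) | 88.00 | 81.00 | 74.50 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 5 | M2.1 (thinking) | 88.00 | 81.00 | 74.00 | 0.00 | 0.00 | 0.00 | 2300B | Free for commercial use |
| 6 | Claude Sonnet 4.5 (thinking) | 88.00 | 83.40 | 0.00 | 0.00 | 0.00 | 71.00 | — | Closed source |
| 7 | Hunyuan-T1 | 87.20 | 69.30 | 0.00 | 96.20 | 78.20 | 64.90 | — | Closed source |
| 8 | Grok 4 (thinking) | 87.00 | 87.00 | 58.60 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 9 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | — | Closed source |
| 10 | Gemini 2.5-Pro | 86.00 | 0.00 | 0.00 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 11 | Qwen3-Max-Thinking (thinking) | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 10000B | Closed source |
| 12 | OpenAI o3 | 85.60 | 0.00 | 0.00 | 98.10 | 91.60 | 75.80 | — | Closed source |
| 13 | DeepSeek V3.2-Exp (thinking) | 85.00 | 79.90 | 0.00 | 0.00 | 0.00 | 74.10 | 6710B | Free for commercial use |
| 14 | Grok 4.1 Fast (thinking) | 85.00 | 85.00 | 0.00 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 15 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 74.90 | 6710B | Free for commercial use |
| 16 | DeepSeek-V3.1 Terminus (thinking) | 85.00 | 79.00 | 0.00 | 0.00 | 0.00 | 80.00 | 6710B | Free for commercial use |
| 17 | DeepSeek-V3.1 (thinking) | 85.00 | 80.10 | 0.00 | 0.00 | 93.10 | 74.80 | 6710B | Free for commercial use |
| 18 | DeepSeek-R1-0528 (thinking) | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710B | Free for commercial use |
| 19 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed source |
| 20 | GLM-4.5 (thinking) | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550B | Free for commercial use |
| 21 | Kimi K2 Thinking (thinking) | 84.60 | 84.50 | 0.00 | 0.00 | 0.00 | 83.10 | 10400B | Free for commercial use |
| 22 | Qwen3-235B-A22B-Thinking-2507 (thinking) | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 2350B | Free for commercial use |
| 23 | Qwen3-235B-A22B-Thinking (thinking) | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 305B | Free for commercial use |
| 24 | GLM-4.7 (thinking) | 84.30 | 85.70 | 0.00 | 0.00 | 0.00 | 84.90 | 3580B | Free for commercial use |
| 25 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710B | Free for commercial use |
| 26 | Claude Sonnet 4 (thinking) | 84.00 | 75.40 | 0.00 | 0.00 | 0.00 | 66.00 | — | Closed source |
| 27 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | 0.00 | 0.00 | 57.50 | — | Closed source |
| 28 | DeepSeek V3.2-Exp | 84.00 | 74.00 | 0.00 | 0.00 | 0.00 | 55.00 | 6710B | Free for commercial use |
| 29 | DeepSeek-V3.1 | 83.70 | 74.90 | 66.00 | 0.00 | 66.30 | 56.40 | 6710B | Free for commercial use |
| 30 | Intern-S1 | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 | 2410B | Free for commercial use |
| 31 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 | 2350B | Free for commercial use |
| 32 | GLM-4.6 (thinking) | 83.00 | 81.00 | 0.00 | 0.00 | 0.00 | 82.80 | 3550B | Free for commercial use |
| 33 | Pangu Pro MoE | 82.60 | 73.70 | 0.00 | 96.80 | 79.20 | 59.60 | 719B | Free for commercial use |
| 34 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 | 20000B | Free for commercial use |
| 35 | MiniMax M2 (thinking) | 82.00 | 78.00 | 0.00 | 0.00 | 0.00 | 83.00 | 2300B | Free for commercial use |
| 36 | GLM-4.5-Air (thinking) | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060B | Free for commercial use |
| 37 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 6710B | Free for commercial use |
| 38 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560B | Free for commercial use |
| 39 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 10000B | Free for commercial use |
| 40 | OpenAI o4 - mini (thinking) | 80.60 | 81.40 | 68.10 | 0.00 | 93.40 | 0.00 | — | Closed source |
| 41 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560B | Free for commercial use |
| 42 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | — | Closed source |
| 43 | Llama 4 Maverick Instruct | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 | 4000B | Free for commercial use |
| 44 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | — | Closed source |
| 45 | Haiku 4.5 | 80.00 | 60.50 | 60.60 | 0.00 | 0.00 | 51.00 | — | Closed source |
| 46 | GPT-4o(2025-03-27) | 79.80 | 66.90 | 0.00 | 0.00 | 0.00 | 35.80 | — | Closed source |
| 47 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 | — | Closed source |
| 48 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | — | Closed source |
| 49 | Pangu Embedded | 79.00 | 0.00 | 0.00 | 92.40 | 81.90 | 67.10 | 70B | Free for commercial use |
| 50 | GPT OSS 120B (thinking) | 79.00 | 80.10 | 60.10 | 0.00 | 0.00 | 0.00 | 117B | Free for commercial use |
| 51 | Kimi K2.5 (thinking) | 78.50 | 87.60 | 76.80 | 0.00 | 0.00 | 85.00 | 10000B | Free for commercial use |
| 52 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 | 3000B | Free for commercial use |
| 53 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | 0.00 | 0.00 | 0.00 | 43.20 | 305B | Free for commercial use |
| 54 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 | — | Closed source |
| 55 | GLM-4.6 | 78.00 | 63.00 | 68.00 | 0.00 | 0.00 | 56.00 | 3550B | Free for commercial use |
| 56 | GPT-5-mini (thinking) | 78.00 | 69.00 | 0.00 | 0.00 | 0.00 | 55.00 | — | Closed source |
| 57 | GPT-4o | 77.90 | 70.10 | 31.00 | 75.90 | 9.30 | 35.10 | — | Closed source |
| 58 | GPT-4o(2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 59 | Claude 3.5 Sonnet | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 60 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 | — | Closed source |
| 61 | Gemini 1.5 Pro | 76.10 | 53.50 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 62 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 63 | QwQ-32B | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 | 325B | Free for commercial use |
| 64 | Haiku 4.5 (thinking) | 76.00 | 73.30 | 0.00 | 0.00 | 0.00 | 62.00 | — | Closed source |
| 65 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 | 6810B | Free for commercial use |
| 66 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2690B | Free for commercial use |
| 67 | Llama 4 Scout Instruct | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 | 1090B | Free for commercial use |
| 68 | GPT OSS 20B (thinking) | 74.00 | 71.50 | 34.00 | 0.00 | 0.00 | 0.00 | 210B | Free for commercial use |
| 69 | Llama3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 | 4050B | Free for commercial use |
| 70 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 96.20 | 85.70 | 70.70 | 2350B | Free for commercial use |
| 71 | Qwen3-8B | 72.50 | 39.30 | 0.00 | 87.40 | 79.40 | 61.80 | 80B | Free for commercial use |
| 72 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 | 90B | Free for commercial use |
| 73 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 | — | Closed source |
| 74 | QwQ-32B-Preview | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 | 320B | Free for commercial use |
| 75 | Phi 4 - 14B | 70.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 140B | Non-commercial |
| 76 | Qwen2.5-32B | 69.23 | 0.00 | 0.00 | 0.00 | 0.00 | 51.20 | 320B | Free for commercial use |
| 77 | Qwen3-30B-A3B | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 | 305B | Free for commercial use |
| 78 | Mistral-Small-3.2 | 69.06 | 46.13 | 0.00 | 0.00 | 0.00 | 0.00 | 240B | Free for commercial use |
| 79 | Llama3.3-70B-Instruct | 68.90 | 50.50 | 0.00 | 0.00 | 0.00 | 33.30 | 700B | Free for commercial use |
| 80 | Claude3-Opus | 68.45 | 50.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 81 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | 0.00 | 0.00 | 25.30 | 29.70 | 270B | Free for commercial use |
| 82 | Hunyuan-A13B-Instruct | 67.23 | 71.20 | 0.00 | 0.00 | 87.30 | 63.90 | 800B | Free for commercial use |
| 83 | Mistral-Small-3.1-24B-Instruct-2503 | 66.76 | 45.96 | 0.00 | 0.00 | 0.00 | 0.00 | 240B | Free for commercial use |
| 84 | Llama3.1-70B-Instruct | 66.40 | 48.00 | 0.00 | 0.00 | 0.00 | 33.30 | 700B | Free for commercial use |
| 85 | Qwen3-Next | 66.05 | 0.00 | 0.00 | 0.00 | 0.00 | 56.60 | 800B | Free for commercial use |
| 86 | Claude 3.5 Haiku | 65.00 | 41.60 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 87 | Qwen2.5-14B | 63.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 140B | Free for commercial use |
| 88 | Llama 4 Maverick | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4000B | Free for commercial use |
| 89 | GPT-4o mini | 61.70 | 41.10 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 90 | Llama3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4050B | Free for commercial use |
| 91 | Gemma 3 - 12B (IT) | 60.60 | 40.90 | 0.00 | 0.00 | 0.00 | 24.60 | 120B | Free for commercial use |
| 92 | Llama 4 Scout | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1090B | Free for commercial use |
| 93 | Qwen2.5-72B | 58.10 | 45.90 | 0.00 | 0.00 | 0.00 | 0.00 | 727B | Free for commercial use |
| 94 | Claude3-Sonnet | 56.80 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 95 | Gemma2-27B | 56.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 270B | Free for commercial use |
| 96 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1410B | Free for commercial use |
| 97 | Llama3-70B-Instruct | 56.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 700B | Free for commercial use |
| 98 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | 0.00 | 71.80 | 10.00 | 0.00 | 38B | Free for commercial use |
| 99 | Llama3-70B | 52.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 700B | Free for commercial use |
| 100 | Llama3.1-70B | 52.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 700B | Free for commercial use |
| 101 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 102 | C4AI Aya Vision 32B | 47.16 | 33.84 | 0.00 | 0.00 | 0.00 | 0.00 | 320B | Non-commercial |
| 103 | Qwen2.5-7B | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 | 70B | Free for commercial use |
| 104 | Gemma 2 - 9B | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 | 90B | Free for commercial use |
| 105 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 | 80B | Free for commercial use |
| 106 | Moonlight-16B-A3B-Instruct | 42.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 160B | Free for commercial use |
| 107 | Llama3.1-8B | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 | 80B | Free for commercial use |
| 108 | Qwen2.5-3B | 34.60 | 24.30 | 0.00 | 0.00 | 0.00 | 0.00 | 30B | Free for commercial use |
| 109 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 | 70B | Free for commercial use |
| 110 | Llama-3.2-3B | 25.00 | 26.60 | 0.00 | 0.00 | 0.00 | 0.00 | 32B | Free for commercial use |
| 111 | o3-pro (high) | 0.00 | 0.00 | 75.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 112 | GPT-5.1 Codex (high + tool use) | 0.00 | 0.00 | 70.40 | 0.00 | 0.00 | 85.50 | — | Closed source |
| 113 | GPT-5 Codex (high) | 0.00 | 0.00 | 74.50 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 114 | GLM-4.7 (thinking + tool use) | 0.00 | 0.00 | 73.80 | 0.00 | 0.00 | 0.00 | 3580B | Free for commercial use |
| 115 | Grok 4 Heavy (parallel_thinking + tool use) | 0.00 | 0.00 | 73.50 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 116 | Haiku 4.5 (thinking + tool use) | 0.00 | 0.00 | 73.30 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 117 | DeepSeek V3.2 (thinking + tool use) | 0.00 | 0.00 | 73.10 | 0.00 | 0.00 | 0.00 | 6710B | Free for commercial use |
| 118 | Claude Sonnet 4 (thinking + tool use) | 0.00 | 0.00 | 72.70 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 119 | Grok 4 Code | 0.00 | 0.00 | 72.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 120 | Kimi K2 Thinking (thinking + tool use) | 0.00 | 0.00 | 71.30 | 0.00 | 0.00 | 0.00 | 10400B | Free for commercial use |
| 121 | Grok Code Fast 1 (thinking) | 0.00 | 0.00 | 70.80 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 122 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 | 70B | Free for commercial use |
| 123 | GPT-5.1-Codex-Max (high + tool use) | 0.00 | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 124 | Claude Sonnet 4.5 (thinking + tool use) | 0.00 | 0.00 | 77.20 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 125 | Claude Opus 4.1 (parallel_thinking + tool use) | 0.00 | 0.00 | 79.40 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 126 | Claude Sonnet 4 (parallel_thinking + tool use) | 0.00 | 0.00 | 80.20 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 127 | Claude Sonnet 4.5 (parallel_thinking + tool use) | 0.00 | 0.00 | 82.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 128 | GPT-5-mini | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 129 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 130 | Phi-4-instruct (reasoning-trained) | 0.00 | 49.00 | 0.00 | 90.40 | 50.00 | 0.00 | 38B | Closed source |
| 131 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 | 70B | Free for commercial use |
| 132 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 | — | Closed source |
| 133 | Qwen3-32B | 0.00 | 53.30 | 0.00 | 0.00 | 81.40 | 65.70 | 320B | Free for commercial use |
| 134 | Codestral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 31.50 | 220B | Non-commercial |
| 135 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | — | Closed source |
| 136 | Codestral 25.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 | — | Closed source |
| 137 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | — | Free for commercial use |
| 138 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | — | Closed source |
| 139 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | — | Closed source |
| 140 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | — | Closed source |
| 141 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | — | Closed source |
| 142 | Claude Opus 4.5 (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.00 | — | Closed source |
| 143 | Gemini 2.5 Deep Think (deeper_thinking) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.60 | — | Closed source |
| 144 | GPT OSS 20B (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 96.00 | 0.00 | 210B | Free for commercial use |
| 145 | GPT OSS 120B (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 96.60 | 0.00 | 117B | Free for commercial use |
| 146 | OpenAI o4 - mini (thinking + tool use) | 0.00 | 0.00 | 0.00 | 0.00 | 98.70 | 0.00 | — | Closed source |
| 147 | MiniMax M2 (thinking + tool use) | 0.00 | 0.00 | 69.40 | 0.00 | 0.00 | 0.00 | 2300B | Free for commercial use |
| 148 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | — | Closed source |
| 149 | Qwen3-30B-A3B-2507 (thinking) | 0.00 | 0.00 | 22.00 | 0.00 | 0.00 | 0.00 | 305B | Free for commercial use |
| 150 | Devstral Small 1.0 | 0.00 | 0.00 | 46.80 | 0.00 | 0.00 | 0.00 | 240B | Free for commercial use |
| 151 | Qwen3-Coder-Flash | 0.00 | 0.00 | 51.60 | 0.00 | 0.00 | 0.00 | 305B | Free for commercial use |
| 152 | Devstral Small 1.1 | 0.00 | 0.00 | 53.60 | 0.00 | 0.00 | 0.00 | 240B | Free for commercial use |
| 153 | Gemini 2.5 Flash-Preview-09-2025 (thinking) | 0.00 | 0.00 | 54.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 154 | Devstral Medium | 0.00 | 0.00 | 61.60 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 155 | Qwen3-Coder-480B-A35B | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 | 4800B | Free for commercial use |
| 156 | DeepSeek V3.2-Exp (thinking + tool use) | 0.00 | 0.00 | 67.80 | 0.00 | 0.00 | 0.00 | 6710B | Free for commercial use |
| 157 | Kimi K2 0905 (thinking + tool use) | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 10000B | Free for commercial use |
| 158 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 10000B | Free for commercial use |
| 159 | Gemini 2.5-Pro (thinking) | 0.00 | 86.40 | 67.20 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 160 | Gemini 2.5 Flash (thinking) | 0.00 | 82.80 | 48.90 | 0.00 | 0.00 | 55.40 | — | Closed source |
| 161 | GLM-4.6 (thinking + tool use) | 0.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 | 3550B | Free for commercial use |
| 162 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 163 | OpenAI o3 (thinking) | 0.00 | 83.30 | 69.10 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 164 | Claude Sonnet 4 (deeper_thinking + tool use) | 0.00 | 83.80 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 165 | o3-pro | 0.00 | 84.00 | 0.00 | 0.00 | 93.00 | 0.00 | — | Closed source |
| 166 | Gemini 2.5 Pro Experimental 03-25 | 0.00 | 84.00 | 63.80 | 0.00 | 92.00 | 70.40 | — | Closed source |
| 167 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | — | Closed source |
| 168 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | — | Closed source |
| 169 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | — | Closed source |
| 170 | Grok 4 Fast (thinking) | 0.00 | 85.70 | 0.00 | 0.00 | 0.00 | 80.00 | — | Closed source |
| 171 | GPT-5 (high) | 0.00 | 85.70 | 72.80 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 172 | DeepSeek V3.2 (thinking) | 0.00 | 82.40 | 0.00 | 0.00 | 0.00 | 83.30 | 6710B | Free for commercial use |
| 173 | GPT-5 (thinking + tool use) | 0.00 | 87.30 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 174 | GPT-5.1 (high) | 0.00 | 88.10 | 76.30 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 175 | GPT-5.1 (thinking) | 0.00 | 88.10 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 176 | GPT-5-Pro (thinking) | 0.00 | 88.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 177 | Grok 4 Heavy (parallel_thinking) | 0.00 | 88.90 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 178 | GPT-5-Pro (thinking + tool use) | 0.00 | 89.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 179 | Gemini 3.0 Flash (thinking) | 0.00 | 90.40 | 68.70 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 180 | GPT-5.2 (thinking) | 0.00 | 92.40 | 80.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 181 | GPT-5.2 Pro (thinking) | 0.00 | 93.20 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 182 | Gemini 3.0 Pro (Preview 11-2025) (parallel_thinking) | 0.00 | 93.80 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 183 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 184 | OpenAI o3-mini (thinking) | 0.00 | 70.60 | 40.80 | 95.80 | 60.00 | 0.00 | — | Closed source |
| 185 | Qwen3-8B (thinking) | 0.00 | 62.00 | 0.00 | 97.40 | 76.00 | 57.50 | 80B | Free for commercial use |
| 186 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 | — | Closed source |
| 187 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 | — | Closed source |
| 188 | DeepSeek-R1-Distill-Llama-70B | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 | 700B | Free for commercial use |
| 189 | Qwen3-4B-Thinking-2507 (thinking) | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 | 40B | Free for commercial use |
| 190 | GLM-4.7-Flash | 0.00 | 66.00 | 0.00 | 0.00 | 0.00 | 0.00 | 310B | Free for commercial use |
| 191 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | — | Closed source |
| 192 | Claude Sonnet 4 | 0.00 | 68.00 | 0.00 | 0.00 | 43.40 | 48.50 | — | Closed source |
| 193 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 | — | Closed source |
| 194 | Magistral-Small-2506 | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 | 240B | Free for commercial use |
| 195 | Qwen3-32B (thinking) | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 0.00 | 320B | Free for commercial use |
| 196 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 | 40B | Free for commercial use |
| 197 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | — | Closed source |
| 198 | Qwen3-235B-A22B (thinking) | 0.00 | 71.10 | 0.00 | 98.00 | 85.70 | 70.70 | 2350B | Free for commercial use |
| 199 | Step3 | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 | 3210B | Free for commercial use |
| 200 | Claude Sonnet 4.5 | 0.00 | 73.70 | 64.80 | 0.00 | 0.00 | 59.00 | — | Closed source |
| 201 | GLM-4.7-Flash (thinking) | 0.00 | 75.20 | 59.20 | 0.00 | 0.00 | 0.00 | 310B | Free for commercial use |
| 202 | ERNIE-4.5-VL-424B-A47B-Base (thinking) | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 | 4240B | Free for commercial use |
| 203 | GPT-5 | 0.00 | 77.80 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 204 | Gemini 2.5 Flash | 0.00 | 78.30 | 50.00 | 0.00 | 88.00 | 41.10 | — | Closed source |
| 205 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed source |
| 206 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 | — | Closed source |
| 207 | Claude Opus 4.1 (thinking + tool use) | 0.00 | 80.90 | 74.50 | 0.00 | 0.00 | 65.00 | — | Closed source |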
1
OpenAI o1
•闭源
MMLU Pro91.04
GPQA Diamond77.30
SWE-bench Verified48.90
MATH-50096.40
AIME 202479.20
LiveCodeBench71.00
2
Gemini 3.0 Pro (Preview 11-2025)thinking
0•闭源
MMLU Pro90.00
GPQA Diamond91.90
SWE-bench Verified76.20
MATH-5000.00
AIME 20240.00
LiveCodeBench92.00
3
Claude Opus 4.5thinking
0•闭源
MMLU Pro90.00
GPQA Diamond87.00
SWE-bench Verified80.90
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
4
Claude Opus 4.1thinking
0•闭源
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
5
M2.1thinking
2300B•免费商用
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
6
Claude Sonnet 4.5thinking
0•闭源
MMLU Pro88.00
GPQA Diamond83.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench71.00
7
Hunyuan-T1
0•闭源
MMLU Pro87.20
GPQA Diamond69.30
SWE-bench Verified0.00
MATH-50096.20
AIME 202478.20
LiveCodeBench64.90
8
Grok 4thinking
0•闭源
MMLU Pro87.00
GPQA Diamond87.00
SWE-bench Verified58.60
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
9
GPT-4.5
•闭源
MMLU Pro86.10
GPQA Diamond71.40
SWE-bench Verified38.00
MATH-50090.70
AIME 202436.70
LiveCodeBench46.40
10
Gemini 2.5-Pro
0•闭源
MMLU Pro86.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.80
AIME 202492.00
LiveCodeBench77.10
11
Qwen3-Max-Thinkingthinking
10000B•闭源
MMLU Pro85.70
GPQA Diamond87.40
SWE-bench Verified75.30
MATH-5000.00
AIME 20240.00
LiveCodeBench85.90
12
OpenAI o3
0•闭源
MMLU Pro85.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.10
AIME 202491.60
LiveCodeBench75.80
13
DeepSeek V3.2-Expthinking
6710B•免费商用
MMLU Pro85.00
GPQA Diamond79.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
14
Grok 4.1 Fastthinking
0•闭源
MMLU Pro85.00
GPQA Diamond85.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
15
DeepSeek-V3.1 Terminus
6710B•免费商用
MMLU Pro85.00
GPQA Diamond80.70
SWE-bench Verified68.40
MATH-5000.00
AIME 20240.00
LiveCodeBench74.90
16
DeepSeek-V3.1 Terminusthinking
6710B•免费商用
MMLU Pro85.00
GPQA Diamond79.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench80.00
17
DeepSeek-V3.1thinking
6710B•免费商用
MMLU Pro85.00
GPQA Diamond80.10
SWE-bench Verified0.00
MATH-5000.00
AIME 202493.10
LiveCodeBench74.80
18
DeepSeek-R1-0528thinking
6710B•免费商用
MMLU Pro85.00
GPQA Diamond81.00
SWE-bench Verified57.60
MATH-50098.00
AIME 202491.40
LiveCodeBench73.30
19
Claude Opus 4
•闭源
MMLU Pro85.00
GPQA Diamond79.60
SWE-bench Verified72.50
MATH-50098.20
AIME 202476.00
LiveCodeBench56.60
20
GLM-4.5thinking
3550B•免费商用
MMLU Pro84.60
GPQA Diamond79.10
SWE-bench Verified64.20
MATH-50098.20
AIME 202491.00
LiveCodeBench72.90
21
Kimi K2 Thinkingthinking
10400B•免费商用
MMLU Pro84.60
GPQA Diamond84.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.10
22
Qwen3-235B-A22B-Thinking-2507thinking
2350B•免费商用
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
23
Qwen3-235B-A22B-Thinkingthinking
305B•免费商用
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
24
GLM-4.7thinking
3580B•免费商用
MMLU Pro84.30
GPQA Diamond85.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench84.90
25
DeepSeek-R1
6710B•免费商用
MMLU Pro84.00
GPQA Diamond71.50
SWE-bench Verified49.20
MATH-50097.30
AIME 202479.80
LiveCodeBench65.90
26
Claude Sonnet 4thinking
0•闭源
MMLU Pro84.00
GPQA Diamond75.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench66.00
27
Qwen3 Max (Preview)
0•闭源
MMLU Pro84.00
GPQA Diamond76.00
SWE-bench Verified69.60
MATH-5000.00
AIME 20240.00
LiveCodeBench57.50
28
DeepSeek V3.2-Exp
6710B•免费商用
MMLU Pro84.00
GPQA Diamond74.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
29
DeepSeek-V3.1
6710B•免费商用
MMLU Pro83.70
GPQA Diamond74.90
SWE-bench Verified66.00
MATH-5000.00
AIME 202466.30
LiveCodeBench56.40
30
Intern-S1
2410B•免费商用
MMLU Pro83.50
GPQA Diamond77.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
31
Qwen3-235B-A22B-2507
2350B•免费商用
MMLU Pro83.00
GPQA Diamond77.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.80
32
GLM-4.6thinking
3550B•免费商用
MMLU Pro83.00
GPQA Diamond81.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.80
33
Pangu Pro MoE
719B•免费商用
MMLU Pro82.60
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50096.80
AIME 202479.20
LiveCodeBench59.60
34
Llama 4 Behemoth Instruct
20000B•免费商用
MMLU Pro82.20
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50095.00
AIME 20240.00
LiveCodeBench49.40
35
MiniMax M2thinking
2300B•免费商用
MMLU Pro82.00
GPQA Diamond78.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.00
36
GLM-4.5-Airthinking
1060B•免费商用
MMLU Pro81.40
GPQA Diamond75.00
SWE-bench Verified57.60
MATH-50098.10
AIME 202489.40
LiveCodeBench70.70
37
DeepSeek-V3-0324
6710B•免费商用
MMLU Pro81.20
GPQA Diamond68.40
SWE-bench Verified38.80
MATH-50094.00
AIME 202459.40
LiveCodeBench49.20
38
MiniMax-M1-80k
4560B•免费商用
MMLU Pro81.10
GPQA Diamond70.00
SWE-bench Verified56.00
MATH-50096.80
AIME 202486.00
LiveCodeBench65.00
39
Kimi K2
10000B•免费商用
MMLU Pro81.10
GPQA Diamond75.10
SWE-bench Verified51.80
MATH-50097.40
AIME 202469.60
LiveCodeBench53.70
40
OpenAI o4 - minithinking
•闭源
MMLU Pro80.60
GPQA Diamond81.40
SWE-bench Verified68.10
MATH-5000.00
AIME 202493.40
LiveCodeBench0.00
41
MiniMax-M1-40k
4560B•免费商用
MMLU Pro80.60
GPQA Diamond69.20
SWE-bench Verified55.60
MATH-50096.00
AIME 202483.30
LiveCodeBench62.30
42
GPT-4.1
•闭源
MMLU Pro80.50
GPQA Diamond66.30
SWE-bench Verified54.60
MATH-50092.80
AIME 202448.10
LiveCodeBench40.50
43
Llama 4 Maverick Instruct
4000B•免费商用
MMLU Pro80.50
GPQA Diamond69.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.40
44
OpenAI o1-mini
•闭源
MMLU Pro80.30
GPQA Diamond60.00
SWE-bench Verified0.00
MATH-50090.00
AIME 202463.60
LiveCodeBench52.00
45
Haiku 4.5
0•闭源
MMLU Pro80.00
GPQA Diamond60.50
SWE-bench Verified60.60
MATH-5000.00
AIME 20240.00
LiveCodeBench51.00
46
GPT-4o(2025-03-27)
0•闭源
MMLU Pro79.80
GPQA Diamond66.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench35.80
47
Gemini 2.0 Pro Experimental
•闭源
MMLU Pro79.10
GPQA Diamond64.70
SWE-bench Verified0.00
MATH-5000.00
AIME 202436.00
LiveCodeBench0.00
48
Hunyuan-TurboS
•闭源
MMLU Pro79.00
GPQA Diamond57.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.00
49
Pangu Embedded
70B•免费商用
MMLU Pro79.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50092.40
AIME 202481.90
LiveCodeBench67.10
50
GPT OSS 120Bthinking
117B•免费商用
MMLU Pro79.00
GPQA Diamond80.10
SWE-bench Verified60.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
51
Kimi K2.5thinking
10000B•免费商用
MMLU Pro78.50
GPQA Diamond87.60
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench85.00
52
ERNIE-4.5-300B-A47B
3000B•免费商用
MMLU Pro78.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50096.40
AIME 202454.80
LiveCodeBench38.80
53
Qwen3-30B-A3B-2507
305B•免费商用
MMLU Pro78.40
GPQA Diamond70.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.20
54
Claude 3.5 Sonnet New
0•闭源
MMLU Pro78.00
GPQA Diamond65.00
SWE-bench Verified49.00
MATH-50078.00
AIME 202416.00
LiveCodeBench38.70
55
GLM-4.6
3550B•免费商用
MMLU Pro78.00
GPQA Diamond63.00
SWE-bench Verified68.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.00
56
GPT-5-minithinking
0•闭源
MMLU Pro78.00
GPQA Diamond69.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
57
GPT-4o
0•闭源
MMLU Pro77.90
GPQA Diamond70.10
SWE-bench Verified31.00
MATH-50075.90
AIME 20249.30
LiveCodeBench35.10
58
GPT-4o(2024-11-20)
•闭源
MMLU Pro77.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
59
Claude 3.5 Sonnet
•闭源
MMLU Pro77.64
GPQA Diamond59.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
60
Gemini 2.0 Flash Experimental
•闭源
MMLU Pro76.24
GPQA Diamond65.20
SWE-bench Verified21.40
MATH-5000.00
AIME 20240.00
LiveCodeBench29.10
61
Gemini 1.5 Pro
0•闭源
MMLU Pro76.10
GPQA Diamond53.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
62
Qwen2.5-Max
•闭源
MMLU Pro76.10
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
63
QwQ-32B
325B•免费商用
MMLU Pro76.00
GPQA Diamond58.00
SWE-bench Verified0.00
MATH-50091.00
AIME 202479.50
LiveCodeBench0.00
64
Haiku 4.5thinking
0•闭源
MMLU Pro76.00
GPQA Diamond73.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench62.00
65
DeepSeek-V3
6810B•免费商用
MMLU Pro75.90
GPQA Diamond59.10
SWE-bench Verified0.00
MATH-50087.80
AIME 202439.00
LiveCodeBench34.60
66
Grok 2
2690B•免费商用
MMLU Pro75.50
GPQA Diamond56.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
67
Llama 4 Scout Instruct
1090B•免费商用
MMLU Pro74.30
GPQA Diamond57.20
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.80
68
GPT OSS 20Bthinking
210B•免费商用
MMLU Pro74.00
GPQA Diamond71.50
SWE-bench Verified34.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
69
Llama3.1-405B Instruct
4050B•免费商用
MMLU Pro73.40
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench30.20
70
Qwen3-235B-A22B
2350B•免费商用
MMLU Pro72.90
GPQA Diamond71.10
SWE-bench Verified34.40
MATH-50096.20
AIME 202485.70
LiveCodeBench70.70
71
Qwen3-8B
80B•免费商用
MMLU Pro72.50
GPQA Diamond39.30
SWE-bench Verified0.00
MATH-50087.40
AIME 202479.40
LiveCodeBench61.80
72
GLM-4-9B-Chat
90B•免费商用
MMLU Pro72.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202476.40
LiveCodeBench51.80
73
Gemini 2.0 Flash-Lite
•闭源
MMLU Pro71.60
GPQA Diamond51.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench28.90
74
QwQ-32B-Preview
320B•免费商用
MMLU Pro70.97
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50090.60
AIME 202450.00
LiveCodeBench0.00
75
Phi 4 - 14B
140B•不可商用
MMLU Pro70.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
76
Qwen2.5-32B
320B•免费商用
MMLU Pro69.23
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.20
77
Qwen3-30B-A3B
305B•免费商用
MMLU Pro69.10
GPQA Diamond54.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench29.00
78
Mistral-Small-3.2
240B•免费商用
MMLU Pro69.06
GPQA Diamond46.13
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
79
Llama3.3-70B-Instruct
700B•免费商用
MMLU Pro68.90
GPQA Diamond50.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
80
Claude3-Opus
0•闭源
MMLU Pro68.45
GPQA Diamond50.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
81
Gemma 3 - 27B (IT)
270B•免费商用
MMLU Pro67.50
GPQA Diamond42.40
SWE-bench Verified0.00
MATH-5000.00
AIME 202425.30
LiveCodeBench29.70
82
Hunyuan-A13B-Instruct
800B•免费商用
MMLU Pro67.23
GPQA Diamond71.20
SWE-bench Verified0.00
MATH-5000.00
AIME 202487.30
LiveCodeBench63.90
83
Mistral-Small-3.1-24B-Instruct-2503
240B•免费商用
MMLU Pro66.76
GPQA Diamond45.96
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
84
Llama3.1-70B-Instruct
700B•免费商用
MMLU Pro66.40
GPQA Diamond48.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
85
Qwen3-Next
800B•免费商用
MMLU Pro66.05
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.60
86
Claude 3.5 Haiku
0•闭源
MMLU Pro65.00
GPQA Diamond41.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
87
Qwen2.5-14B
140B•免费商用
MMLU Pro63.69
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
88
Llama 4 Maverick
4000B•免费商用
MMLU Pro62.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
89
GPT-4o mini
0•闭源
MMLU Pro61.70
GPQA Diamond41.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
90
Llama3.1-405B
4050B•免费商用
MMLU Pro61.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
91
Gemma 3 - 12B (IT)
120B•免费商用
MMLU Pro60.60
GPQA Diamond40.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench24.60
92
Llama 4 Scout
1090B•免费商用
MMLU Pro58.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
93
Qwen2.5-72B
727B•免费商用
MMLU Pro58.10
GPQA Diamond45.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
94
Claude3-Sonnet
0•闭源
MMLU Pro56.80
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
95
Gemma2-27B
270B•免费商用
MMLU Pro56.54
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
96
Mixtral-8x22B-Instruct-v0.1
1410B•免费商用
MMLU Pro56.33
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
97
Llama3-70B-Instruct
700B•免费商用
MMLU Pro56.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
98
Phi-4-mini-instruct (3.8B)
38B•免费商用
MMLU Pro52.80
GPQA Diamond36.00
SWE-bench Verified0.00
MATH-50071.80
AIME 202410.00
LiveCodeBench0.00
99
Llama3-70B
700B•免费商用
MMLU Pro52.78
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
100
Llama3.1-70B
700B•免费商用
MMLU Pro52.47
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
101
Grok-1.5
•闭源
MMLU Pro51.00
GPQA Diamond35.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
102
C4AI Aya Vision 32B
320B•不可商用
MMLU Pro47.16
GPQA Diamond33.84
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
103
Qwen2.5-7B
70B•免费商用
MMLU Pro45.00
GPQA Diamond36.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
104
Gemma 2 - 9B
90B•免费商用
MMLU Pro44.70
GPQA Diamond32.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
105
Llama3.1-8B-Instruct
80B•免费商用
MMLU Pro44.00
GPQA Diamond26.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
106
Moonlight-16B-A3B-Instruct
160B•免费商用
MMLU Pro42.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
107
Llama3.1-8B
80B•免费商用
MMLU Pro35.40
GPQA Diamond25.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
108
Qwen2.5-3B
30B•免费商用
MMLU Pro34.60
GPQA Diamond24.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
109
Mistral-7B-Instruct-v0.3
70B•免费商用
MMLU Pro30.90
GPQA Diamond24.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
110
Llama-3.2-3B
32B•免费商用
MMLU Pro25.00
GPQA Diamond26.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
111
o3-prohigh
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified75.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
112
GPT-5.1 Codexhigh + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified70.40
MATH-5000.00
AIME 20240.00
LiveCodeBench85.50
113
GPT-5 Codexhigh
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
114
GLM-4.7thinking + 使用工具
3580B•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
115
Grok 4 Heavyparallel_thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
116
Haiku 4.5thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
117
DeepSeek V3.2thinking + 使用工具
6710B•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
118
Claude Sonnet 4thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.70
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
119
Grok 4 Code
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
120
Kimi K2 Thinkingthinking + 使用工具
10400B•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified71.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
121
Grok Code Fast 1thinking
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified70.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
122
Hunyuan-7B
70B•免费商用
MMLU Pro0.00
GPQA Diamond60.10
SWE-bench Verified0.00
MATH-50093.70
AIME 202481.10
LiveCodeBench57.00
123
GPT-5.1-Codex-Maxhigh + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
124
Claude Sonnet 4.5thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified77.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
125
Claude Opus 4.1parallel_thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified79.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
126
Claude Sonnet 4parallel_thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
127
Claude Sonnet 4.5parallel_thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified82.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
128
GPT-5-mini
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
129
Grok 3.5
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
130
Phi-4-instruct (reasoning-trained)
38B•闭源
MMLU Pro0.00
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-50090.40
AIME 202450.00
LiveCodeBench0.00
131
DeepSeek-R1-Distill-Qwen-7B
70B•免费商用
MMLU Pro0.00
GPQA Diamond49.50
SWE-bench Verified0.00
MATH-50091.40
AIME 202453.30
LiveCodeBench0.00
132
GPT-4.1 nano
•闭源
MMLU Pro0.00
GPQA Diamond50.30
SWE-bench Verified0.00
MATH-5000.00
AIME 202429.40
LiveCodeBench0.00
133
Qwen3-32B
320B•免费商用
MMLU Pro0.00
GPQA Diamond53.30
SWE-bench Verified0.00
MATH-5000.00
AIME 202481.40
LiveCodeBench65.70
134
Codestral
220B•不可商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench31.50
135
Kimi k1.5 (Short-CoT)
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50094.60
AIME 20240.00
LiveCodeBench0.00
136
Codestral 25.01
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench37.90
137
QwQ-Max-Preview
•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench65.60
138
Kimi-k1.6-IOI
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench65.90
139
OpenAI o3-mini (medium)
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench67.40
140
Kimi-k1.6-IOI-high
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench73.80
141
Gemini 2.5 Pro Deep Think
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench80.40
142
Claude Opus 4.5thinking + 使用工具
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench87.00
143
Gemini 2.5 Deep Thinkdeeper_thinking
0•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench87.60
144
GPT OSS 20Bthinking + 使用工具
210B•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202496.00
LiveCodeBench0.00
145
GPT OSS 120Bthinking + 使用工具
117B•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202496.60
LiveCodeBench0.00
146
OpenAI o4 - minithinking + 使用工具
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202498.70
LiveCodeBench0.00
147
MiniMax M2thinking + 使用工具
2300B•免费商用
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified69.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
148
Kimi k1.5 (Long-CoT)
•闭源
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50096.20
AIME 20240.00
LiveCodeBench0.00
149
Qwen3-30B-A3B-2507 (thinking)
Parameters (×100M): 305 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 22.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
150
Devstral Small 1.0
Parameters (×100M): 240 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 46.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
151
Qwen3-Coder-Flash
Parameters (×100M): 305 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 51.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
152
Devstral Small 1.1
Parameters (×100M): 240 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 53.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
153
Gemini 2.5 Flash-Preview-09-2025 (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 54.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
154
Devstral Medium
Closed source
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 61.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
155
Qwen3-Coder-480B-A35B
Parameters (×100M): 4800 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 67.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
156
DeepSeek V3.2-Exp (thinking + tool use)
Parameters (×100M): 6710 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 67.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
157
Kimi K2 0905 (thinking + tool use)
Parameters (×100M): 10000 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 69.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
158
Kimi K2 0905
Parameters (×100M): 10000 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 69.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
159
Gemini 2.5-Pro (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 86.40
SWE-bench Verified: 67.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
160
Gemini 2.5 Flash (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 82.80
SWE-bench Verified: 48.90
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 55.40
161
GLM-4.6 (thinking + tool use)
Parameters (×100M): 3550 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 82.90
SWE-bench Verified: 68.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 84.50
162
Gemini-2.5-Pro-Preview-05-06
Closed source
MMLU Pro: 0.00
GPQA Diamond: 83.00
SWE-bench Verified: 63.20
MATH-500: 98.80
AIME 2024: 92.00
LiveCodeBench: 77.10
163
OpenAI o3 (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 83.30
SWE-bench Verified: 69.10
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
164
Claude Sonnet 4 (deeper_thinking + tool use)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 83.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
165
o3-pro
Closed source
MMLU Pro: 0.00
GPQA Diamond: 84.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 93.00
LiveCodeBench: 0.00
166
Gemini 2.5 Pro Experimental 03-25
Closed source
MMLU Pro: 0.00
GPQA Diamond: 84.00
SWE-bench Verified: 63.80
MATH-500: 0.00
AIME 2024: 92.00
LiveCodeBench: 70.40
167
Grok-3 mini - Reasoning
Closed source
MMLU Pro: 0.00
GPQA Diamond: 84.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 96.00
LiveCodeBench: 0.00
168
Grok-3 - Reasoning Beta
Closed source
MMLU Pro: 0.00
GPQA Diamond: 84.60
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 93.30
LiveCodeBench: 79.40
169
Claude Sonnet 3.7-64K Extended Thinking
Closed source
MMLU Pro: 0.00
GPQA Diamond: 84.80
SWE-bench Verified: 0.00
MATH-500: 96.20
AIME 2024: 80.00
LiveCodeBench: 0.00
170
Grok 4 Fast (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 85.70
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 80.00
171
GPT-5 (high)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 85.70
SWE-bench Verified: 72.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
172
DeepSeek V3.2 (thinking)
Parameters (×100M): 6710 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 82.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 83.30
173
GPT-5 (thinking + tool use)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 87.30
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
174
GPT-5.1 (high)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 88.10
SWE-bench Verified: 76.30
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
175
GPT-5.1 (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 88.10
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
176
GPT-5-Pro (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 88.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
177
Grok 4 Heavy (parallel_thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 88.90
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
178
GPT-5-Pro (thinking + tool use)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 89.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
179
Gemini 3.0 Flash (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 90.40
SWE-bench Verified: 68.70
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
180
GPT-5.2 (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 92.40
SWE-bench Verified: 80.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
181
GPT-5.2 Pro (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 93.20
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
182
Gemini 3.0 Pro (Preview 11-2025) (parallel_thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 93.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
183
Amazon Nova Pro
Closed source
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
184
OpenAI o3-mini (thinking)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 70.60
SWE-bench Verified: 40.80
MATH-500: 95.80
AIME 2024: 60.00
LiveCodeBench: 0.00
185
Qwen3-8B (thinking)
Parameters (×100M): 80 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 62.00
SWE-bench Verified: 0.00
MATH-500: 97.40
AIME 2024: 76.00
LiveCodeBench: 57.50
186
GPT-4.1 mini
Closed source
MMLU Pro: 0.00
GPQA Diamond: 65.00
SWE-bench Verified: 23.60
MATH-500: 0.00
AIME 2024: 49.60
LiveCodeBench: 0.00
187
Grok 3 mini
Closed source
MMLU Pro: 0.00
GPQA Diamond: 65.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 40.00
LiveCodeBench: 0.00
188
DeepSeek-R1-Distill-Llama-70B
Parameters (×100M): 700 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 65.20
SWE-bench Verified: 0.00
MATH-500: 94.50
AIME 2024: 0.00
LiveCodeBench: 0.00
189
Qwen3-4B-Thinking-2507 (thinking)
Parameters (×100M): 40 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 65.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 55.20
190
GLM-4.7-Flash
Parameters (×100M): 310 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 66.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
191
Gemini 2.5 Flash-Lite
Closed source
MMLU Pro: 0.00
GPQA Diamond: 66.70
SWE-bench Verified: 27.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 34.30
192
Claude Sonnet 4
Closed source
MMLU Pro: 0.00
GPQA Diamond: 68.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 43.40
LiveCodeBench: 48.50
193
Claude Sonnet 3.7
Closed source
MMLU Pro: 0.00
GPQA Diamond: 68.00
SWE-bench Verified: 70.30
MATH-500: 82.20
AIME 2024: 23.30
LiveCodeBench: 0.00
194
Magistral-Small-2506
Parameters (×100M): 240 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 68.18
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 70.68
LiveCodeBench: 55.84
195
Qwen3-32B (thinking)
Parameters (×100M): 320 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 68.40
SWE-bench Verified: 0.00
MATH-500: 97.20
AIME 2024: 81.40
LiveCodeBench: 0.00
196
Qwen3-4B-2507
Parameters (×100M): 40 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 62.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 35.10
197
Magistral-Medium-2506
Closed source
MMLU Pro: 0.00
GPQA Diamond: 70.83
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 73.59
LiveCodeBench: 59.36
198
Qwen3-235B-A22B (thinking)
Parameters (×100M): 2350 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 71.10
SWE-bench Verified: 0.00
MATH-500: 98.00
AIME 2024: 85.70
LiveCodeBench: 70.70
199
Step3
Parameters (×100M): 3210 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 73.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 67.10
200
Claude Sonnet 4.5
Closed source
MMLU Pro: 0.00
GPQA Diamond: 73.70
SWE-bench Verified: 64.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 59.00
201
GLM-4.7-Flash (thinking)
Parameters (×100M): 310 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 75.20
SWE-bench Verified: 59.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
202
ERNIE-4.5-VL-424B-A47B-Base (thinking)
Parameters (×100M): 4240 • Free for commercial use
MMLU Pro: 0.00
GPQA Diamond: 76.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 38.80
203
GPT-5
Closed source
MMLU Pro: 0.00
GPQA Diamond: 77.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
204
Gemini 2.5 Flash
Closed source
MMLU Pro: 0.00
GPQA Diamond: 78.30
SWE-bench Verified: 50.00
MATH-500: 0.00
AIME 2024: 88.00
LiveCodeBench: 41.10
205
OpenAI o3-mini (high)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 79.70
SWE-bench Verified: 49.30
MATH-500: 97.90
AIME 2024: 87.00
LiveCodeBench: 69.50
206
Grok 3
Closed source
MMLU Pro: 0.00
GPQA Diamond: 80.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 84.20
LiveCodeBench: 70.60
207
Claude Opus 4.1 (thinking + tool use)
Closed source
MMLU Pro: 0.00
GPQA Diamond: 80.90
SWE-bench Verified: 74.50
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 65.00