DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

LLM Benchmark Performance Comparison

Compare model performance across MMLU Pro, HLE, SWE-bench, and more. Select a benchmark to view its ranking.

Detailed benchmark descriptions are available in the LLM Benchmark List & Guide.

Updated on: 2025/11/08 22:10:24

Benchmarks shown: MMLU Pro, GPQA Diamond, SWE-bench Verified, MATH-500, AIME 2024, LiveCodeBench
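
Selecting a benchmark re-ranks the models by that single score. Many models have no published result on a given benchmark (the source data records these as 0.00), so a sensible ranking drops them instead of sorting them to the bottom as zeros. The following is a minimal sketch of that logic, assuming a small hand-copied sample of rows from the table below; `ROWS` and `rank_by` are hypothetical names, not the site's implementation.

```python
from typing import Optional

# Hypothetical sample copied from a few table rows; None marks a benchmark
# with no reported score (recorded as 0.00 in the source data).
ROWS: dict[str, dict[str, Optional[float]]] = {
    "OpenAI o1": {"MMLU Pro": 91.04, "GPQA Diamond": 77.30, "SWE-bench Verified": 48.90},
    "Claude Opus 4.5": {"MMLU Pro": 90.00, "GPQA Diamond": 87.00, "SWE-bench Verified": 80.90},
    "Gemini 2.5-Pro": {"MMLU Pro": 86.00, "GPQA Diamond": None, "SWE-bench Verified": None},
    "GLM-4.5": {"MMLU Pro": 84.60, "GPQA Diamond": 79.10, "SWE-bench Verified": 64.20},
}


def rank_by(
    benchmark: str, rows: dict[str, dict[str, Optional[float]]] = ROWS
) -> list[tuple[int, str, float]]:
    """Rank models by one benchmark, skipping models without a reported score."""
    scored = [
        (name, scores[benchmark])
        for name, scores in rows.items()
        if scores.get(benchmark) is not None
    ]
    # Highest score first; list.sort stays stable even with reverse=True,
    # so tied models keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(scored, start=1)]


for rank, name, score in rank_by("GPQA Diamond"):
    print(rank, name, score)
# 1 Claude Opus 4.5 87.0
# 2 GLM-4.5 79.1
# 3 OpenAI o1 77.3
```

Gemini 2.5-Pro is left out of the GPQA Diamond ranking because it has no score there, rather than being ranked last.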

Filters
  • Parameter size: All, 3B and below, 7B, 13B, 34B, 65B, 100B and above
  • Model type: All, Reasoning Models, Foundation Models, Instruction/Chat Models, Coding Models
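
The parameter-size filter needs each model's size in billions. In the raw source data the Params field appears to be recorded in units of 100 million (the Chinese unit 亿) while still carrying a "B" suffix: DeepSeek-R1, a 671B-parameter model, is listed as 6710B, and Qwen3-8B as 80B. The sketch below assumes that unit, converts to billions, and applies the two open-ended buckets, 3B and below and 100B and above, as inclusive bounds. How the site assigns models to the intermediate buckets (7B, 13B, 34B, 65B) is not stated, so those are left out, and closed models with no listed size are skipped. `RAW_PARAMS`, `to_billions`, and `filter_by_size` are hypothetical names.

```python
from typing import Optional

# Hypothetical sample of raw Params strings as they appear in the source data.
# Assumption: values are in units of 100 million (1e8), so "6710B" means 671B.
RAW_PARAMS: dict[str, Optional[str]] = {
    "DeepSeek-R1": "6710B",
    "GLM-4.5-Air": "1060B",
    "Qwen3-8B": "80B",
    "Qwen2.5-3B": "30B",
    "OpenAI o1": None,  # closed models list no parameter count
}


def to_billions(raw: Optional[str]) -> Optional[float]:
    """Convert a raw value such as '6710B' (assumed units of 1e8) to billions."""
    if raw is None or raw.strip() in ("", "—"):
        return None
    return float(raw.strip().removesuffix("B")) / 10


def filter_by_size(
    raw_params: dict[str, Optional[str]],
    min_b: Optional[float] = None,
    max_b: Optional[float] = None,
) -> list[str]:
    """Return models whose size in billions lies in the inclusive range [min_b, max_b]."""
    selected = []
    for name, raw in raw_params.items():
        size = to_billions(raw)
        if size is None:
            continue  # unknown size: cannot be placed in any bucket
        if min_b is not None and size < min_b:
            continue
        if max_b is not None and size > max_b:
            continue
        selected.append(name)
    return selected


print(filter_by_size(RAW_PARAMS, min_b=100))  # "100B and above" -> ['DeepSeek-R1', 'GLM-4.5-Air']
print(filter_by_size(RAW_PARAMS, max_b=3))    # "3B and below"   -> ['Qwen2.5-3B']
```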

LLM Performance Results

Data source: DataLearnerAI
A dash (—) marks a benchmark score or parameter count that the source does not report.

| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed source |
| 2 | Gemini 3.0 Pro (Preview 11-2025) | 90.00 | 91.90 | 76.20 | — | — | 92.00 | — | Closed source |
| 3 | Claude Opus 4.5 | 90.00 | 87.00 | 80.90 | — | — | — | — | Closed source |
| 4 | Claude Opus 4.1 | 88.00 | 81.00 | 74.50 | — | — | — | — | Closed source |
| 5 | M2.1 | 88.00 | 81.00 | 74.80 | — | — | — | 230 | Free commercial use |
| 6 | Claude Sonnet 4.5 | 88.00 | 83.40 | — | — | — | 71.00 | — | Closed source |
| 7 | Qwen3.5-397B-A17B | 87.80 | 88.40 | 76.40 | — | — | — | 39.7 | Free commercial use |
| 8 | Qwen3.5-397B-A17B | 87.80 | 88.40 | — | — | — | 83.60 | 39.7 | Free commercial use |
| 9 | Hunyuan-T1 | 87.20 | 69.30 | — | 96.20 | 78.20 | 64.90 | — | Closed source |
| 10 | Grok 4 | 87.00 | 87.00 | 58.60 | — | — | 82.00 | — | Closed source |
| 11 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | — | Closed source |
| 12 | Qwen3.5-27B | 86.10 | 85.50 | 72.40 | — | — | — | 27 | Free commercial use |
| 13 | Gemini 2.5-Pro | 86.00 | — | — | 98.80 | 92.00 | 77.10 | — | Closed source |
| 14 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | — | — | 85.90 | 1000 | Closed source |
| 15 | OpenAI o3 | 85.60 | — | — | 98.10 | 91.60 | 75.80 | — | Closed source |
| 16 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 671 | Free commercial use |
| 17 | Grok 4.1 Fast | 85.00 | 85.00 | — | — | — | 82.00 | — | Closed source |
| 18 | DeepSeek V3.2-Exp | 85.00 | 79.90 | — | — | — | 74.10 | 671 | Free commercial use |
| 19 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | — | — | 74.90 | 671 | Free commercial use |
| 20 | DeepSeek-V3.1 Terminus | 85.00 | 79.00 | — | — | — | 80.00 | 671 | Free commercial use |
| 21 | DeepSeek-V3.1 | 85.00 | 80.10 | — | — | 93.10 | 74.80 | 671 | Free commercial use |
| 22 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed source |
| 23 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 355 | Free commercial use |
| 24 | Kimi K2 Thinking | 84.60 | 84.50 | — | — | — | 83.10 | 1040 | Free commercial use |
| 25 | Qwen3-235B-A22B-Thinking | 84.40 | 81.10 | — | — | — | 74.10 | 30.5 | Free commercial use |
| 26 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | — | — | — | 74.10 | 235 | Free commercial use |
| 27 | GLM-4.7 | 84.30 | 85.70 | — | — | — | 84.90 | 358 | Free commercial use |
| 28 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 671 | Free commercial use |
| 29 | Claude Sonnet 4 | 84.00 | 75.40 | — | — | — | 66.00 | — | Closed source |
| 30 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | — | — | 57.50 | — | Closed source |
| 31 | DeepSeek V3.2-Exp | 84.00 | 74.00 | — | — | — | 55.00 | 671 | Free commercial use |
| 32 | DeepSeek-V3.1 | 83.70 | 74.90 | 66.00 | — | 66.30 | 56.40 | 671 | Free commercial use |
| 33 | Intern-S1 | 83.50 | 77.30 | — | — | — | — | 241 | Free commercial use |
| 34 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | — | — | — | 51.80 | 235 | Free commercial use |
| 35 | GLM-4.6 | 83.00 | 81.00 | — | — | — | 82.80 | 355 | Free commercial use |
| 36 | Pangu Pro MoE | 82.60 | 73.70 | — | 96.80 | 79.20 | 59.60 | 71.9 | Free commercial use |
| 37 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | — | 95.00 | — | 49.40 | 2000 | Free commercial use |
| 38 | MiniMax M2 | 82.00 | 78.00 | — | — | — | 83.00 | 230 | Free commercial use |
| 39 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 106 | Free commercial use |
| 40 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 671 | Free commercial use |
| 41 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 456 | Free commercial use |
| 42 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 1000 | Free commercial use |
| 43 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | — | 93.40 | — | — | Closed source |
| 44 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 456 | Free commercial use |
| 45 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | — | Closed source |
| 46 | Llama 4 Maverick Instruct | 80.50 | 69.80 | — | — | — | 43.40 | 400 | Free commercial use |
| 47 | OpenAI o1-mini | 80.30 | 60.00 | — | 90.00 | 63.60 | 52.00 | — | Closed source |
| 48 | Haiku 4.5 | 80.00 | 73.30 | — | — | — | 62.00 | — | Closed source |
| 49 | GPT-4o(2025-03-27) | 79.80 | 66.90 | — | — | — | 35.80 | — | Closed source |
| 50 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | — | — | 36.00 | — | — | Closed source |
| 51 | Hunyuan-TurboS | 79.00 | 57.50 | — | — | — | 32.00 | — | Closed source |
| 52 | Pangu Embedded | 79.00 | — | — | 92.40 | 81.90 | 67.10 | 7 | Free commercial use |
| 53 | GPT OSS 120B | 79.00 | 80.10 | 60.10 | — | — | — | 11.7 | Free commercial use |
| 54 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | — | — | 85.00 | 1000 | Free commercial use |
| 55 | ERNIE-4.5-300B-A47B | 78.40 | — | — | 96.40 | 54.80 | 38.80 | 300 | Free commercial use |
| 56 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | — | — | — | 43.20 | 30.5 | Free commercial use |
| 57 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 | — | Closed source |
| 58 | GLM-4.6 | 78.00 | 63.00 | 68.00 | — | — | 56.00 | 355 | Free commercial use |
| 59 | GPT-5-mini | 78.00 | 69.00 | — | — | — | 55.00 | — | Closed source |
| 60 | GPT-4o | 77.90 | 70.10 | 31.00 | 75.90 | 9.30 | 35.10 | — | Closed source |
| 61 | GPT-4o(2024-11-20) | 77.90 | — | — | — | — | — | — | Closed source |
| 62 | Claude 3.5 Sonnet | 77.64 | 59.40 | — | — | — | — | — | Closed source |
| 63 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | — | — | 29.10 | — | Closed source |
| 64 | Gemini 1.5 Pro | 76.10 | 53.50 | — | — | — | — | — | Closed source |
| 65 | Qwen2.5-Max | 76.10 | — | — | — | — | — | — | Closed source |
| 66 | Haiku 4.5 | 76.00 | 60.50 | — | — | — | 51.00 | — | Closed source |
| 67 | QwQ-32B | 76.00 | 58.00 | — | 91.00 | 79.50 | — | 32.5 | Free commercial use |
| 68 | DeepSeek-V3 | 75.90 | 59.10 | — | 87.80 | 39.00 | 34.60 | 681 | Free commercial use |
| 69 | Grok 2 | 75.50 | 56.00 | — | — | — | — | 269 | Free commercial use |
| 70 | Llama 4 Scout Instruct | 74.30 | 57.20 | — | — | — | 32.80 | 109 | Free commercial use |
| 71 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | — | — | — | 21 | Free commercial use |
| 72 | Llama3.1-405B Instruct | 73.40 | 49.00 | — | — | — | 30.20 | 405 | Free commercial use |
| 73 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 96.20 | 85.70 | 70.70 | 235 | Free commercial use |
| 74 | Qwen3-8B | 72.50 | 39.30 | — | 87.40 | 79.40 | 61.80 | 8 | Free commercial use |
| 75 | GLM-4-9B-Chat | 72.40 | — | — | — | 76.40 | 51.80 | 9 | Free commercial use |
| 76 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | — | — | — | 28.90 | — | Closed source |
| 77 | QwQ-32B-Preview | 70.97 | — | — | 90.60 | 50.00 | — | 32 | Free commercial use |
| 78 | Phi 4 - 14B | 70.40 | — | — | — | — | — | 14 | Non-commercial |
| 79 | Qwen2.5-32B | 69.23 | — | — | — | — | 51.20 | 32 | Free commercial use |
| 80 | Qwen3-30B-A3B | 69.10 | 54.80 | — | — | — | 29.00 | 30.5 | Free commercial use |
| 81 | Mistral-Small-3.2 | 69.06 | 46.13 | — | — | — | — | 24 | Free commercial use |
| 82 | Llama3.3-70B-Instruct | 68.90 | 50.50 | — | — | — | 33.30 | 70 | Free commercial use |
| 83 | Claude3-Opus | 68.45 | 50.40 | — | — | — | — | — | Closed source |
| 84 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | — | — | 25.30 | 29.70 | 27 | Free commercial use |
| 85 | Hunyuan-A13B-Instruct | 67.23 | 71.20 | — | — | 87.30 | 63.90 | 80 | Free commercial use |
| 86 | Mistral-Small-3.1-24B-Instruct-2503 | 66.76 | 45.96 | — | — | — | — | 24 | Free commercial use |
| 87 | Llama3.1-70B-Instruct | 66.40 | 48.00 | — | — | — | 33.30 | 70 | Free commercial use |
| 88 | Qwen3-Next | 66.05 | — | — | — | — | 56.60 | 80 | Free commercial use |
| 89 | Claude 3.5 Haiku | 65.00 | 41.60 | — | — | — | — | — | Closed source |
| 90 | Qwen2.5-14B | 63.69 | — | — | — | — | — | 14 | Free commercial use |
| 91 | Llama 4 Maverick | 62.90 | — | — | — | — | — | 400 | Free commercial use |
| 92 | GPT-4o mini | 61.70 | 41.10 | — | — | — | — | — | Closed source |
| 93 | Llama3.1-405B | 61.60 | — | — | — | — | — | 405 | Free commercial use |
| 94 | Gemma 3 - 12B (IT) | 60.60 | 40.90 | — | — | — | 24.60 | 12 | Free commercial use |
| 95 | Llama 4 Scout | 58.20 | — | — | — | — | — | 109 | Free commercial use |
| 96 | Qwen2.5-72B | 58.10 | 45.90 | — | — | — | — | 72.7 | Free commercial use |
| 97 | Claude3-Sonnet | 56.80 | — | — | — | — | — | — | Closed source |
| 98 | Gemma2-27B | 56.54 | — | — | — | — | — | 27 | Free commercial use |
| 99 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | — | — | — | — | — | 141 | Free commercial use |
| 100 | Llama3-70B-Instruct | 56.20 | — | — | — | — | — | 70 | Free commercial use |
| 101 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | — | 71.80 | 10.00 | — | 3.8 | Free commercial use |
| 102 | Llama3-70B | 52.78 | — | — | — | — | — | 70 | Free commercial use |
| 103 | Llama3.1-70B | 52.47 | — | — | — | — | — | 70 | Free commercial use |
| 104 | Grok-1.5 | 51.00 | 35.90 | — | — | — | — | — | Closed source |
| 105 | C4AI Aya Vision 32B | 47.16 | 33.84 | — | — | — | — | 32 | Non-commercial |
| 106 | Qwen2.5-7B | 45.00 | 36.40 | — | — | — | — | 7 | Free commercial use |
| 107 | Gemma 2 - 9B | 44.70 | 32.80 | — | — | — | — | 9 | Free commercial use |
| 108 | Llama3.1-8B-Instruct | 44.00 | 26.30 | — | — | — | — | 8 | Free commercial use |
| 109 | Moonlight-16B-A3B-Instruct | 42.40 | — | — | — | — | — | 16 | Free commercial use |
| 110 | Llama3.1-8B | 35.40 | 25.80 | — | — | — | — | 8 | Free commercial use |
| 111 | Qwen2.5-3B | 34.60 | 24.30 | — | — | — | — | 3 | Free commercial use |
| 112 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | — | — | — | — | 7 | Free commercial use |
| 113 | Llama-3.2-3B | 25.00 | 26.60 | — | — | — | — | 3.2 | Free commercial use |
| 114 | GPT-5.1-Codex-Max | — | — | 76.80 | — | — | — | — | Closed source |
| 115 | Qwen3.5-397B-A17B | — | — | 76.40 | — | — | — | 39.7 | Free commercial use |
| 116 | GPT-5.1 | — | — | 76.30 | — | — | — | — | Closed source |
| 117 | o3-pro | — | — | 75.00 | — | — | — | — | Closed source |
| 118 | GPT-5 Codex | — | — | 74.50 | — | — | — | — | Closed source |
| 119 | Step 3.5 Flash | — | — | 74.40 | — | — | 86.40 | 196 | Free commercial use |
| 120 | GLM-4.7 | — | — | 73.80 | — | — | — | 358 | Free commercial use |
| 121 | Grok 4 Heavy | — | — | 73.50 | — | — | — | — | Closed source |
| 122 | Haiku 4.5 | — | — | 73.30 | — | — | — | — | Closed source |
| 123 | DeepSeek V3.2 | — | — | 73.10 | — | — | — | 671 | Free commercial use |
| 124 | Claude Sonnet 4 | — | — | 72.70 | — | — | — | — | Closed source |
| 125 | Grok 4 Code | — | — | 72.00 | — | — | — | — | Closed source |
| 126 | Kimi K2 Thinking | — | — | 71.30 | — | — | — | 1040 | Free commercial use |
| 127 | Hunyuan-7B | — | 60.10 | — | 93.70 | 81.10 | 57.00 | 7 | Free commercial use |
| 128 | Grok Code Fast 1 | — | — | 70.80 | — | — | — | — | Closed source |
| 129 | Claude Sonnet 4.5 | — | — | 77.20 | — | — | — | — | Closed source |
| 130 | Claude Opus 4.1 | — | — | 79.40 | — | — | — | — | Closed source |
| 131 | GPT-5.2 | — | — | 80.00 | — | — | — | — | Closed source |
| 132 | MiniMax M2.5 | — | — | 80.20 | — | — | — | 229 | Free commercial use |
| 133 | Claude Sonnet 4 | — | — | 80.20 | — | — | — | — | Closed source |
| 134 | Gemini 3.1 Pro Preview | — | — | 80.60 | — | — | 2887.00 | — | Closed source |
| 135 | Claude Opus 4.6 | — | — | 80.84 | — | — | — | — | Closed source |
| 136 | Claude Sonnet 5 | — | — | 82.00 | — | — | — | — | Closed source |
| 137 | Claude Sonnet 4.5 | — | — | 82.00 | — | — | — | — | Closed source |
| 138 | GPT-5-mini | — | — | — | — | — | — | — | Closed source |
| 139 | Grok 3.5 | — | — | — | — | — | — | — | Closed source |
| 140 | Phi-4-instruct (reasoning-trained) | — | 49.00 | — | 90.40 | 50.00 | — | 3.8 | Closed source |
| 141 | DeepSeek-R1-Distill-Qwen-7B | — | 49.50 | — | 91.40 | 53.30 | — | 7 | Free commercial use |
| 142 | GPT-4.1 nano | — | 50.30 | — | — | 29.40 | — | — | Closed source |
| 143 | Qwen3-32B | — | 53.30 | — | — | 81.40 | 65.70 | 32 | Free commercial use |
| 144 | Qwen3-30B-A3B-2507 | — | — | 22.00 | — | — | — | 30.5 | Free commercial use |
| 145 | Codestral | — | — | — | — | — | 31.50 | 22 | Non-commercial |
| 146 | Codestral 25.01 | — | — | — | — | — | 37.90 | — | Closed source |
| 147 | QwQ-Max-Preview | — | — | — | — | — | 65.60 | — | Free commercial use |
| 148 | Kimi-k1.6-IOI | — | — | — | — | — | 65.90 | — | Closed source |
| 149 | OpenAI o3-mini (medium) | — | — | — | — | — | 67.40 | — | Closed source |
| 150 | Kimi-k1.6-IOI-high | — | — | — | — | — | 73.80 | — | Closed source |
| 151 | Gemini 2.5 Pro Deep Think | — | — | — | — | — | 80.40 | — | Closed source |
| 152 | Qwen3.5-397B-A17B | — | — | — | — | — | 83.60 | 39.7 | Free commercial use |
| 153 | Claude Opus 4.5 | — | — | — | — | — | 87.00 | — | Closed source |
| 154 | Gemini 2.5 Deep Think | — | — | — | — | — | 87.60 | — | Closed source |
| 155 | GPT OSS 20B | — | — | — | — | 96.00 | — | 21 | Free commercial use |
| 156 | GPT OSS 120B | — | — | — | — | 96.60 | — | 11.7 | Free commercial use |
| 157 | OpenAI o4 - mini | — | — | — | — | 98.70 | — | — | Closed source |
| 158 | Kimi k1.5 (Short-CoT) | — | — | — | 94.60 | — | — | — | Closed source |
| 159 | Kimi k1.5 (Long-CoT) | — | — | — | 96.20 | — | — | — | Closed source |
| 160 | Qwen3-Coder-Next | — | — | 70.60 | — | — | — | 8 | Free commercial use |
| 161 | Devstral Small 1.0 | — | — | 46.80 | — | — | — | 24 | Free commercial use |
| 162 | Qwen3-Coder-Flash | — | — | 51.60 | — | — | — | 30.5 | Free commercial use |
| 163 | Devstral Small 1.1 | — | — | 53.60 | — | — | — | 24 | Free commercial use |
| 164 | Gemini 2.5 Flash-Preview-09-2025 | — | — | 54.00 | — | — | — | — | Closed source |
| 165 | Haiku 4.5 | — | — | 60.60 | — | — | — | — | Closed source |
| 166 | Devstral Medium | — | — | 61.60 | — | — | — | — | Closed source |
| 167 | Claude Sonnet 3.7 | — | — | 62.30 | — | — | — | — | Closed source |
| 168 | Qwen3-Coder-480B-A35B | — | — | 67.00 | — | — | — | 480 | Free commercial use |
| 169 | DeepSeek V3.2-Exp | — | — | 67.80 | — | — | — | 671 | Free commercial use |
| 170 | Kimi K2 0905 | — | — | 69.20 | — | — | — | 1000 | Free commercial use |
| 171 | Kimi K2 0905 | — | — | 69.20 | — | — | — | 1000 | Free commercial use |
| 172 | MiniMax M2 | — | — | 69.40 | — | — | — | 230 | Free commercial use |
| 173 | Claude Sonnet 3.7 | — | — | 70.30 | — | — | — | — | Closed source |
| 174 | GPT-5.1 Codex | — | — | 70.40 | — | — | 85.50 | — | Closed source |
| 175 | GPT-5-Pro | — | 88.40 | — | — | — | — | — | Closed source |
| 176 | o3-pro | — | 84.00 | — | — | 93.00 | — | — | Closed source |
| 177 | Gemini 2.5 Pro Experimental 03-25 | — | 84.00 | 63.80 | — | 92.00 | 70.40 | — | Closed source |
| 178 | Grok-3 mini - Reasoning | — | 84.00 | — | — | 96.00 | — | — | Closed source |
| 179 | Grok-3 - Reasoning Beta | — | 84.60 | — | — | 93.30 | 79.40 | — | Closed source |
| 180 | Claude Sonnet 3.7-64K Extended Thinking | — | 84.80 | — | 96.20 | 80.00 | — | — | Closed source |
| 181 | MiniMax M2.5 | — | 85.20 | — | — | — | — | 229 | Free commercial use |
| 182 | Grok 4 Fast | — | 85.70 | — | — | — | 80.00 | — | Closed source |
| 183 | GPT-5 | — | 85.70 | 72.80 | — | — | — | — | Closed source |
| 184 | GLM-5 | — | 86.00 | 77.80 | — | — | — | 744 | Free commercial use |
| 185 | Gemini 2.5-Pro | — | 86.40 | 67.20 | — | — | — | — | Closed source |
| 186 | GPT-5 | — | 87.30 | — | — | — | — | — | Closed source |
| 187 | GPT-5.4 mini | — | 88.00 | — | — | — | — | — | Closed source |
| 188 | GPT-5.1 | — | 88.10 | — | — | — | — | — | Closed source |
| 189 | GPT-5.1 | — | 88.10 | 76.30 | — | — | — | — | Closed source |
| 190 | GPT-5.1 | — | 88.10 | — | — | — | — | — | Closed source |
| 191 | Claude Sonnet 4 | — | 83.80 | — | — | — | — | — | Closed source |
| 192 | Grok 4 Heavy | — | 88.90 | — | — | — | — | — | Closed source |
| 193 | GPT-5-Pro | — | 89.40 | — | — | — | — | — | Closed source |
| 194 | Claude Sonnet 4.6 | — | 89.90 | 79.60 | — | — | — | — | Closed source |
| 195 | Gemini 3.0 Flash | — | 90.40 | 68.70 | — | — | — | — | Closed source |
| 196 | Gemini 3.0 Pro (Preview 11-2025) | — | 91.00 | — | — | — | — | — | Closed source |
| 197 | Claude Opus 4.6 | — | 91.31 | — | 97.60 | — | 76.00 | — | Closed source |
| 198 | GPT-5.2 | — | 92.40 | — | — | — | — | — | Closed source |
| 199 | GPT-5.4 | — | 92.80 | — | — | — | — | — | Closed source |
| 200 | GPT-5.2 Pro | — | 93.20 | — | — | — | — | — | Closed source |
| 201 | GPT-5.2 | — | 93.20 | — | — | — | — | — | Closed source |
| 202 | Gemini 3.0 Pro (Preview 11-2025) | — | 93.80 | — | — | — | — | — | Closed source |
| 203 | Gemini 3.1 Pro Preview | — | 94.30 | — | — | — | — | — | Closed source |
| 204 | GPT-5.4 Pro | — | 94.40 | — | — | — | — | — | Closed source |
| 205 | Amazon Nova Pro | — | — | — | — | — | — | — | Closed source |
| 206 | Claude Sonnet 4.5 | — | 73.70 | — | — | — | 59.00 | — | Closed source |
| 207 | Qwen3-8B | — | 62.00 | — | 97.40 | 76.00 | 57.50 | 8 | Free commercial use |
| 208 | GPT-4.1 mini | — | 65.00 | 23.60 | — | 49.60 | — | — | Closed source |
| 209 | Grok 3 mini | — | 65.00 | — | — | 40.00 | — | — | Closed source |
| 210 | DeepSeek-R1-Distill-Llama-70B | — | 65.20 | — | 94.50 | — | — | 70 | Free commercial use |
| 211 | Qwen3-4B-Thinking-2507 | — | 65.80 | — | — | — | 55.20 | 4 | Free commercial use |
| 212 | GLM-4.7-Flash | — | 66.00 | — | — | — | — | 31 | Free commercial use |
| 213 | Gemini 2.5 Flash-Lite | — | 66.70 | 27.60 | — | — | 34.30 | — | Closed source |
| 214 | Claude Sonnet 4 | — | 68.00 | — | — | 43.40 | 48.50 | — | Closed source |
| 215 | Claude Sonnet 3.7 | — | 68.00 | — | 82.20 | 23.30 | — | — | Closed source |
| 216 | Magistral-Small-2506 | — | 68.18 | — | — | 70.68 | 55.84 | 24 | Free commercial use |
| 217 | Qwen3-32B | — | 68.40 | — | 97.20 | 81.40 | — | 32 | Free commercial use |
| 218 | OpenAI o3-mini | — | 70.60 | 40.80 | 95.80 | 60.00 | — | — | Closed source |
| 219 | Magistral-Medium-2506 | — | 70.83 | — | — | 73.59 | 59.36 | — | Closed source |
| 220 | Qwen3-235B-A22B | — | 71.10 | — | 98.00 | 85.70 | 70.70 | 235 | Free commercial use |
| 221 | Step3 | — | 73.00 | — | — | — | 67.10 | 321 | Free commercial use |
| 222 | Qwen3-4B-2507 | — | 62.00 | — | — | — | 35.10 | 4 | Free commercial use |
| 223 | GLM-4.7-Flash | — | 75.20 | 59.20 | — | — | — | 31 | Free commercial use |
| 224 | ERNIE-4.5-VL-424B-A47B-Base | — | 76.80 | — | — | — | 38.80 | 424 | Free commercial use |
| 225 | Claude Sonnet 3.7 | — | 77.00 | — | — | — | — | — | Closed source |
| 226 | GPT-5 | — | 77.80 | — | — | — | — | — | Closed source |
| 227 | Gemini 2.5 Flash | — | 78.30 | 50.00 | — | 88.00 | 41.10 | — | Closed source |
| 228 | OpenAI o3-mini (high) | — | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed source |
| 229 | Grok 3 | — | 80.40 | — | — | 84.20 | 70.60 | — | Closed source |
| 230 | Claude Opus 4.1 | — | 80.90 | 74.50 | — | — | 65.00 | — | Closed source |
| 231 | DeepSeek V3.2 | — | 82.40 | 70.20 | — | — | 83.30 | 671 | Free commercial use |
| 232 | GPT-5.4 nano | — | 82.80 | — | — | — | — | — | Closed source |
| 233 | Gemini 2.5 Flash | — | 82.80 | 48.90 | — | — | 55.40 | — | Closed source |
| 234 | GLM-4.6 | — | 82.90 | 68.00 | — | — | 84.50 | 355 | Free commercial use |
| 235 | Gemini-2.5-Pro-Preview-05-06 | — | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 236 | OpenAI o3 | — | 83.30 | 69.10 | — | — | — | — | Closed source |
1
OpenAI o1
MMLU Pro91.04
GPQA Diamond77.30
SWE-bench Verified48.90
MATH-50096.40
AIME 202479.20
LiveCodeBench71.00
不开源
2
Gemini 3.0 Pro (Preview 11-2025)
MMLU Pro90.00
GPQA Diamond91.90
SWE-bench Verified76.20
MATH-5000.00
AIME 20240.00
LiveCodeBench92.00
不开源
3
Claude Opus 4.5
MMLU Pro90.00
GPQA Diamond87.00
SWE-bench Verified80.90
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
4
Claude Opus 4.1
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
5
M2.1
2300B
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
6
Claude Sonnet 4.5
MMLU Pro88.00
GPQA Diamond83.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench71.00
不开源
7
Qwen3.5-397B-A17B
397B
MMLU Pro87.80
GPQA Diamond88.40
SWE-bench Verified76.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
8
Qwen3.5-397B-A17B
397B
MMLU Pro87.80
GPQA Diamond88.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.60
Free commercial
9
Hunyuan-T1
MMLU Pro87.20
GPQA Diamond69.30
SWE-bench Verified0.00
MATH-50096.20
AIME 202478.20
LiveCodeBench64.90
不开源
10
Grok 4
MMLU Pro87.00
GPQA Diamond87.00
SWE-bench Verified58.60
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
不开源
11
GPT-4.5
MMLU Pro86.10
GPQA Diamond71.40
SWE-bench Verified38.00
MATH-50090.70
AIME 202436.70
LiveCodeBench46.40
不开源
12
Qwen3.5-27B
270B
MMLU Pro86.10
GPQA Diamond85.50
SWE-bench Verified72.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
13
Gemini 2.5-Pro
MMLU Pro86.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.80
AIME 202492.00
LiveCodeBench77.10
不开源
14
Qwen3-Max-Thinking
10000B
MMLU Pro85.70
GPQA Diamond87.40
SWE-bench Verified75.30
MATH-5000.00
AIME 20240.00
LiveCodeBench85.90
不开源
15
OpenAI o3
MMLU Pro85.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.10
AIME 202491.60
LiveCodeBench75.80
不开源
16
DeepSeek-R1-0528
6710B
MMLU Pro85.00
GPQA Diamond81.00
SWE-bench Verified57.60
MATH-50098.00
AIME 202491.40
LiveCodeBench73.30
Free commercial
17
Grok 4.1 Fast
MMLU Pro85.00
GPQA Diamond85.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
不开源
18
DeepSeek V3.2-Exp
6710B
MMLU Pro85.00
GPQA Diamond79.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
19
DeepSeek-V3.1 Terminus
6710B
MMLU Pro85.00
GPQA Diamond80.70
SWE-bench Verified68.40
MATH-5000.00
AIME 20240.00
LiveCodeBench74.90
Free commercial
20
DeepSeek-V3.1 Terminus
6710B
MMLU Pro85.00
GPQA Diamond79.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench80.00
Free commercial
21
DeepSeek-V3.1
6710B
MMLU Pro85.00
GPQA Diamond80.10
SWE-bench Verified0.00
MATH-5000.00
AIME 202493.10
LiveCodeBench74.80
Free commercial
22
Claude Opus 4
MMLU Pro85.00
GPQA Diamond79.60
SWE-bench Verified72.50
MATH-50098.20
AIME 202476.00
LiveCodeBench56.60
不开源
23
GLM-4.5
3550B
MMLU Pro84.60
GPQA Diamond79.10
SWE-bench Verified64.20
MATH-50098.20
AIME 202491.00
LiveCodeBench72.90
Free commercial
24
Kimi K2 Thinking
10400B
MMLU Pro84.60
GPQA Diamond84.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.10
Free commercial
25
Qwen3-235B-A22B-Thinking
305B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
26
Qwen3-235B-A22B-Thinking-2507
2350B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
27
GLM-4.7
3580B
MMLU Pro84.30
GPQA Diamond85.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench84.90
Free commercial
28
DeepSeek-R1
6710B
MMLU Pro84.00
GPQA Diamond71.50
SWE-bench Verified49.20
MATH-50097.30
AIME 202479.80
LiveCodeBench65.90
Free commercial
29
Claude Sonnet 4
MMLU Pro84.00
GPQA Diamond75.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench66.00
不开源
30
Qwen3 Max (Preview)
MMLU Pro84.00
GPQA Diamond76.00
SWE-bench Verified69.60
MATH-5000.00
AIME 20240.00
LiveCodeBench57.50
不开源
31
DeepSeek V3.2-Exp
6710B
MMLU Pro84.00
GPQA Diamond74.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
Free commercial
32
DeepSeek-V3.1
6710B
MMLU Pro83.70
GPQA Diamond74.90
SWE-bench Verified66.00
MATH-5000.00
AIME 202466.30
LiveCodeBench56.40
Free commercial
33
Intern-S1
2410B
MMLU Pro83.50
GPQA Diamond77.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
34
Qwen3-235B-A22B-2507
2350B
MMLU Pro83.00
GPQA Diamond77.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.80
Free commercial
35
GLM-4.6
3550B
MMLU Pro83.00
GPQA Diamond81.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.80
Free commercial
36
Pangu Pro MoE
719B
MMLU Pro82.60
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50096.80
AIME 202479.20
LiveCodeBench59.60
Free commercial
37
Llama 4 Behemoth Instruct
20000B
MMLU Pro82.20
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50095.00
AIME 20240.00
LiveCodeBench49.40
Free commercial
38
MiniMax M2
2300B
MMLU Pro82.00
GPQA Diamond78.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.00
Free commercial
39
GLM-4.5-Air
1060B
MMLU Pro81.40
GPQA Diamond75.00
SWE-bench Verified57.60
MATH-50098.10
AIME 202489.40
LiveCodeBench70.70
Free commercial
40
DeepSeek-V3-0324
6710B
MMLU Pro81.20
GPQA Diamond68.40
SWE-bench Verified38.80
MATH-50094.00
AIME 202459.40
LiveCodeBench49.20
Free commercial
41
MiniMax-M1-80k
4560B
MMLU Pro81.10
GPQA Diamond70.00
SWE-bench Verified56.00
MATH-50096.80
AIME 202486.00
LiveCodeBench65.00
Free commercial
42
Kimi K2
10000B
MMLU Pro81.10
GPQA Diamond75.10
SWE-bench Verified51.80
MATH-50097.40
AIME 202469.60
LiveCodeBench53.70
Free commercial
43
OpenAI o4 - mini
MMLU Pro80.60
GPQA Diamond81.40
SWE-bench Verified68.10
MATH-5000.00
AIME 202493.40
LiveCodeBench0.00
不开源
44
MiniMax-M1-40k
4560B
MMLU Pro80.60
GPQA Diamond69.20
SWE-bench Verified55.60
MATH-50096.00
AIME 202483.30
LiveCodeBench62.30
Free commercial
45
GPT-4.1
MMLU Pro80.50
GPQA Diamond66.30
SWE-bench Verified54.60
MATH-50092.80
AIME 202448.10
LiveCodeBench40.50
不开源
46
Llama 4 Maverick Instruct
4000B
MMLU Pro80.50
GPQA Diamond69.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.40
Free commercial
47
OpenAI o1-mini
MMLU Pro80.30
GPQA Diamond60.00
SWE-bench Verified0.00
MATH-50090.00
AIME 202463.60
LiveCodeBench52.00
不开源
48
Haiku 4.5
MMLU Pro80.00
GPQA Diamond73.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench62.00
不开源
49
GPT-4o(2025-03-27)
MMLU Pro79.80
GPQA Diamond66.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench35.80
不开源
50
Gemini 2.0 Pro Experimental
MMLU Pro79.10
GPQA Diamond64.70
SWE-bench Verified0.00
MATH-5000.00
AIME 202436.00
LiveCodeBench0.00
不开源
51
Hunyuan-TurboS
MMLU Pro79.00
GPQA Diamond57.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.00
不开源
52
Pangu Embedded
70B
MMLU Pro79.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50092.40
AIME 202481.90
LiveCodeBench67.10
Free commercial
53
GPT OSS 120B
117B
MMLU Pro79.00
GPQA Diamond80.10
SWE-bench Verified60.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
54
Kimi K2.5
10000B
MMLU Pro78.50
GPQA Diamond87.60
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench85.00
Free commercial
55
ERNIE-4.5-300B-A47B
3000B
MMLU Pro78.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50096.40
AIME 202454.80
LiveCodeBench38.80
Free commercial
56
Qwen3-30B-A3B-2507
305B
MMLU Pro78.40
GPQA Diamond70.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.20
Free commercial
57
Claude 3.5 Sonnet New
MMLU Pro78.00
GPQA Diamond65.00
SWE-bench Verified49.00
MATH-50078.00
AIME 202416.00
LiveCodeBench38.70
不开源
58
GLM-4.6
3550B
MMLU Pro78.00
GPQA Diamond63.00
SWE-bench Verified68.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.00
Free commercial
59
GPT-5-mini
MMLU Pro78.00
GPQA Diamond69.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
不开源
60
GPT-4o
MMLU Pro77.90
GPQA Diamond70.10
SWE-bench Verified31.00
MATH-50075.90
AIME 20249.30
LiveCodeBench35.10
不开源
61
GPT-4o(2024-11-20)
MMLU Pro77.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
62
Claude 3.5 Sonnet
MMLU Pro77.64
GPQA Diamond59.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
63
Gemini 2.0 Flash Experimental
MMLU Pro76.24
GPQA Diamond65.20
SWE-bench Verified21.40
MATH-5000.00
AIME 20240.00
LiveCodeBench29.10
不开源
64
Gemini 1.5 Pro
MMLU Pro76.10
GPQA Diamond53.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
65
Qwen2.5-Max
MMLU Pro76.10
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
66
Haiku 4.5
MMLU Pro76.00
GPQA Diamond60.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.00
不开源
67
QwQ-32B
325B
MMLU Pro76.00
GPQA Diamond58.00
SWE-bench Verified0.00
MATH-50091.00
AIME 202479.50
LiveCodeBench0.00
Free commercial
68
DeepSeek-V3
6810B
MMLU Pro75.90
GPQA Diamond59.10
SWE-bench Verified0.00
MATH-50087.80
AIME 202439.00
LiveCodeBench34.60
Free commercial
69
Grok 2
2690B
MMLU Pro75.50
GPQA Diamond56.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
70
Llama 4 Scout Instruct
1090B
MMLU Pro74.30
GPQA Diamond57.20
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.80
Free commercial
71
GPT OSS 20B
210B
MMLU Pro74.00
GPQA Diamond71.50
SWE-bench Verified34.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
72
Llama3.1-405B Instruct
4050B
MMLU Pro73.40
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench30.20
Free commercial
73
Qwen3-235B-A22B
2350B
MMLU Pro72.90
GPQA Diamond71.10
SWE-bench Verified34.40
MATH-50096.20
AIME 202485.70
LiveCodeBench70.70
Free commercial
74
Qwen3-8B
80B
MMLU Pro72.50
GPQA Diamond39.30
SWE-bench Verified0.00
MATH-50087.40
AIME 202479.40
LiveCodeBench61.80
Free commercial
75
GLM-4-9B-Chat
90B
MMLU Pro72.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202476.40
LiveCodeBench51.80
Free commercial
76
Gemini 2.0 Flash-Lite
MMLU Pro71.60
GPQA Diamond51.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench28.90
不开源
77
QwQ-32B-Preview
320B
MMLU Pro70.97
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50090.60
AIME 202450.00
LiveCodeBench0.00
Free commercial
78
Phi 4 - 14B
140B
MMLU Pro70.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Non-commercial
79
Qwen2.5-32B
320B
MMLU Pro69.23
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.20
Free commercial
80
Qwen3-30B-A3B
305B
MMLU Pro69.10
GPQA Diamond54.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench29.00
Free commercial
81
Mistral-Small-3.2
240B
MMLU Pro69.06
GPQA Diamond46.13
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
82
Llama3.3-70B-Instruct
700B
MMLU Pro68.90
GPQA Diamond50.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
Free commercial
83
Claude3-Opus
MMLU Pro68.45
GPQA Diamond50.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
84
Gemma 3 - 27B (IT)
270B
MMLU Pro67.50
GPQA Diamond42.40
SWE-bench Verified0.00
MATH-5000.00
AIME 202425.30
LiveCodeBench29.70
Free commercial
85
Hunyuan-A13B-Instruct
800B
MMLU Pro67.23
GPQA Diamond71.20
SWE-bench Verified0.00
MATH-5000.00
AIME 202487.30
LiveCodeBench63.90
Free commercial
86
Mistral-Small-3.1-24B-Instruct-2503
240B
MMLU Pro66.76
GPQA Diamond45.96
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
87
Llama3.1-70B-Instruct
700B
MMLU Pro66.40
GPQA Diamond48.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
Free commercial
88
Qwen3-Next
800B
MMLU Pro66.05
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.60
Free commercial
89
Claude 3.5 Haiku
MMLU Pro65.00
GPQA Diamond41.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
90
Qwen2.5-14B
140B
MMLU Pro63.69
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
91
Llama 4 Maverick
4000B
MMLU Pro62.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
92
GPT-4o mini
MMLU Pro61.70
GPQA Diamond41.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
93
Llama3.1-405B
4050B
MMLU Pro61.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
94
Gemma 3 - 12B (IT)
120B
MMLU Pro60.60
GPQA Diamond40.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench24.60
Free commercial
95
Llama 4 Scout
1090B
MMLU Pro58.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
96
Qwen2.5-72B
727B
MMLU Pro58.10
GPQA Diamond45.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
97
Claude3-Sonnet
MMLU Pro56.80
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
98
Gemma2-27B
270B
MMLU Pro56.54
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
99
Mixtral-8x22B-Instruct-v0.1
1410B
MMLU Pro56.33
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
100
Llama3-70B-Instruct
700B
MMLU Pro56.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
101
Phi-4-mini-instruct (3.8B)
38B
MMLU Pro52.80
GPQA Diamond36.00
SWE-bench Verified0.00
MATH-50071.80
AIME 202410.00
LiveCodeBench0.00
Free commercial
102
Llama3-70B
700B
MMLU Pro52.78
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
103
Llama3.1-70B
700B
MMLU Pro52.47
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
104
Grok-1.5
MMLU Pro51.00
GPQA Diamond35.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
105
C4AI Aya Vision 32B
320B
MMLU Pro47.16
GPQA Diamond33.84
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Non-commercial
106
Qwen2.5-7B
70B
MMLU Pro45.00
GPQA Diamond36.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
107
Gemma 2 - 9B
90B
MMLU Pro44.70
GPQA Diamond32.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
108
Llama3.1-8B-Instruct
80B
MMLU Pro44.00
GPQA Diamond26.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
109
Moonlight-16B-A3B-Instruct
160B
MMLU Pro42.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
110
Llama3.1-8B
80B
MMLU Pro35.40
GPQA Diamond25.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
111
Qwen2.5-3B
30B
MMLU Pro34.60
GPQA Diamond24.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
112
Mistral-7B-Instruct-v0.3
70B
MMLU Pro30.90
GPQA Diamond24.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
113
Llama-3.2-3B
32B
MMLU Pro25.00
GPQA Diamond26.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
114
GPT-5.1-Codex-Max
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
115
Qwen3.5-397B-A17B
397B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
116
GPT-5.1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
117
o3-pro
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified75.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
118
GPT-5 Codex
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
119
Step 3.5 Flash
1960B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.40
MATH-5000.00
AIME 20240.00
LiveCodeBench86.40
Free commercial
120
GLM-4.7
3580B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
121
Grok 4 Heavy
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
122
Haiku 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
123
DeepSeek V3.2
6710B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
124
Claude Sonnet 4
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.70
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
125
Grok 4 Code
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
126
Kimi K2 Thinking
10400B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified71.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
127
Hunyuan-7B
70B
MMLU Pro0.00
GPQA Diamond60.10
SWE-bench Verified0.00
MATH-50093.70
AIME 202481.10
LiveCodeBench57.00
Free commercial
128
Grok Code Fast 1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified70.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
129 | Claude Sonnet 4.5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 77.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
130 | Claude Opus 4.1 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 79.40 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
131 | GPT-5.2 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 80.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
132 | MiniMax M2.5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 80.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 229B | Free commercial
133 | Claude Sonnet 4 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 80.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
134 | Gemini 3.1 Pro Preview | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 80.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 2887.00 (outside the 0-100 scale used by the other scores) | Closed source
135 | Claude Opus 4.6 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 80.84 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
136 | Claude Sonnet 5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 82.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
137 | Claude Sonnet 4.5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 82.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
138 | GPT-5-mini | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
139 | Grok 3.5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
140 | Phi-4-instruct (reasoning-trained) | MMLU Pro: 0.00 | GPQA Diamond: 49.00 | SWE-bench Verified: 0.00 | MATH-500: 90.40 | AIME 2024: 50.00 | LiveCodeBench: 0.00 | Params: 3.8B | Closed source
141 | DeepSeek-R1-Distill-Qwen-7B | MMLU Pro: 0.00 | GPQA Diamond: 49.50 | SWE-bench Verified: 0.00 | MATH-500: 91.40 | AIME 2024: 53.30 | LiveCodeBench: 0.00 | Params: 7B | Free commercial
142 | GPT-4.1 nano | MMLU Pro: 0.00 | GPQA Diamond: 50.30 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 29.40 | LiveCodeBench: 0.00 | Closed source
143 | Qwen3-32B | MMLU Pro: 0.00 | GPQA Diamond: 53.30 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 81.40 | LiveCodeBench: 65.70 | Params: 32B | Free commercial
144 | Qwen3-30B-A3B-2507 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 22.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 30.5B | Free commercial
145 | Codestral | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 31.50 | Params: 22B | Non-commercial
146 | Codestral 25.01 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 37.90 | Closed source
147 | QwQ-Max-Preview | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 65.60 | Free commercial
148 | Kimi-k1.6-IOI | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 65.90 | Closed source
149 | OpenAI o3-mini (medium) | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 67.40 | Closed source
150 | Kimi-k1.6-IOI-high | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 73.80 | Closed source
151 | Gemini 2.5 Pro Deep Think | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 80.40 | Closed source
152 | Qwen3.5-397B-A17B | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 83.60 | Params: 397B | Free commercial
153 | Claude Opus 4.5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 87.00 | Closed source
154 | Gemini 2.5 Deep Think | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 87.60 | Closed source
155 | GPT OSS 20B | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 96.00 | LiveCodeBench: 0.00 | Params: 21B | Free commercial
156 | GPT OSS 120B | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 96.60 | LiveCodeBench: 0.00 | Params: 117B | Free commercial
157 | OpenAI o4-mini | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 98.70 | LiveCodeBench: 0.00 | Closed source
158 | Kimi k1.5 (Short-CoT) | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 94.60 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
159 | Kimi k1.5 (Long-CoT) | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 96.20 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
160 | Qwen3-Coder-Next | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 70.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 80B | Free commercial
161 | Devstral Small 1.0 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 46.80 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 24B | Free commercial
162 | Qwen3-Coder-Flash | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 51.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 30.5B | Free commercial
163 | Devstral Small 1.1 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 53.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 24B | Free commercial
164 | Gemini 2.5 Flash-Preview-09-2025 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 54.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
165 | Haiku 4.5 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 60.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
166 | Devstral Medium | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 61.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
167 | Claude Sonnet 3.7 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 62.30 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
168 | Qwen3-Coder-480B-A35B | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 67.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 480B | Free commercial
169 | DeepSeek V3.2-Exp | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 67.80 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 671B | Free commercial
170 | Kimi K2 0905 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 69.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 1000B | Free commercial
172 | MiniMax M2 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 69.40 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 230B | Free commercial
173 | Claude Sonnet 3.7 | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 70.30 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
174 | GPT-5.1 Codex | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 70.40 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 85.50 | Closed source
175 | GPT-5-Pro | MMLU Pro: 0.00 | GPQA Diamond: 88.40 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
176 | o3-pro | MMLU Pro: 0.00 | GPQA Diamond: 84.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 93.00 | LiveCodeBench: 0.00 | Closed source
177 | Gemini 2.5 Pro Experimental 03-25 | MMLU Pro: 0.00 | GPQA Diamond: 84.00 | SWE-bench Verified: 63.80 | MATH-500: 0.00 | AIME 2024: 92.00 | LiveCodeBench: 70.40 | Closed source
178 | Grok-3 mini - Reasoning | MMLU Pro: 0.00 | GPQA Diamond: 84.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 96.00 | LiveCodeBench: 0.00 | Closed source
179 | Grok-3 - Reasoning Beta | MMLU Pro: 0.00 | GPQA Diamond: 84.60 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 93.30 | LiveCodeBench: 79.40 | Closed source
180 | Claude Sonnet 3.7-64K Extended Thinking | MMLU Pro: 0.00 | GPQA Diamond: 84.80 | SWE-bench Verified: 0.00 | MATH-500: 96.20 | AIME 2024: 80.00 | LiveCodeBench: 0.00 | Closed source
181 | MiniMax M2.5 | MMLU Pro: 0.00 | GPQA Diamond: 85.20 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 229B | Free commercial
182 | Grok 4 Fast | MMLU Pro: 0.00 | GPQA Diamond: 85.70 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 80.00 | Closed source
183 | GPT-5 | MMLU Pro: 0.00 | GPQA Diamond: 85.70 | SWE-bench Verified: 72.80 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
184 | GLM-5 | MMLU Pro: 0.00 | GPQA Diamond: 86.00 | SWE-bench Verified: 77.80 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 744B | Free commercial
185 | Gemini 2.5-Pro | MMLU Pro: 0.00 | GPQA Diamond: 86.40 | SWE-bench Verified: 67.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
186 | GPT-5 | MMLU Pro: 0.00 | GPQA Diamond: 87.30 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
187 | GPT-5.4 mini | MMLU Pro: 0.00 | GPQA Diamond: 88.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
188 | GPT-5.1 | MMLU Pro: 0.00 | GPQA Diamond: 88.10 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
189 | GPT-5.1 | MMLU Pro: 0.00 | GPQA Diamond: 88.10 | SWE-bench Verified: 76.30 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
190 | GPT-5.1 | MMLU Pro: 0.00 | GPQA Diamond: 88.10 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
191 | Claude Sonnet 4 | MMLU Pro: 0.00 | GPQA Diamond: 83.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
192 | Grok 4 Heavy | MMLU Pro: 0.00 | GPQA Diamond: 88.90 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
193 | GPT-5-Pro | MMLU Pro: 0.00 | GPQA Diamond: 89.40 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
194 | Claude Sonnet 4.6 | MMLU Pro: 0.00 | GPQA Diamond: 89.90 | SWE-bench Verified: 79.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
195 | Gemini 3.0 Flash | MMLU Pro: 0.00 | GPQA Diamond: 90.40 | SWE-bench Verified: 68.70 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
196 | Gemini 3.0 Pro (Preview 11-2025) | MMLU Pro: 0.00 | GPQA Diamond: 91.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
197 | Claude Opus 4.6 | MMLU Pro: 0.00 | GPQA Diamond: 91.31 | SWE-bench Verified: 0.00 | MATH-500: 97.60 | AIME 2024: 0.00 | LiveCodeBench: 76.00 | Closed source
198 | GPT-5.2 | MMLU Pro: 0.00 | GPQA Diamond: 92.40 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
199 | GPT-5.4 | MMLU Pro: 0.00 | GPQA Diamond: 92.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
200 | GPT-5.2 Pro | MMLU Pro: 0.00 | GPQA Diamond: 93.20 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
201 | GPT-5.2 | MMLU Pro: 0.00 | GPQA Diamond: 93.20 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
202 | Gemini 3.0 Pro (Preview 11-2025) | MMLU Pro: 0.00 | GPQA Diamond: 93.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
203 | Gemini 3.1 Pro Preview | MMLU Pro: 0.00 | GPQA Diamond: 94.30 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
204 | GPT-5.4 Pro | MMLU Pro: 0.00 | GPQA Diamond: 94.40 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
205 | Amazon Nova Pro | MMLU Pro: 0.00 | GPQA Diamond: 0.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
206 | Claude Sonnet 4.5 | MMLU Pro: 0.00 | GPQA Diamond: 73.70 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 59.00 | Closed source
207 | Qwen3-8B | MMLU Pro: 0.00 | GPQA Diamond: 62.00 | SWE-bench Verified: 0.00 | MATH-500: 97.40 | AIME 2024: 76.00 | LiveCodeBench: 57.50 | Params: 8B | Free commercial
208 | GPT-4.1 mini | MMLU Pro: 0.00 | GPQA Diamond: 65.00 | SWE-bench Verified: 23.60 | MATH-500: 0.00 | AIME 2024: 49.60 | LiveCodeBench: 0.00 | Closed source
209 | Grok 3 mini | MMLU Pro: 0.00 | GPQA Diamond: 65.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 40.00 | LiveCodeBench: 0.00 | Closed source
210 | DeepSeek-R1-Distill-Llama-70B | MMLU Pro: 0.00 | GPQA Diamond: 65.20 | SWE-bench Verified: 0.00 | MATH-500: 94.50 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 70B | Free commercial
211 | Qwen3-4B-Thinking-2507 | MMLU Pro: 0.00 | GPQA Diamond: 65.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 55.20 | Params: 4B | Free commercial
212 | GLM-4.7-Flash | MMLU Pro: 0.00 | GPQA Diamond: 66.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 31B | Free commercial
213 | Gemini 2.5 Flash-Lite | MMLU Pro: 0.00 | GPQA Diamond: 66.70 | SWE-bench Verified: 27.60 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 34.30 | Closed source
214 | Claude Sonnet 4 | MMLU Pro: 0.00 | GPQA Diamond: 68.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 43.40 | LiveCodeBench: 48.50 | Closed source
215 | Claude Sonnet 3.7 | MMLU Pro: 0.00 | GPQA Diamond: 68.00 | SWE-bench Verified: 0.00 | MATH-500: 82.20 | AIME 2024: 23.30 | LiveCodeBench: 0.00 | Closed source
216 | Magistral-Small-2506 | MMLU Pro: 0.00 | GPQA Diamond: 68.18 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 70.68 | LiveCodeBench: 55.84 | Params: 24B | Free commercial
217 | Qwen3-32B | MMLU Pro: 0.00 | GPQA Diamond: 68.40 | SWE-bench Verified: 0.00 | MATH-500: 97.20 | AIME 2024: 81.40 | LiveCodeBench: 0.00 | Params: 32B | Free commercial
218 | OpenAI o3-mini | MMLU Pro: 0.00 | GPQA Diamond: 70.60 | SWE-bench Verified: 40.80 | MATH-500: 95.80 | AIME 2024: 60.00 | LiveCodeBench: 0.00 | Closed source
219 | Magistral-Medium-2506 | MMLU Pro: 0.00 | GPQA Diamond: 70.83 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 73.59 | LiveCodeBench: 59.36 | Closed source
220 | Qwen3-235B-A22B | MMLU Pro: 0.00 | GPQA Diamond: 71.10 | SWE-bench Verified: 0.00 | MATH-500: 98.00 | AIME 2024: 85.70 | LiveCodeBench: 70.70 | Params: 235B | Free commercial
221 | Step3 | MMLU Pro: 0.00 | GPQA Diamond: 73.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 67.10 | Params: 321B | Free commercial
222 | Qwen3-4B-2507 | MMLU Pro: 0.00 | GPQA Diamond: 62.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 35.10 | Params: 4B | Free commercial
223 | GLM-4.7-Flash | MMLU Pro: 0.00 | GPQA Diamond: 75.20 | SWE-bench Verified: 59.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Params: 31B | Free commercial
224 | ERNIE-4.5-VL-424B-A47B-Base | MMLU Pro: 0.00 | GPQA Diamond: 76.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 38.80 | Params: 424B | Free commercial
225 | Claude Sonnet 3.7 | MMLU Pro: 0.00 | GPQA Diamond: 77.00 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
226 | GPT-5 | MMLU Pro: 0.00 | GPQA Diamond: 77.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
227 | Gemini 2.5 Flash | MMLU Pro: 0.00 | GPQA Diamond: 78.30 | SWE-bench Verified: 50.00 | MATH-500: 0.00 | AIME 2024: 88.00 | LiveCodeBench: 41.10 | Closed source
228 | OpenAI o3-mini (high) | MMLU Pro: 0.00 | GPQA Diamond: 79.70 | SWE-bench Verified: 49.30 | MATH-500: 97.90 | AIME 2024: 87.00 | LiveCodeBench: 69.50 | Closed source
229 | Grok 3 | MMLU Pro: 0.00 | GPQA Diamond: 80.40 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 84.20 | LiveCodeBench: 70.60 | Closed source
230 | Claude Opus 4.1 | MMLU Pro: 0.00 | GPQA Diamond: 80.90 | SWE-bench Verified: 74.50 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 65.00 | Closed source
231 | DeepSeek V3.2 | MMLU Pro: 0.00 | GPQA Diamond: 82.40 | SWE-bench Verified: 70.20 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 83.30 | Params: 671B | Free commercial
232 | GPT-5.4 nano | MMLU Pro: 0.00 | GPQA Diamond: 82.80 | SWE-bench Verified: 0.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source
233 | Gemini 2.5 Flash | MMLU Pro: 0.00 | GPQA Diamond: 82.80 | SWE-bench Verified: 48.90 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 55.40 | Closed source
234 | GLM-4.6 | MMLU Pro: 0.00 | GPQA Diamond: 82.90 | SWE-bench Verified: 68.00 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 84.50 | Params: 355B | Free commercial
235 | Gemini-2.5-Pro-Preview-05-06 | MMLU Pro: 0.00 | GPQA Diamond: 83.00 | SWE-bench Verified: 63.20 | MATH-500: 98.80 | AIME 2024: 92.00 | LiveCodeBench: 77.10 | Closed source
236 | OpenAI o3 | MMLU Pro: 0.00 | GPQA Diamond: 83.30 | SWE-bench Verified: 69.10 | MATH-500: 0.00 | AIME 2024: 0.00 | LiveCodeBench: 0.00 | Closed source