A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

LLM Benchmark Performance Comparison

Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.
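The kind of custom comparison described above can be approximated offline with a few lines of Python. This is a minimal sketch, not the site's implementation: the `compare` helper and the `ROWS` layout are hypothetical, the four sample rows are copied from the results table on this page, and it assumes that a score of 0.00 means "no reported result" rather than a genuine zero.

```python
# Column order used by the leaderboard table.
BENCHMARKS = [
    "MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
    "MATH-500", "AIME 2024", "LiveCodeBench",
]

# A few rows copied from the table (model -> scores in BENCHMARKS order).
ROWS = {
    "OpenAI o1": [91.04, 77.30, 48.90, 96.40, 79.20, 71.00],
    "Claude Opus 4.5": [90.00, 87.00, 80.90, 0.00, 0.00, 0.00],
    "Gemini 2.5-Pro": [86.00, 0.00, 0.00, 98.80, 92.00, 77.10],
    "DeepSeek-R1": [84.00, 71.50, 49.20, 97.30, 79.80, 65.90],
}


def compare(rows, benchmarks, sort_by, models=None):
    """Select models and benchmarks, then rank by one benchmark.

    Returns a list of (model, {benchmark: score or None}) pairs, highest
    `sort_by` score first; models without a reported score come last.
    """
    if sort_by not in benchmarks:
        raise ValueError("sort_by must be one of the selected benchmarks")
    selected = []
    for model, scores in rows.items():
        if models is not None and model not in models:
            continue
        by_name = dict(zip(BENCHMARKS, scores))
        # ASSUMPTION: 0.00 is a placeholder for "not reported", so map it to None.
        picked = {b: (by_name[b] if by_name[b] > 0 else None) for b in benchmarks}
        selected.append((model, picked))
    # Sort key: reported scores first (descending), unreported scores last.
    selected.sort(key=lambda item: (item[1][sort_by] is None, -(item[1][sort_by] or 0.0)))
    return selected


if __name__ == "__main__":
    chosen = ["GPQA Diamond", "SWE-bench Verified"]
    for model, scores in compare(ROWS, chosen, sort_by="SWE-bench Verified"):
        print(model, scores)
```

Mapping 0.00 to `None` before sorting keeps models with no reported result from being ranked as if they had scored zero; if a 0.00 in the table is in fact a real score, that line should be dropped.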

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24


LLM Performance Results

Data source: DataLearnerAI

MMLU Pro
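| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed source |
| 2 | Gemini 3.0 Pro (Preview 11-2025) | 90.00 | 91.90 | 76.20 | 0.00 | 0.00 | 92.00 | — | Closed source |
| 3 | Claude Opus 4.5 | 90.00 | 87.00 | 80.90 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 4 | Claude Opus 4.1 | 88.00 | 81.00 | 74.50 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 5 | M2.1 | 88.00 | 81.00 | 74.00 | 0.00 | 0.00 | 0.00 | 2300B | Free commercial |
| 6 | Claude Sonnet 4.5 | 88.00 | 83.40 | 0.00 | 0.00 | 0.00 | 71.00 | — | Closed source |
| 7 | Hunyuan-T1 | 87.20 | 69.30 | 0.00 | 96.20 | 78.20 | 64.90 | — | Closed source |
| 8 | Grok 4 | 87.00 | 87.00 | 58.60 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 9 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | — | Closed source |
| 10 | Gemini 2.5-Pro | 86.00 | 0.00 | 0.00 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 11 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 10000B | Closed source |
| 12 | OpenAI o3 | 85.60 | 0.00 | 0.00 | 98.10 | 91.60 | 75.80 | — | Closed source |
| 13 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 0.00 | 0.00 | 0.00 | 74.10 | 6710B | Free commercial |
| 14 | Grok 4.1 Fast | 85.00 | 85.00 | 0.00 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 15 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 74.90 | 6710B | Free commercial |
| 16 | DeepSeek-V3.1 Terminus | 85.00 | 79.00 | 0.00 | 0.00 | 0.00 | 80.00 | 6710B | Free commercial |
| 17 | DeepSeek-V3.1 | 85.00 | 80.10 | 0.00 | 0.00 | 93.10 | 74.80 | 6710B | Free commercial |
| 18 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710B | Free commercial |
| 19 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed source |
| 20 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550B | Free commercial |
| 21 | Kimi K2 Thinking | 84.60 | 84.50 | 0.00 | 0.00 | 0.00 | 83.10 | 10400B | Free commercial |
| 22 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 2350B | Free commercial |
| 23 | Qwen3-235B-A22B-Thinking | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 305B | Free commercial |
| 24 | GLM-4.7 | 84.30 | 85.70 | 0.00 | 0.00 | 0.00 | 84.90 | 3580B | Free commercial |
| 25 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710B | Free commercial |
| 26 | Claude Sonnet 4 | 84.00 | 75.40 | 0.00 | 0.00 | 0.00 | 66.00 | — | Closed source |
| 27 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | 0.00 | 0.00 | 57.50 | — | Closed source |
| 28 | DeepSeek V3.2-Exp | 84.00 | 74.00 | 0.00 | 0.00 | 0.00 | 55.00 | 6710B | Free commercial |
| 29 | DeepSeek-V3.1 | 83.70 | 74.90 | 66.00 | 0.00 | 66.30 | 56.40 | 6710B | Free commercial |
| 30 | Intern-S1 | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 | 2410B | Free commercial |
| 31 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 | 2350B | Free commercial |
| 32 | GLM-4.6 | 83.00 | 81.00 | 0.00 | 0.00 | 0.00 | 82.80 | 3550B | Free commercial |
| 33 | Pangu Pro MoE | 82.60 | 73.70 | 0.00 | 96.80 | 79.20 | 59.60 | 719B | Free commercial |
| 34 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 | 20000B | Free commercial |
| 35 | MiniMax M2 | 82.00 | 78.00 | 0.00 | 0.00 | 0.00 | 83.00 | 2300B | Free commercial |
| 36 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060B | Free commercial |
| 37 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 6710B | Free commercial |
| 38 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560B | Free commercial |
| 39 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 10000B | Free commercial |
| 40 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | 0.00 | 93.40 | 0.00 | — | Closed source |
| 41 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560B | Free commercial |
| 42 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | — | Closed source |
| 43 | Llama 4 Maverick Instruct | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 | 4000B | Free commercial |
| 44 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | — | Closed source |
| 45 | Haiku 4.5 | 80.00 | 60.50 | 60.60 | 0.00 | 0.00 | 51.00 | — | Closed source |
| 46 | GPT-4o(2025-03-27) | 79.80 | 66.90 | 0.00 | 0.00 | 0.00 | 35.80 | — | Closed source |
| 47 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 | — | Closed source |
| 48 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | — | Closed source |
| 49 | Pangu Embedded | 79.00 | 0.00 | 0.00 | 92.40 | 81.90 | 67.10 | 70B | Free commercial |
| 50 | GPT OSS 120B | 79.00 | 80.10 | 60.10 | 0.00 | 0.00 | 0.00 | 117B | Free commercial |
| 51 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | 0.00 | 0.00 | 85.00 | 10000B | Free commercial |
| 52 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 | 3000B | Free commercial |
| 53 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | 0.00 | 0.00 | 0.00 | 43.20 | 305B | Free commercial |
| 54 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 | — | Closed source |
| 55 | GLM-4.6 | 78.00 | 63.00 | 68.00 | 0.00 | 0.00 | 56.00 | 3550B | Free commercial |
| 56 | GPT-5-mini | 78.00 | 69.00 | 0.00 | 0.00 | 0.00 | 55.00 | — | Closed source |
| 57 | GPT-4o | 77.90 | 70.10 | 31.00 | 75.90 | 9.30 | 35.10 | — | Closed source |
| 58 | GPT-4o(2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 59 | Claude 3.5 Sonnet | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 60 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 | — | Closed source |
| 61 | Gemini 1.5 Pro | 76.10 | 53.50 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 62 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 63 | QwQ-32B | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 | 325B | Free commercial |
| 64 | Haiku 4.5 | 76.00 | 73.30 | 0.00 | 0.00 | 0.00 | 62.00 | — | Closed source |
| 65 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 | 6810B | Free commercial |
| 66 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2690B | Free commercial |
| 67 | Llama 4 Scout Instruct | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 | 1090B | Free commercial |
| 68 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | 0.00 | 0.00 | 0.00 | 210B | Free commercial |
| 69 | Llama3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 | 4050B | Free commercial |
| 70 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 96.20 | 85.70 | 70.70 | 2350B | Free commercial |
| 71 | Qwen3-8B | 72.50 | 39.30 | 0.00 | 87.40 | 79.40 | 61.80 | 80B | Free commercial |
| 72 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 | 90B | Free commercial |
| 73 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 | — | Closed source |
| 74 | QwQ-32B-Preview | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 | 320B | Free commercial |
| 75 | Phi 4 - 14B | 70.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 140B | Non-commercial |
| 76 | Qwen2.5-32B | 69.23 | 0.00 | 0.00 | 0.00 | 0.00 | 51.20 | 320B | Free commercial |
| 77 | Qwen3-30B-A3B | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 | 305B | Free commercial |
| 78 | Mistral-Small-3.2 | 69.06 | 46.13 | 0.00 | 0.00 | 0.00 | 0.00 | 240B | Free commercial |
| 79 | Llama3.3-70B-Instruct | 68.90 | 50.50 | 0.00 | 0.00 | 0.00 | 33.30 | 700B | Free commercial |
| 80 | Claude3-Opus | 68.45 | 50.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 81 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | 0.00 | 0.00 | 25.30 | 29.70 | 270B | Free commercial |
| 82 | Hunyuan-A13B-Instruct | 67.23 | 71.20 | 0.00 | 0.00 | 87.30 | 63.90 | 800B | Free commercial |
| 83 | Mistral-Small-3.1-24B-Instruct-2503 | 66.76 | 45.96 | 0.00 | 0.00 | 0.00 | 0.00 | 240B | Free commercial |
| 84 | Llama3.1-70B-Instruct | 66.40 | 48.00 | 0.00 | 0.00 | 0.00 | 33.30 | 700B | Free commercial |
| 85 | Qwen3-Next | 66.05 | 0.00 | 0.00 | 0.00 | 0.00 | 56.60 | 800B | Free commercial |
| 86 | Claude 3.5 Haiku | 65.00 | 41.60 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 87 | Qwen2.5-14B | 63.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 140B | Free commercial |
| 88 | Llama 4 Maverick | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4000B | Free commercial |
| 89 | GPT-4o mini | 61.70 | 41.10 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 90 | Llama3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4050B | Free commercial |
| 91 | Gemma 3 - 12B (IT) | 60.60 | 40.90 | 0.00 | 0.00 | 0.00 | 24.60 | 120B | Free commercial |
| 92 | Llama 4 Scout | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1090B | Free commercial |
| 93 | Qwen2.5-72B | 58.10 | 45.90 | 0.00 | 0.00 | 0.00 | 0.00 | 727B | Free commercial |
| 94 | Claude3-Sonnet | 56.80 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 95 | Gemma2-27B | 56.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 270B | Free commercial |
| 96 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1410B | Free commercial |
| 97 | Llama3-70B-Instruct | 56.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 700B | Free commercial |
| 98 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | 0.00 | 71.80 | 10.00 | 0.00 | 38B | Free commercial |
| 99 | Llama3-70B | 52.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 700B | Free commercial |
| 100 | Llama3.1-70B | 52.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 700B | Free commercial |
| 101 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 102 | C4AI Aya Vision 32B | 47.16 | 33.84 | 0.00 | 0.00 | 0.00 | 0.00 | 320B | Non-commercial |
| 103 | Qwen2.5-7B | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 | 70B | Free commercial |
| 104 | Gemma 2 - 9B | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 | 90B | Free commercial |
| 105 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 | 80B | Free commercial |
| 106 | Moonlight-16B-A3B-Instruct | 42.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 160B | Free commercial |
| 107 | Llama3.1-8B | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 | 80B | Free commercial |
| 108 | Qwen2.5-3B | 34.60 | 24.30 | 0.00 | 0.00 | 0.00 | 0.00 | 30B | Free commercial |
| 109 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 | 70B | Free commercial |
| 110 | Llama-3.2-3B | 25.00 | 26.60 | 0.00 | 0.00 | 0.00 | 0.00 | 32B | Free commercial |
| 111 | o3-pro | 0.00 | 0.00 | 75.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 112 | GPT-5.1 Codex | 0.00 | 0.00 | 70.40 | 0.00 | 0.00 | 85.50 | — | Closed source |
| 113 | GPT-5 Codex | 0.00 | 0.00 | 74.50 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 114 | StepFun Flash 3.5 | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 | 1960B | Free commercial |
| 115 | GLM-4.7 | 0.00 | 0.00 | 73.80 | 0.00 | 0.00 | 0.00 | 3580B | Free commercial |
| 116 | Grok 4 Heavy | 0.00 | 0.00 | 73.50 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 117 | Haiku 4.5 | 0.00 | 0.00 | 73.30 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 118 | DeepSeek V3.2 | 0.00 | 0.00 | 73.10 | 0.00 | 0.00 | 0.00 | 6710B | Free commercial |
| 119 | Claude Sonnet 4 | 0.00 | 0.00 | 72.70 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 120 | Grok 4 Code | 0.00 | 0.00 | 72.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 121 | Kimi K2 Thinking | 0.00 | 0.00 | 71.30 | 0.00 | 0.00 | 0.00 | 10400B | Free commercial |
| 122 | Grok Code Fast 1 | 0.00 | 0.00 | 70.80 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 123 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 | 70B | Free commercial |
| 124 | GPT-5.1-Codex-Max | 0.00 | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 125 | Claude Sonnet 4.5 | 0.00 | 0.00 | 77.20 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 126 | Claude Opus 4.1 | 0.00 | 0.00 | 79.40 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 127 | Claude Sonnet 4 | 0.00 | 0.00 | 80.20 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 128 | Claude Sonnet 4.5 | 0.00 | 0.00 | 82.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 129 | GPT-5-mini | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 130 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 131 | Phi-4-instruct (reasoning-trained) | 0.00 | 49.00 | 0.00 | 90.40 | 50.00 | 0.00 | 38B | Closed source |
| 132 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 | 70B | Free commercial |
| 133 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 | — | Closed source |
| 134 | Qwen3-32B | 0.00 | 53.30 | 0.00 | 0.00 | 81.40 | 65.70 | 320B | Free commercial |
| 135 | Codestral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 31.50 | 220B | Non-commercial |
| 136 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | — | Closed source |
| 137 | Codestral 25.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 | — | Closed source |
| 138 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | — | Free commercial |
| 139 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | — | Closed source |
| 140 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | — | Closed source |
| 141 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | — | Closed source |
| 142 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | — | Closed source |
| 143 | Claude Opus 4.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.00 | — | Closed source |
| 144 | Gemini 2.5 Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.60 | — | Closed source |
| 145 | GPT OSS 20B | 0.00 | 0.00 | 0.00 | 0.00 | 96.00 | 0.00 | 210B | Free commercial |
| 146 | GPT OSS 120B | 0.00 | 0.00 | 0.00 | 0.00 | 96.60 | 0.00 | 117B | Free commercial |
| 147 | OpenAI o4 - mini | 0.00 | 0.00 | 0.00 | 0.00 | 98.70 | 0.00 | — | Closed source |
| 148 | MiniMax M2 | 0.00 | 0.00 | 69.40 | 0.00 | 0.00 | 0.00 | 2300B | Free commercial |
| 149 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | — | Closed source |
| 150 | Qwen3-30B-A3B-2507 | 0.00 | 0.00 | 22.00 | 0.00 | 0.00 | 0.00 | 305B | Free commercial |
| 151 | Devstral Small 1.0 | 0.00 | 0.00 | 46.80 | 0.00 | 0.00 | 0.00 | 240B | Free commercial |
| 152 | Qwen3-Coder-Flash | 0.00 | 0.00 | 51.60 | 0.00 | 0.00 | 0.00 | 305B | Free commercial |
| 153 | Devstral Small 1.1 | 0.00 | 0.00 | 53.60 | 0.00 | 0.00 | 0.00 | 240B | Free commercial |
| 154 | Gemini 2.5 Flash-Preview-09-2025 | 0.00 | 0.00 | 54.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 155 | Devstral Medium | 0.00 | 0.00 | 61.60 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 156 | Qwen3-Coder-480B-A35B | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 | 4800B | Free commercial |
| 157 | DeepSeek V3.2-Exp | 0.00 | 0.00 | 67.80 | 0.00 | 0.00 | 0.00 | 6710B | Free commercial |
| 158 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 10000B | Free commercial |
| 159 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 10000B | Free commercial |
| 160 | Gemini 2.5-Pro | 0.00 | 86.40 | 67.20 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 161 | Gemini 2.5 Flash | 0.00 | 82.80 | 48.90 | 0.00 | 0.00 | 55.40 | — | Closed source |
| 162 | GLM-4.6 | 0.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 | 3550B | Free commercial |
| 163 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 164 | OpenAI o3 | 0.00 | 83.30 | 69.10 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 165 | Claude Sonnet 4 | 0.00 | 83.80 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 166 | o3-pro | 0.00 | 84.00 | 0.00 | 0.00 | 93.00 | 0.00 | — | Closed source |
| 167 | Gemini 2.5 Pro Experimental 03-25 | 0.00 | 84.00 | 63.80 | 0.00 | 92.00 | 70.40 | — | Closed source |
| 168 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | — | Closed source |
| 169 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | — | Closed source |
| 170 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | — | Closed source |
| 171 | Grok 4 Fast | 0.00 | 85.70 | 0.00 | 0.00 | 0.00 | 80.00 | — | Closed source |
| 172 | GPT-5 | 0.00 | 85.70 | 72.80 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 173 | DeepSeek V3.2 | 0.00 | 82.40 | 0.00 | 0.00 | 0.00 | 83.30 | 6710B | Free commercial |
| 174 | GPT-5 | 0.00 | 87.30 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 175 | GPT-5.1 | 0.00 | 88.10 | 76.30 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 176 | GPT-5.1 | 0.00 | 88.10 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 177 | GPT-5-Pro | 0.00 | 88.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 178 | Grok 4 Heavy | 0.00 | 88.90 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 179 | GPT-5-Pro | 0.00 | 89.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 180 | Gemini 3.0 Flash | 0.00 | 90.40 | 68.70 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 181 | GPT-5.2 | 0.00 | 92.40 | 80.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 182 | GPT-5.2 Pro | 0.00 | 93.20 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 183 | Gemini 3.0 Pro (Preview 11-2025) | 0.00 | 93.80 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 184 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 185 | OpenAI o3-mini | 0.00 | 70.60 | 40.80 | 95.80 | 60.00 | 0.00 | — | Closed source |
| 186 | Qwen3-8B | 0.00 | 62.00 | 0.00 | 97.40 | 76.00 | 57.50 | 80B | Free commercial |
| 187 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 | — | Closed source |
| 188 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 | — | Closed source |
| 189 | DeepSeek-R1-Distill-Llama-70B | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 | 700B | Free commercial |
| 190 | Qwen3-4B-Thinking-2507 | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 | 40B | Free commercial |
| 191 | GLM-4.7-Flash | 0.00 | 66.00 | 0.00 | 0.00 | 0.00 | 0.00 | 310B | Free commercial |
| 192 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | — | Closed source |
| 193 | Claude Sonnet 4 | 0.00 | 68.00 | 0.00 | 0.00 | 43.40 | 48.50 | — | Closed source |
| 194 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 | — | Closed source |
| 195 | Magistral-Small-2506 | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 | 240B | Free commercial |
| 196 | Qwen3-32B | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 0.00 | 320B | Free commercial |
| 197 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 | 40B | Free commercial |
| 198 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | — | Closed source |
| 199 | Qwen3-235B-A22B | 0.00 | 71.10 | 0.00 | 98.00 | 85.70 | 70.70 | 2350B | Free commercial |
| 200 | Step3 | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 | 3210B | Free commercial |
| 201 | Claude Sonnet 4.5 | 0.00 | 73.70 | 64.80 | 0.00 | 0.00 | 59.00 | — | Closed source |
| 202 | GLM-4.7-Flash | 0.00 | 75.20 | 59.20 | 0.00 | 0.00 | 0.00 | 310B | Free commercial |
| 203 | ERNIE-4.5-VL-424B-A47B-Base | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 | 4240B | Free commercial |
| 204 | GPT-5 | 0.00 | 77.80 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 205 | Gemini 2.5 Flash | 0.00 | 78.30 | 50.00 | 0.00 | 88.00 | 41.10 | — | Closed source |
| 206 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed source |
| 207 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 | — | Closed source |
| 208 | Claude Opus 4.1 | 0.00 | 80.90 | 74.50 | 0.00 | 0.00 | 65.00 | — | Closed source |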
1
OpenAI o1
MMLU Pro91.04
GPQA Diamond77.30
SWE-bench Verified48.90
MATH-50096.40
AIME 202479.20
LiveCodeBench71.00
不开源
2
Gemini 3.0 Pro (Preview 11-2025)
MMLU Pro90.00
GPQA Diamond91.90
SWE-bench Verified76.20
MATH-5000.00
AIME 20240.00
LiveCodeBench92.00
不开源
3
Claude Opus 4.5
MMLU Pro90.00
GPQA Diamond87.00
SWE-bench Verified80.90
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
4
Claude Opus 4.1
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
5
M2.1
2300B
MMLU Pro88.00
GPQA Diamond81.00
SWE-bench Verified74.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
6
Claude Sonnet 4.5
MMLU Pro88.00
GPQA Diamond83.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench71.00
不开源
7
Hunyuan-T1
MMLU Pro87.20
GPQA Diamond69.30
SWE-bench Verified0.00
MATH-50096.20
AIME 202478.20
LiveCodeBench64.90
不开源
8
Grok 4
MMLU Pro87.00
GPQA Diamond87.00
SWE-bench Verified58.60
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
不开源
9
GPT-4.5
MMLU Pro86.10
GPQA Diamond71.40
SWE-bench Verified38.00
MATH-50090.70
AIME 202436.70
LiveCodeBench46.40
不开源
10
Gemini 2.5-Pro
MMLU Pro86.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.80
AIME 202492.00
LiveCodeBench77.10
不开源
11
Qwen3-Max-Thinking
10000B
MMLU Pro85.70
GPQA Diamond87.40
SWE-bench Verified75.30
MATH-5000.00
AIME 20240.00
LiveCodeBench85.90
不开源
12
OpenAI o3
MMLU Pro85.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50098.10
AIME 202491.60
LiveCodeBench75.80
不开源
13
DeepSeek V3.2-Exp
6710B
MMLU Pro85.00
GPQA Diamond79.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
14
Grok 4.1 Fast
MMLU Pro85.00
GPQA Diamond85.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.00
不开源
15
DeepSeek-V3.1 Terminus
6710B
MMLU Pro85.00
GPQA Diamond80.70
SWE-bench Verified68.40
MATH-5000.00
AIME 20240.00
LiveCodeBench74.90
Free commercial
16
DeepSeek-V3.1 Terminus
6710B
MMLU Pro85.00
GPQA Diamond79.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench80.00
Free commercial
17
DeepSeek-V3.1
6710B
MMLU Pro85.00
GPQA Diamond80.10
SWE-bench Verified0.00
MATH-5000.00
AIME 202493.10
LiveCodeBench74.80
Free commercial
18
DeepSeek-R1-0528
6710B
MMLU Pro85.00
GPQA Diamond81.00
SWE-bench Verified57.60
MATH-50098.00
AIME 202491.40
LiveCodeBench73.30
Free commercial
19
Claude Opus 4
MMLU Pro85.00
GPQA Diamond79.60
SWE-bench Verified72.50
MATH-50098.20
AIME 202476.00
LiveCodeBench56.60
不开源
20
GLM-4.5
3550B
MMLU Pro84.60
GPQA Diamond79.10
SWE-bench Verified64.20
MATH-50098.20
AIME 202491.00
LiveCodeBench72.90
Free commercial
21
Kimi K2 Thinking
10400B
MMLU Pro84.60
GPQA Diamond84.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.10
Free commercial
22
Qwen3-235B-A22B-Thinking-2507
2350B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
23
Qwen3-235B-A22B-Thinking
305B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
24
GLM-4.7
3580B
MMLU Pro84.30
GPQA Diamond85.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench84.90
Free commercial
25
DeepSeek-R1
6710B
MMLU Pro84.00
GPQA Diamond71.50
SWE-bench Verified49.20
MATH-50097.30
AIME 202479.80
LiveCodeBench65.90
Free commercial
26
Claude Sonnet 4
MMLU Pro84.00
GPQA Diamond75.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench66.00
不开源
27
Qwen3 Max (Preview)
MMLU Pro84.00
GPQA Diamond76.00
SWE-bench Verified69.60
MATH-5000.00
AIME 20240.00
LiveCodeBench57.50
不开源
28
DeepSeek V3.2-Exp
6710B
MMLU Pro84.00
GPQA Diamond74.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
Free commercial
29
DeepSeek-V3.1
6710B
MMLU Pro83.70
GPQA Diamond74.90
SWE-bench Verified66.00
MATH-5000.00
AIME 202466.30
LiveCodeBench56.40
Free commercial
30
Intern-S1
2410B
MMLU Pro83.50
GPQA Diamond77.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
31
Qwen3-235B-A22B-2507
2350B
MMLU Pro83.00
GPQA Diamond77.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.80
Free commercial
32
GLM-4.6
3550B
MMLU Pro83.00
GPQA Diamond81.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench82.80
Free commercial
33
Pangu Pro MoE
719B
MMLU Pro82.60
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50096.80
AIME 202479.20
LiveCodeBench59.60
Free commercial
34
Llama 4 Behemoth Instruct
20000B
MMLU Pro82.20
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50095.00
AIME 20240.00
LiveCodeBench49.40
Free commercial
35
MiniMax M2
2300B
MMLU Pro82.00
GPQA Diamond78.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench83.00
Free commercial
36
GLM-4.5-Air
1060B
MMLU Pro81.40
GPQA Diamond75.00
SWE-bench Verified57.60
MATH-50098.10
AIME 202489.40
LiveCodeBench70.70
Free commercial
37
DeepSeek-V3-0324
6710B
MMLU Pro81.20
GPQA Diamond68.40
SWE-bench Verified38.80
MATH-50094.00
AIME 202459.40
LiveCodeBench49.20
Free commercial
38
MiniMax-M1-80k
4560B
MMLU Pro81.10
GPQA Diamond70.00
SWE-bench Verified56.00
MATH-50096.80
AIME 202486.00
LiveCodeBench65.00
Free commercial
39
Kimi K2
10000B
MMLU Pro81.10
GPQA Diamond75.10
SWE-bench Verified51.80
MATH-50097.40
AIME 202469.60
LiveCodeBench53.70
Free commercial
40
OpenAI o4 - mini
MMLU Pro80.60
GPQA Diamond81.40
SWE-bench Verified68.10
MATH-5000.00
AIME 202493.40
LiveCodeBench0.00
不开源
41
MiniMax-M1-40k
4560B
MMLU Pro80.60
GPQA Diamond69.20
SWE-bench Verified55.60
MATH-50096.00
AIME 202483.30
LiveCodeBench62.30
Free commercial
42
GPT-4.1
MMLU Pro80.50
GPQA Diamond66.30
SWE-bench Verified54.60
MATH-50092.80
AIME 202448.10
LiveCodeBench40.50
不开源
43
Llama 4 Maverick Instruct
4000B
MMLU Pro80.50
GPQA Diamond69.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.40
Free commercial
44
OpenAI o1-mini
MMLU Pro80.30
GPQA Diamond60.00
SWE-bench Verified0.00
MATH-50090.00
AIME 202463.60
LiveCodeBench52.00
不开源
45
Haiku 4.5
MMLU Pro80.00
GPQA Diamond60.50
SWE-bench Verified60.60
MATH-5000.00
AIME 20240.00
LiveCodeBench51.00
不开源
46
GPT-4o(2025-03-27)
MMLU Pro79.80
GPQA Diamond66.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench35.80
不开源
47
Gemini 2.0 Pro Experimental
MMLU Pro79.10
GPQA Diamond64.70
SWE-bench Verified0.00
MATH-5000.00
AIME 202436.00
LiveCodeBench0.00
不开源
48
Hunyuan-TurboS
MMLU Pro79.00
GPQA Diamond57.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.00
不开源
49
Pangu Embedded
70B
MMLU Pro79.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50092.40
AIME 202481.90
LiveCodeBench67.10
Free commercial
50
GPT OSS 120B
117B
MMLU Pro79.00
GPQA Diamond80.10
SWE-bench Verified60.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
51
Kimi K2.5
10000B
MMLU Pro78.50
GPQA Diamond87.60
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench85.00
Free commercial
52
ERNIE-4.5-300B-A47B
3000B
MMLU Pro78.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50096.40
AIME 202454.80
LiveCodeBench38.80
Free commercial
53
Qwen3-30B-A3B-2507
305B
MMLU Pro78.40
GPQA Diamond70.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.20
Free commercial
54
Claude 3.5 Sonnet New
MMLU Pro78.00
GPQA Diamond65.00
SWE-bench Verified49.00
MATH-50078.00
AIME 202416.00
LiveCodeBench38.70
不开源
55
GLM-4.6
3550B
MMLU Pro78.00
GPQA Diamond63.00
SWE-bench Verified68.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.00
Free commercial
56
GPT-5-mini
MMLU Pro78.00
GPQA Diamond69.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench55.00
不开源
57
GPT-4o
MMLU Pro77.90
GPQA Diamond70.10
SWE-bench Verified31.00
MATH-50075.90
AIME 20249.30
LiveCodeBench35.10
不开源
58
GPT-4o(2024-11-20)
MMLU Pro77.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
59
Claude 3.5 Sonnet
MMLU Pro77.64
GPQA Diamond59.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
60
Gemini 2.0 Flash Experimental
MMLU Pro76.24
GPQA Diamond65.20
SWE-bench Verified21.40
MATH-5000.00
AIME 20240.00
LiveCodeBench29.10
不开源
61
Gemini 1.5 Pro
MMLU Pro76.10
GPQA Diamond53.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
62
Qwen2.5-Max
MMLU Pro76.10
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
63
QwQ-32B
325B
MMLU Pro76.00
GPQA Diamond58.00
SWE-bench Verified0.00
MATH-50091.00
AIME 202479.50
LiveCodeBench0.00
Free commercial
64
Haiku 4.5
MMLU Pro76.00
GPQA Diamond73.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench62.00
不开源
65
DeepSeek-V3
6810B
MMLU Pro75.90
GPQA Diamond59.10
SWE-bench Verified0.00
MATH-50087.80
AIME 202439.00
LiveCodeBench34.60
Free commercial
66
Grok 2
2690B
MMLU Pro75.50
GPQA Diamond56.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
67
Llama 4 Scout Instruct
1090B
MMLU Pro74.30
GPQA Diamond57.20
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench32.80
Free commercial
68
GPT OSS 20B
210B
MMLU Pro74.00
GPQA Diamond71.50
SWE-bench Verified34.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
69
Llama3.1-405B Instruct
4050B
MMLU Pro73.40
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench30.20
Free commercial
70
Qwen3-235B-A22B
2350B
MMLU Pro72.90
GPQA Diamond71.10
SWE-bench Verified34.40
MATH-50096.20
AIME 202485.70
LiveCodeBench70.70
Free commercial
71
Qwen3-8B
80B
MMLU Pro72.50
GPQA Diamond39.30
SWE-bench Verified0.00
MATH-50087.40
AIME 202479.40
LiveCodeBench61.80
Free commercial
72
GLM-4-9B-Chat
90B
MMLU Pro72.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 202476.40
LiveCodeBench51.80
Free commercial
73
Gemini 2.0 Flash-Lite
MMLU Pro71.60
GPQA Diamond51.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench28.90
不开源
74
QwQ-32B-Preview
320B
MMLU Pro70.97
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50090.60
AIME 202450.00
LiveCodeBench0.00
Free commercial
75
Phi 4 - 14B
140B
MMLU Pro70.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Non-commercial
76
Qwen2.5-32B
320B
MMLU Pro69.23
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.20
Free commercial
77
Qwen3-30B-A3B
305B
MMLU Pro69.10
GPQA Diamond54.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench29.00
Free commercial
78
Mistral-Small-3.2
240B
MMLU Pro69.06
GPQA Diamond46.13
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
79
Llama3.3-70B-Instruct
700B
MMLU Pro68.90
GPQA Diamond50.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
Free commercial
80
Claude3-Opus
MMLU Pro68.45
GPQA Diamond50.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
81
Gemma 3 - 27B (IT)
270B
MMLU Pro67.50
GPQA Diamond42.40
SWE-bench Verified0.00
MATH-5000.00
AIME 202425.30
LiveCodeBench29.70
Free commercial
82
Hunyuan-A13B-Instruct
800B
MMLU Pro67.23
GPQA Diamond71.20
SWE-bench Verified0.00
MATH-5000.00
AIME 202487.30
LiveCodeBench63.90
Free commercial
83
Mistral-Small-3.1-24B-Instruct-2503
240B
MMLU Pro66.76
GPQA Diamond45.96
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
84
Llama3.1-70B-Instruct
700B
MMLU Pro66.40
GPQA Diamond48.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
Free commercial
85
Qwen3-Next
800B
MMLU Pro66.05
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.60
Free commercial
86
Claude 3.5 Haiku
MMLU Pro65.00
GPQA Diamond41.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
87
Qwen2.5-14B
140B
MMLU Pro63.69
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
88
Llama 4 Maverick
4000B
MMLU Pro62.90
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
89
GPT-4o mini
MMLU Pro61.70
GPQA Diamond41.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
90
Llama3.1-405B
4050B
MMLU Pro61.60
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
91
Gemma 3 - 12B (IT)
120B
MMLU Pro60.60
GPQA Diamond40.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench24.60
Free commercial
92
Llama 4 Scout
1090B
MMLU Pro58.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
93
Qwen2.5-72B
727B
MMLU Pro58.10
GPQA Diamond45.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
94
Claude3-Sonnet
MMLU Pro56.80
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
95
Gemma2-27B
270B
MMLU Pro56.54
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
96
Mixtral-8x22B-Instruct-v0.1
1410B
MMLU Pro56.33
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
97
Llama3-70B-Instruct
700B
MMLU Pro56.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
98
Phi-4-mini-instruct (3.8B)
38B
MMLU Pro52.80
GPQA Diamond36.00
SWE-bench Verified0.00
MATH-50071.80
AIME 202410.00
LiveCodeBench0.00
Free commercial
99
Llama3-70B
700B
MMLU Pro52.78
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
100
Llama3.1-70B
700B
MMLU Pro52.47
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
101
Grok-1.5
MMLU Pro51.00
GPQA Diamond35.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
102
C4AI Aya Vision 32B
320B
MMLU Pro47.16
GPQA Diamond33.84
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Non-commercial
103
Qwen2.5-7B
70B
MMLU Pro45.00
GPQA Diamond36.40
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
104
Gemma 2 - 9B
90B
MMLU Pro44.70
GPQA Diamond32.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
105
Llama3.1-8B-Instruct
80B
MMLU Pro44.00
GPQA Diamond26.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
106
Moonlight-16B-A3B-Instruct
160B
MMLU Pro42.40
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
107
Llama3.1-8B
80B
MMLU Pro35.40
GPQA Diamond25.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
108
Qwen2.5-3B
30B
MMLU Pro34.60
GPQA Diamond24.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
109
Mistral-7B-Instruct-v0.3
70B
MMLU Pro30.90
GPQA Diamond24.70
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
110
Llama-3.2-3B
32B
MMLU Pro25.00
GPQA Diamond26.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
111
o3-pro
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified75.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
112
GPT-5.1 Codex
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified70.40
MATH-5000.00
AIME 20240.00
LiveCodeBench85.50
不开源
113
GPT-5 Codex
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
114
StepFun Flash 3.5
1960B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified74.40
MATH-5000.00
AIME 20240.00
LiveCodeBench86.40
Free commercial
115
GLM-4.7
3580B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
116
Grok 4 Heavy
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.50
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
117
Haiku 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
118
DeepSeek V3.2
6710B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified73.10
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
119
Claude Sonnet 4
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.70
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
120
Grok 4 Code
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified72.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
121
Kimi K2 Thinking
10400B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified71.30
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
122
Grok Code Fast 1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified70.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
123
Hunyuan-7B
70B
MMLU Pro0.00
GPQA Diamond60.10
SWE-bench Verified0.00
MATH-50093.70
AIME 202481.10
LiveCodeBench57.00
Free commercial
124
GPT-5.1-Codex-Max
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified76.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
125
Claude Sonnet 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified77.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
126
Claude Opus 4.1
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified79.40
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
127
Claude Sonnet 4
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified80.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
128
Claude Sonnet 4.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified82.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
129
GPT-5-mini
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
130
Grok 3.5
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
不开源
131
Phi-4-instruct (reasoning-trained)
38B
MMLU Pro0.00
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-50090.40
AIME 202450.00
LiveCodeBench0.00
不开源
132
DeepSeek-R1-Distill-Qwen-7B
70B
MMLU Pro0.00
GPQA Diamond49.50
SWE-bench Verified0.00
MATH-50091.40
AIME 202453.30
LiveCodeBench0.00
Free commercial
133
GPT-4.1 nano
MMLU Pro0.00
GPQA Diamond50.30
SWE-bench Verified0.00
MATH-5000.00
AIME 202429.40
LiveCodeBench0.00
不开源
134
Qwen3-32B
320B
MMLU Pro0.00
GPQA Diamond53.30
SWE-bench Verified0.00
MATH-5000.00
AIME 202481.40
LiveCodeBench65.70
Free commercial
135
Codestral
220B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench31.50
Non-commercial
136
Kimi k1.5 (Short-CoT)
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 94.60
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
137
Codestral 25.01
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 37.90
License: Closed source
138
QwQ-Max-Preview
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 65.60
License: Free commercial
139
Kimi-k1.6-IOI
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 65.90
License: Closed source
140
OpenAI o3-mini (medium)
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 67.40
License: Closed source
141
Kimi-k1.6-IOI-high
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 73.80
License: Closed source
142
Gemini 2.5 Pro Deep Think
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 80.40
License: Closed source
143
Claude Opus 4.5
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 87.00
License: Closed source
144
Gemini 2.5 Deep Think
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 87.60
License: Closed source
145
GPT OSS 20B
Params: 21B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 96.00
LiveCodeBench: 0.00
License: Free commercial
146
GPT OSS 120B
Params: 117B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 96.60
LiveCodeBench: 0.00
License: Free commercial
147
OpenAI o4-mini
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 98.70
LiveCodeBench: 0.00
License: Closed source
148
MiniMax M2
Params: 230B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 69.40
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
149
Kimi k1.5 (Long-CoT)
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 96.20
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
150
Qwen3-30B-A3B-2507
Params: 30.5B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 22.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
151
Devstral Small 1.0
Params: 24B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 46.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
152
Qwen3-Coder-Flash
Params: 30.5B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 51.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
153
Devstral Small 1.1
Params: 24B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 53.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
154
Gemini 2.5 Flash-Preview-09-2025
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 54.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
155
Devstral Medium
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 61.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
156
Qwen3-Coder-480B-A35B
Params: 480B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 67.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
157
DeepSeek V3.2-Exp
Params: 671B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 67.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
158
Kimi K2 0905
Params: 1000B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 69.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
159
Kimi K2 0905
Params: 1000B
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 69.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
160
Gemini 2.5-Pro
MMLU Pro: 0.00
GPQA Diamond: 86.40
SWE-bench Verified: 67.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
161
Gemini 2.5 Flash
MMLU Pro: 0.00
GPQA Diamond: 82.80
SWE-bench Verified: 48.90
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 55.40
License: Closed source
162
GLM-4.6
Params: 355B
MMLU Pro: 0.00
GPQA Diamond: 82.90
SWE-bench Verified: 68.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 84.50
License: Free commercial
163
Gemini-2.5-Pro-Preview-05-06
MMLU Pro: 0.00
GPQA Diamond: 83.00
SWE-bench Verified: 63.20
MATH-500: 98.80
AIME 2024: 92.00
LiveCodeBench: 77.10
License: Closed source
164
OpenAI o3
MMLU Pro: 0.00
GPQA Diamond: 83.30
SWE-bench Verified: 69.10
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
165
Claude Sonnet 4
MMLU Pro: 0.00
GPQA Diamond: 83.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
166
o3-pro
MMLU Pro: 0.00
GPQA Diamond: 84.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 93.00
LiveCodeBench: 0.00
License: Closed source
167
Gemini 2.5 Pro Experimental 03-25
MMLU Pro: 0.00
GPQA Diamond: 84.00
SWE-bench Verified: 63.80
MATH-500: 0.00
AIME 2024: 92.00
LiveCodeBench: 70.40
License: Closed source
168
Grok-3 mini - Reasoning
MMLU Pro: 0.00
GPQA Diamond: 84.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 96.00
LiveCodeBench: 0.00
License: Closed source
169
Grok-3 - Reasoning Beta
MMLU Pro: 0.00
GPQA Diamond: 84.60
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 93.30
LiveCodeBench: 79.40
License: Closed source
170
Claude Sonnet 3.7-64K Extended Thinking
MMLU Pro: 0.00
GPQA Diamond: 84.80
SWE-bench Verified: 0.00
MATH-500: 96.20
AIME 2024: 80.00
LiveCodeBench: 0.00
License: Closed source
171
Grok 4 Fast
MMLU Pro: 0.00
GPQA Diamond: 85.70
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 80.00
License: Closed source
172
GPT-5
MMLU Pro: 0.00
GPQA Diamond: 85.70
SWE-bench Verified: 72.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
173
DeepSeek V3.2
Params: 671B
MMLU Pro: 0.00
GPQA Diamond: 82.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 83.30
License: Free commercial
174
GPT-5
MMLU Pro: 0.00
GPQA Diamond: 87.30
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
175
GPT-5.1
MMLU Pro: 0.00
GPQA Diamond: 88.10
SWE-bench Verified: 76.30
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
176
GPT-5.1
MMLU Pro: 0.00
GPQA Diamond: 88.10
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
177
GPT-5-Pro
MMLU Pro: 0.00
GPQA Diamond: 88.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
178
Grok 4 Heavy
MMLU Pro: 0.00
GPQA Diamond: 88.90
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
179
GPT-5-Pro
MMLU Pro: 0.00
GPQA Diamond: 89.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
180
Gemini 3.0 Flash
MMLU Pro: 0.00
GPQA Diamond: 90.40
SWE-bench Verified: 68.70
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
181
GPT-5.2
MMLU Pro: 0.00
GPQA Diamond: 92.40
SWE-bench Verified: 80.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
182
GPT-5.2 Pro
MMLU Pro: 0.00
GPQA Diamond: 93.20
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
183
Gemini 3.0 Pro (Preview 11-2025)
MMLU Pro: 0.00
GPQA Diamond: 93.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
184
Amazon Nova Pro
MMLU Pro: 0.00
GPQA Diamond: 0.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
185
OpenAI o3-mini
MMLU Pro: 0.00
GPQA Diamond: 70.60
SWE-bench Verified: 40.80
MATH-500: 95.80
AIME 2024: 60.00
LiveCodeBench: 0.00
License: Closed source
186
Qwen3-8B
Params: 8B
MMLU Pro: 0.00
GPQA Diamond: 62.00
SWE-bench Verified: 0.00
MATH-500: 97.40
AIME 2024: 76.00
LiveCodeBench: 57.50
License: Free commercial
187
GPT-4.1 mini
MMLU Pro: 0.00
GPQA Diamond: 65.00
SWE-bench Verified: 23.60
MATH-500: 0.00
AIME 2024: 49.60
LiveCodeBench: 0.00
License: Closed source
188
Grok 3 mini
MMLU Pro: 0.00
GPQA Diamond: 65.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 40.00
LiveCodeBench: 0.00
License: Closed source
189
DeepSeek-R1-Distill-Llama-70B
Params: 70B
MMLU Pro: 0.00
GPQA Diamond: 65.20
SWE-bench Verified: 0.00
MATH-500: 94.50
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
190
Qwen3-4B-Thinking-2507
Params: 4B
MMLU Pro: 0.00
GPQA Diamond: 65.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 55.20
License: Free commercial
191
GLM-4.7-Flash
Params: 31B
MMLU Pro: 0.00
GPQA Diamond: 66.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
192
Gemini 2.5 Flash-Lite
MMLU Pro: 0.00
GPQA Diamond: 66.70
SWE-bench Verified: 27.60
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 34.30
License: Closed source
193
Claude Sonnet 4
MMLU Pro: 0.00
GPQA Diamond: 68.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 43.40
LiveCodeBench: 48.50
License: Closed source
194
Claude Sonnet 3.7
MMLU Pro: 0.00
GPQA Diamond: 68.00
SWE-bench Verified: 70.30
MATH-500: 82.20
AIME 2024: 23.30
LiveCodeBench: 0.00
License: Closed source
195
Magistral-Small-2506
Params: 24B
MMLU Pro: 0.00
GPQA Diamond: 68.18
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 70.68
LiveCodeBench: 55.84
License: Free commercial
196
Qwen3-32B
Params: 32B
MMLU Pro: 0.00
GPQA Diamond: 68.40
SWE-bench Verified: 0.00
MATH-500: 97.20
AIME 2024: 81.40
LiveCodeBench: 0.00
License: Free commercial
197
Qwen3-4B-2507
Params: 4B
MMLU Pro: 0.00
GPQA Diamond: 62.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 35.10
License: Free commercial
198
Magistral-Medium-2506
MMLU Pro: 0.00
GPQA Diamond: 70.83
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 73.59
LiveCodeBench: 59.36
License: Closed source
199
Qwen3-235B-A22B
Params: 235B
MMLU Pro: 0.00
GPQA Diamond: 71.10
SWE-bench Verified: 0.00
MATH-500: 98.00
AIME 2024: 85.70
LiveCodeBench: 70.70
License: Free commercial
200
Step3
Params: 321B
MMLU Pro: 0.00
GPQA Diamond: 73.00
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 67.10
License: Free commercial
201
Claude Sonnet 4.5
MMLU Pro: 0.00
GPQA Diamond: 73.70
SWE-bench Verified: 64.80
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 59.00
License: Closed source
202
GLM-4.7-Flash
Params: 31B
MMLU Pro: 0.00
GPQA Diamond: 75.20
SWE-bench Verified: 59.20
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Free commercial
203
ERNIE-4.5-VL-424B-A47B-Base
Params: 424B
MMLU Pro: 0.00
GPQA Diamond: 76.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 38.80
License: Free commercial
204
GPT-5
MMLU Pro: 0.00
GPQA Diamond: 77.80
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 0.00
License: Closed source
205
Gemini 2.5 Flash
MMLU Pro: 0.00
GPQA Diamond: 78.30
SWE-bench Verified: 50.00
MATH-500: 0.00
AIME 2024: 88.00
LiveCodeBench: 41.10
License: Closed source
206
OpenAI o3-mini (high)
MMLU Pro: 0.00
GPQA Diamond: 79.70
SWE-bench Verified: 49.30
MATH-500: 97.90
AIME 2024: 87.00
LiveCodeBench: 69.50
License: Closed source
207
Grok 3
MMLU Pro: 0.00
GPQA Diamond: 80.40
SWE-bench Verified: 0.00
MATH-500: 0.00
AIME 2024: 84.20
LiveCodeBench: 70.60
License: Closed source
208
Claude Opus 4.1
MMLU Pro: 0.00
GPQA Diamond: 80.90
SWE-bench Verified: 74.50
MATH-500: 0.00
AIME 2024: 0.00
LiveCodeBench: 65.00
License: Closed source