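LLM Benchmarks and Performance Comparison

Compare how large language models perform on benchmarks such as MMLU Pro, HLE, and SWE-Bench. Select a benchmark to view its ranking.

Detailed descriptions of each benchmark are available at: LLM Benchmark List and Introductions

Data updated: 2025/11/08 22:10:24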

LLM Benchmark Results

Data source: DataLearnerAI
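Parameter counts are given in units of 100 million (亿), following the source header, so 6710 corresponds to roughly 671B parameters. N/A marks a parameter count that the source does not list. A score of 0.00 appears to mean that no result is recorded for that benchmark, not that the model scored zero.

| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Parameters (×100M) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | N/A | Closed source |
| 2 | M2.1 | 88.00 | 81.00 | 74.80 | 0.00 | 0.00 | 0.00 | 2300 | Free for commercial use |
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | N/A | Closed source |
| 4 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 10000 | Closed source |
| 5 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710 | Free for commercial use |
| 6 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | 0.00 | 93.10 | 74.80 | 6710 | Free for commercial use |
| 7 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 80.00 | 6710 | Free for commercial use |
| 8 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 | 6710 | Free for commercial use |
| 9 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | N/A | Closed source |
| 10 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550 | Free for commercial use |
| 11 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 | 10400 | Free for commercial use |
| 12 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 2350 | Free for commercial use |
| 13 | GLM-4.7 | 84.30 | 85.70 | 73.80 | 0.00 | 0.00 | 84.90 | 3580 | Free for commercial use |
| 14 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710 | Free for commercial use |
| 15 | Intern-S1 | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 | 2410 | Free for commercial use |
| 16 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 | 2350 | Free for commercial use |
| 17 | GLM-4.6 | 83.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 | 3550 | Free for commercial use |
| 18 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 | 20000 | Free for commercial use |
| 19 | MiniMax M2 | 82.00 | 78.00 | 69.40 | 0.00 | 0.00 | 83.00 | 2300 | Free for commercial use |
| 20 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060 | Free for commercial use |
| 21 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 6710 | Free for commercial use |
| 22 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 10000 | Free for commercial use |
| 23 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560 | Free for commercial use |
| 24 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 | N/A | Closed source |
| 25 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560 | Free for commercial use |
| 26 | Llama 4 Maverick Instruct | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 | 4000 | Free for commercial use |
| 27 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | N/A | Closed source |
| 28 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | N/A | Closed source |
| 29 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 | N/A | Closed source |
| 30 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | N/A | Closed source |
| 31 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | 0.00 | 0.00 | 85.00 | 10000 | Free for commercial use |
| 32 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 | 3000 | Free for commercial use |
| 33 | GPT-4o(2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 34 | Claude 3.5 Sonnet | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 35 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 | N/A | Closed source |
| 36 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 37 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 | 6810 | Free for commercial use |
| 38 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2690 | Free for commercial use |
| 39 | Llama 4 Scout Instruct | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 | 1090 | Free for commercial use |
| 40 | Llama3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 | 4050 | Free for commercial use |
| 41 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 | 2350 | Free for commercial use |
| 42 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 | N/A | Closed source |
| 43 | Llama 4 Maverick | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4000 | Free for commercial use |
| 44 | Llama3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4050 | Free for commercial use |
| 45 | Llama 4 Scout | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1090 | Free for commercial use |
| 46 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1410 | Free for commercial use |
| 47 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 48 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 | N/A | Closed source |
| 49 | Codestral 25.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 | N/A | Closed source |
| 50 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 | N/A | Closed source |
| 51 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 | N/A | Closed source |
| 52 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 53 | Step 3.5 Flash | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 | 1960 | Free for commercial use |
| 54 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 10000 | Free for commercial use |
| 55 | Qwen3-Coder-480B-A35B | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 | 4800 | Free for commercial use |
| 56 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | N/A | Closed source |
| 57 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | N/A | Closed source |
| 58 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | N/A | Closed source |
| 59 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | N/A | Closed source |
| 60 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | N/A | Closed source |
| 61 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | N/A | Closed source |
| 62 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | N/A | Free for commercial use |
| 63 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | N/A | Closed source |
| 64 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 | N/A | Closed source |
| 65 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | N/A | Closed source |
| 66 | Step3 | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 | 3210 | Free for commercial use |
| 67 | ERNIE-4.5-VL-424B-A47B-Base | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 | 4240 | Free for commercial use |
| 68 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | N/A | Closed source |
| 69 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 | N/A | Closed source |
| 70 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 | 6710 | Free for commercial use |
| 71 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 | N/A | Closed source |
| 72 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | N/A | Closed source |
| 73 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 | N/A | Closed source |
| 74 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | N/A | Closed source |
| 75 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | N/A | Closed source |
| 76 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | N/A | Closed source |
| 77 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |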
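
To re-rank the results by a benchmark other than MMLU Pro, one option is to sort on the chosen column and skip entries recorded as 0.00. The sketch below is a minimal illustration and not the site's own implementation. It assumes that 0.00 marks a missing result, it uses only a handful of rows copied from the table, and the names `Row`, `ROWS`, and `rank_by` are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    mmlu_pro: float
    gpqa_diamond: float
    swe_bench_verified: float
    livecodebench: float


# A few rows copied from the results table (illustrative subset only).
ROWS = [
    Row("OpenAI o1", 91.04, 77.30, 48.90, 71.00),
    Row("Qwen3-Max-Thinking", 85.70, 87.40, 75.30, 85.90),
    Row("Kimi K2.5", 78.50, 87.60, 76.80, 85.00),
    Row("DeepSeek V3.2", 0.00, 82.40, 73.10, 83.30),
    Row("Amazon Nova Pro", 0.00, 0.00, 0.00, 0.00),
]


def rank_by(rows, benchmark):
    """Return (model, score) pairs sorted by `benchmark`, highest first.

    Assumption: a score of 0.00 means "no result recorded", so such
    rows are dropped instead of being ranked last.
    """
    scored = [(r.model, getattr(r, benchmark)) for r in rows]
    present = [(m, s) for m, s in scored if s > 0.0]
    return sorted(present, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_by(ROWS, "gpqa_diamond")
    for place, (model, score) in enumerate(ranking, start=1):
        print(f"{place}. {model}: {score:.2f}")
```

Ranking this subset by `gpqa_diamond` puts Kimi K2.5 first and leaves out Amazon Nova Pro, which has no recorded scores. Dropping the 0.00 rows, as opposed to sorting them to the bottom, keeps models without a result from looking like poor performers.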