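LLM Benchmarks and Performance Comparison

Compare how large language models perform on benchmarks such as MMLU Pro, HLE, and SWE-Bench. Select a benchmark to view its ranking.

For a detailed introduction to each benchmark, see: LLM Benchmark List and Introduction

Data updated: 2025/11/08 22:10:24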

LLM Benchmark Results

Data source: DataLearnerAI
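| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Parameters (×100 million) | Open-source status |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed source |
| 2 | Claude Opus 4.5 | 90.00 | 87.00 | 80.90 | 0.00 | 0.00 | 87.00 | — | Closed source |
| 3 | Claude Opus 4.1 | 88.00 | 81.00 | 79.40 | 0.00 | 0.00 | 65.00 | — | Closed source |
| 4 | Hunyuan-T1 | 87.20 | 69.30 | 0.00 | 96.20 | 78.20 | 64.90 | — | Closed source |
| 5 | Grok 4 | 87.00 | 87.00 | 58.60 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 6 | Gemini 2.5-Pro | 86.00 | 86.40 | 67.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 7 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 10000B | Closed source |
| 8 | OpenAI o3 | 85.60 | 83.30 | 69.10 | 98.10 | 91.60 | 75.80 | — | Closed source |
| 9 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed source |
| 10 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710B | Free for commercial use |
| 11 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 | 6710B | Free for commercial use |
| 12 | Grok 4.1 Fast | 85.00 | 85.00 | 0.00 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 13 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550B | Free for commercial use |
| 14 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 | 10400B | Free for commercial use |
| 15 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 2350B | Free for commercial use |
| 16 | Qwen3-235B-A22B-Thinking | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 305B | Free for commercial use |
| 17 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710B | Free for commercial use |
| 18 | Claude Sonnet 4 | 84.00 | 83.80 | 80.20 | 0.00 | 43.40 | 66.00 | — | Closed source |
| 19 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060B | Free for commercial use |
| 20 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560B | Free for commercial use |
| 21 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 | — | Closed source |
| 22 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560B | Free for commercial use |
| 23 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | — | Closed source |
| 24 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | — | Closed source |
| 25 | GPT OSS 120B | 79.00 | 80.10 | 60.10 | 0.00 | 96.60 | 0.00 | 117B | Free for commercial use |
| 26 | QwQ-32B | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 | 325B | Free for commercial use |
| 27 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | 0.00 | 96.00 | 0.00 | 210B | Free for commercial use |
| 28 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 | 2350B | Free for commercial use |
| 29 | Qwen3-8B | 72.50 | 62.00 | 0.00 | 97.40 | 79.40 | 61.80 | 80B | Free for commercial use |
| 30 | QwQ-32B-Preview | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 | 320B | Free for commercial use |
| 31 | Qwen3-30B-A3B | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 | 305B | Free for commercial use |
| 32 | OpenAI o3-mini | 0.00 | 70.60 | 40.80 | 95.80 | 60.00 | 0.00 | — | Closed source |
| 33 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | — | Free for commercial use |
| 34 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | — | Closed source |
| 35 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | — | Closed source |
| 36 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | — | Closed source |
| 37 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | — | Closed source |
| 38 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | — | Closed source |
| 39 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | — | Closed source |
| 40 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 41 | Phi-4-instruct (reasoning-trained) | 0.00 | 49.00 | 0.00 | 90.40 | 50.00 | 0.00 | 38B | Closed source |
| 42 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 | 70B | Free for commercial use |
| 43 | DeepSeek-R1-Distill-Llama-70B | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 | 700B | Free for commercial use |
| 44 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | — | Closed source |
| 45 | Magistral-Small-2506 | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 | 240B | Free for commercial use |
| 46 | Qwen3-32B | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 65.70 | 320B | Free for commercial use |
| 47 | GPT-5.2 Pro | 0.00 | 93.20 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 48 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | — | Closed source |
| 49 | GLM-4.7-Flash | 0.00 | 75.20 | 59.20 | 0.00 | 0.00 | 0.00 | 310B | Free for commercial use |
| 50 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed source |
| 51 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 | 6710B | Free for commercial use |
| 52 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 | — | Closed source |
| 53 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 54 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 | — | Closed source |
| 55 | Gemini 2.5 Pro Experimental 03-25 | 0.00 | 84.00 | 63.80 | 0.00 | 92.00 | 70.40 | — | Closed source |
| 56 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | — | Closed source |
| 57 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | — | Closed source |
| 58 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | — | Closed source |
| 59 | GPT-5.1 | 0.00 | 88.10 | 76.30 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 60 | GPT-5-Pro | 0.00 | 89.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |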
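
The page describes selecting a benchmark to see its ranking. A minimal sketch of that re-ranking over a few rows copied from the table above might look like the following. It assumes that a score of 0.00 means the result was not reported rather than a true zero; the page does not state this. The names `BENCHMARKS`, `ROWS`, `score`, and `rank_by` are hypothetical and not part of DataLearnerAI.

```python
from typing import List, Optional, Tuple

# Benchmark columns, in the order used by the table.
BENCHMARKS = ["MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
              "MATH-500", "AIME 2024", "LiveCodeBench"]

# A handful of rows copied from the table (model name, six scores in column order).
ROWS = [
    ("OpenAI o1",       [91.04, 77.30, 48.90, 96.40, 79.20, 71.00]),
    ("Claude Opus 4.5", [90.00, 87.00, 80.90, 0.00, 0.00, 87.00]),
    ("Gemini 2.5-Pro",  [86.00, 86.40, 67.20, 98.80, 92.00, 77.10]),
    ("GPT-5.1",         [0.00, 88.10, 76.30, 0.00, 0.00, 0.00]),
    ("Grok 3.5",        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00]),
]


def score(values: List[float], benchmark: str) -> Optional[float]:
    """Return the score for `benchmark`, or None when the table shows 0.00.

    Assumption: 0.00 marks a missing result, not a measured zero.
    """
    value = values[BENCHMARKS.index(benchmark)]
    return None if value == 0.0 else value


def rank_by(benchmark: str) -> List[Tuple[str, float]]:
    """List models that have a reported score on `benchmark`, highest first."""
    reported = []
    for name, values in ROWS:
        s = score(values, benchmark)
        if s is not None:
            reported.append((name, s))
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (name, s) in enumerate(rank_by("SWE-bench Verified"), start=1):
        print(f"{position}. {name}: {s:.2f}")
```

Dropping unreported scores before sorting keeps models with no result from appearing at the bottom of a ranking as if they had scored zero.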