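LLM Benchmarks and Performance Comparison

Compare how large language models perform on benchmarks such as MMLU Pro, HLE, and SWE-Bench. Select a benchmark to view its ranking.

Detailed descriptions of each benchmark are available in: LLM Benchmark List and Introductions

Data updated: 2025/11/08 22:10:24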

LLM Benchmark Results

Data source: DataLearnerAI
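| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Parameters (billions) | Open-source status |
|---|---|---|---|---|---|---|---|---|---|
| 1 | M2.1 | 88.00 | 81.00 | 74.80 | 0.00 | 0.00 | 0.00 | 230 | Free for commercial use |
| 2 | Claude Sonnet 4.5 | 88.00 | 83.40 | 82.00 | 0.00 | 0.00 | 71.00 | n/a | Closed source |
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | n/a | Closed source |
| 4 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | 0.00 | 93.10 | 74.80 | 671 | Free for commercial use |
| 5 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 80.00 | 671 | Free for commercial use |
| 6 | GLM-4.7 | 84.30 | 85.70 | 73.80 | 0.00 | 0.00 | 84.90 | 358 | Free for commercial use |
| 7 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | 0.00 | 0.00 | 57.50 | n/a | Closed source |
| 8 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 | 235 | Free for commercial use |
| 9 | GLM-4.6 | 83.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 | 355 | Free for commercial use |
| 10 | Pangu Pro MoE | 82.60 | 73.70 | 0.00 | 96.80 | 79.20 | 59.60 | 71.9 | Free for commercial use |
| 11 | MiniMax M2 | 82.00 | 78.00 | 69.40 | 0.00 | 0.00 | 83.00 | 230 | Free for commercial use |
| 12 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 671 | Free for commercial use |
| 13 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 1000 | Free for commercial use |
| 14 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | n/a | Closed source |
| 15 | GPT-4o(2025-03-27) | 79.80 | 66.90 | 0.00 | 0.00 | 0.00 | 35.80 | n/a | Closed source |
| 16 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 | n/a | Closed source |
| 17 | Pangu Embedded | 79.00 | 0.00 | 0.00 | 92.40 | 81.90 | 67.10 | 7 | Free for commercial use |
| 18 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 | 300 | Free for commercial use |
| 19 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | 22.00 | 0.00 | 0.00 | 43.20 | 30.5 | Free for commercial use |
| 20 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 | n/a | Closed source |
| 21 | GPT-4o(2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 22 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 23 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 | 681 | Free for commercial use |
| 24 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 | 269 | Free for commercial use |
| 25 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 | 9 | Free for commercial use |
| 26 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 | n/a | Closed source |
| 27 | Mistral-Small-3.2 | 69.06 | 46.13 | 0.00 | 0.00 | 0.00 | 0.00 | 24 | Free for commercial use |
| 28 | Llama3.3-70B-Instruct | 68.90 | 50.50 | 0.00 | 0.00 | 0.00 | 33.30 | 70 | Free for commercial use |
| 29 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | 0.00 | 0.00 | 25.30 | 29.70 | 27 | Free for commercial use |
| 30 | Qwen3-Next | 66.05 | 0.00 | 0.00 | 0.00 | 0.00 | 56.60 | 80 | Free for commercial use |
| 31 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 141 | Free for commercial use |
| 32 | Llama3-70B-Instruct | 56.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70 | Free for commercial use |
| 33 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | 0.00 | 71.80 | 10.00 | 0.00 | 3.8 | Free for commercial use |
| 34 | Llama3-70B | 52.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70 | Free for commercial use |
| 35 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 36 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 | 8 | Free for commercial use |
| 37 | Moonlight-16B-A3B-Instruct | 42.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16 | Free for commercial use |
| 38 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 | 7 | Free for commercial use |
| 39 | Gemini 2.5 Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.60 | n/a | Closed source |
| 40 | Gemini 2.5 Flash-Preview-09-2025 | 0.00 | 0.00 | 54.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 41 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 1000 | Free for commercial use |
| 42 | Step 3.5 Flash | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 | 196 | Free for commercial use |
| 43 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 | n/a | Closed source |
| 44 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 | 7 | Free for commercial use |
| 45 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 | 4 | Free for commercial use |
| 46 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 | n/a | Closed source |
| 47 | Qwen3-4B-Thinking-2507 | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 | 4 | Free for commercial use |
| 48 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 | n/a | Closed source |
| 49 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 | n/a | Closed source |
| 50 | Grok 4 Fast | 0.00 | 85.70 | 0.00 | 0.00 | 0.00 | 80.00 | n/a | Closed source |
| 51 | Grok 4 Heavy | 0.00 | 88.90 | 73.50 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 52 | Gemini 3.0 Flash | 0.00 | 90.40 | 68.70 | 0.00 | 0.00 | 0.00 | n/a | Closed source |
| 53 | GPT-5.2 | 0.00 | 92.40 | 80.00 | 0.00 | 0.00 | 0.00 | n/a | Closed source |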
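
The default ordering above follows MMLU Pro, so models without an MMLU Pro score sink to the bottom even when they lead on another benchmark. The following is a minimal sketch of re-ranking by a chosen benchmark. It assumes that a score of 0.00 means "no result reported" rather than a measured zero, which the page does not state explicitly. The names `SCORES` and `rank_by` are hypothetical, and only a handful of rows are copied in for illustration.

```python
from typing import Dict, List, Tuple

# A few rows copied from the leaderboard; values are benchmark scores.
# Assumption: 0.00 marks a missing result, not a real score of zero.
SCORES: Dict[str, Dict[str, float]] = {
    "M2.1": {"MMLU Pro": 88.00, "GPQA Diamond": 81.00,
             "SWE-bench Verified": 74.80, "LiveCodeBench": 0.00},
    "Claude Sonnet 4.5": {"MMLU Pro": 88.00, "GPQA Diamond": 83.40,
                          "SWE-bench Verified": 82.00, "LiveCodeBench": 71.00},
    "GLM-4.7": {"MMLU Pro": 84.30, "GPQA Diamond": 85.70,
                "SWE-bench Verified": 73.80, "LiveCodeBench": 84.90},
    "GPT-5.2": {"MMLU Pro": 0.00, "GPQA Diamond": 92.40,
                "SWE-bench Verified": 80.00, "LiveCodeBench": 0.00},
    "Gemini 2.5 Deep Think": {"MMLU Pro": 0.00, "GPQA Diamond": 0.00,
                              "SWE-bench Verified": 0.00, "LiveCodeBench": 87.60},
}


def rank_by(benchmark: str,
            scores: Dict[str, Dict[str, float]] = SCORES) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted best-first for one benchmark.

    Entries equal to 0.00 are skipped because they are assumed to be missing.
    Ties keep their original table order (Python's sort is stable).
    """
    reported = [
        (model, row[benchmark])
        for model, row in scores.items()
        if row.get(benchmark, 0.0) > 0.0
    ]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by("GPQA Diamond"):
        print(f"{name}: {score:.2f}")
```

With this subset, ranking by GPQA Diamond puts GPT-5.2 first, although it sits at rank 53 in the MMLU Pro ordering. This is why the choice of benchmark matters when reading the table.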