DataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

LLM Benchmark Performance Comparison

Quickly view LLM performance on benchmarks such as MMLU Pro, HLE, and SWE-bench, and compare models on general knowledge, coding, and reasoning. Customize the comparison by selecting specific models and benchmarks.
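
Selecting models and benchmarks amounts to taking a sub-table of the score matrix. The Python sketch below illustrates that idea on a few rows copied from the table on this page. `SCORES` and `compare` are hypothetical names for illustration, not a DataLearner API, and the sketch assumes that a score of 0.00 on this page means no result was recorded, so it maps those entries to `None`.

```python
from typing import Optional

# A few rows copied from the leaderboard on this page (scores in %).
# Assumption: 0.00 marks a missing score, so compare() reports it as None.
SCORES: dict[str, dict[str, float]] = {
    "Claude Sonnet 4.5": {"MMLU Pro": 88.00, "GPQA Diamond": 83.40, "SWE-bench Verified": 82.00, "LiveCodeBench": 71.00},
    "GLM-4.7": {"MMLU Pro": 84.30, "GPQA Diamond": 85.70, "SWE-bench Verified": 73.80, "LiveCodeBench": 84.90},
    "GPT-5.2": {"MMLU Pro": 0.00, "GPQA Diamond": 92.40, "SWE-bench Verified": 80.00, "LiveCodeBench": 0.00},
}


def compare(models: list[str], benchmarks: list[str]) -> dict[str, dict[str, Optional[float]]]:
    """Return a model -> benchmark -> score sub-table; unknown or 0.00 scores become None."""
    table: dict[str, dict[str, Optional[float]]] = {}
    for model in models:
        row = SCORES.get(model, {})  # unknown models yield an empty row
        table[model] = {}
        for bench in benchmarks:
            score = row.get(bench, 0.0)
            table[model][bench] = score if score > 0 else None
    return table


if __name__ == "__main__":
    result = compare(["GPT-5.2", "GLM-4.7"], ["MMLU Pro", "GPQA Diamond"])
    for model, row in result.items():
        print(model, row)
```

Returning `None` instead of 0.00 keeps a missing result from being mistaken for a real score of zero when the sub-table is averaged or sorted later.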

Detailed benchmark descriptions are available in the LLM Benchmark List & Guide.

Updated on: 2025/11/08 22:10:24

LLM Performance Results

Data source: DataLearnerAI

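| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | M2.1 | 88.00 | 81.00 | 74.00 | 0.00 | 0.00 | 0.00 | 230.0 | Free commercial use |
| 2 | Claude Sonnet 4.5 | 88.00 | 83.40 | 82.00 | 0.00 | 0.00 | 71.00 | n/a | Not open-source |
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | n/a | Not open-source |
| 4 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | 0.00 | 93.10 | 74.80 | 671.0 | Free commercial use |
| 5 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 80.00 | 671.0 | Free commercial use |
| 6 | GLM-4.7 | 84.30 | 85.70 | 73.80 | 0.00 | 0.00 | 84.90 | 358.0 | Free commercial use |
| 7 | Qwen3 Max (Preview) | 84.00 | 76.00 | 69.60 | 0.00 | 0.00 | 57.50 | n/a | Not open-source |
| 8 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 | 235.0 | Free commercial use |
| 9 | GLM-4.6 | 83.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 | 355.0 | Free commercial use |
| 10 | Pangu Pro MoE | 82.60 | 73.70 | 0.00 | 96.80 | 79.20 | 59.60 | 71.9 | Free commercial use |
| 11 | MiniMax M2 | 82.00 | 78.00 | 69.40 | 0.00 | 0.00 | 83.00 | 230.0 | Free commercial use |
| 12 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 671.0 | Free commercial use |
| 13 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 1000.0 | Free commercial use |
| 14 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | n/a | Not open-source |
| 15 | GPT-4o (2025-03-27) | 79.80 | 66.90 | 0.00 | 0.00 | 0.00 | 35.80 | n/a | Not open-source |
| 16 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 | n/a | Not open-source |
| 17 | Pangu Embedded | 79.00 | 0.00 | 0.00 | 92.40 | 81.90 | 67.10 | 7.0 | Free commercial use |
| 18 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 | 300.0 | Free commercial use |
| 19 | Qwen3-30B-A3B-2507 | 78.40 | 70.40 | 22.00 | 0.00 | 0.00 | 43.20 | 30.5 | Free commercial use |
| 20 | Claude 3.5 Sonnet New | 78.00 | 65.00 | 49.00 | 78.00 | 16.00 | 38.70 | n/a | Not open-source |
| 21 | GPT-4o (2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |
| 22 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |
| 23 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 | 681.0 | Free commercial use |
| 24 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 | 269.0 | Free commercial use |
| 25 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 | 9.0 | Free commercial use |
| 26 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 | n/a | Not open-source |
| 27 | Mistral-Small-3.2 | 69.06 | 46.13 | 0.00 | 0.00 | 0.00 | 0.00 | 24.0 | Free commercial use |
| 28 | Llama3.3-70B-Instruct | 68.90 | 50.50 | 0.00 | 0.00 | 0.00 | 33.30 | 70.0 | Free commercial use |
| 29 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | 0.00 | 0.00 | 25.30 | 29.70 | 27.0 | Free commercial use |
| 30 | Qwen3-Next | 66.05 | 0.00 | 0.00 | 0.00 | 0.00 | 56.60 | 80.0 | Free commercial use |
| 31 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 141.0 | Free commercial use |
| 32 | Llama3-70B-Instruct | 56.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70.0 | Free commercial use |
| 33 | Phi-4-mini-instruct (3.8B) | 52.80 | 36.00 | 0.00 | 71.80 | 10.00 | 0.00 | 3.8 | Free commercial use |
| 34 | Llama3-70B | 52.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70.0 | Free commercial use |
| 35 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |
| 36 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 | 8.0 | Free commercial use |
| 37 | Moonlight-16B-A3B-Instruct | 42.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 16.0 | Free commercial use |
| 38 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 | 7.0 | Free commercial use |
| 39 | Gemini 2.5 Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 87.60 | n/a | Not open-source |
| 40 | Gemini 2.5 Flash-Preview-09-2025 | 0.00 | 0.00 | 54.00 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |
| 41 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 1000.0 | Free commercial use |
| 42 | Step 3.5 Flash | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 | 196.0 | Free commercial use |
| 43 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 | n/a | Not open-source |
| 44 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 | 7.0 | Free commercial use |
| 45 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 | 4.0 | Free commercial use |
| 46 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 | n/a | Not open-source |
| 47 | Qwen3-4B-Thinking-2507 | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 | 4.0 | Free commercial use |
| 48 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 | n/a | Not open-source |
| 49 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 | n/a | Not open-source |
| 50 | Grok 4 Fast | 0.00 | 85.70 | 0.00 | 0.00 | 0.00 | 80.00 | n/a | Not open-source |
| 51 | Grok 4 Heavy | 0.00 | 88.90 | 73.50 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |
| 52 | Gemini 3.0 Flash | 0.00 | 90.40 | 68.70 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |
| 53 | GPT-5.2 | 0.00 | 92.40 | 80.00 | 0.00 | 0.00 | 0.00 | n/a | Not open-source |

All scores are percentages. Rows are ranked by MMLU Pro score in descending order, and models with an MMLU Pro value of 0.00 are listed last. A value of 0.00 appears to mean that no result was recorded for that benchmark, not that the model scored zero. Params (B) is the parameter count in billions; n/a means no parameter count is listed.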
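
Because 0.00 appears to stand for a missing result, ranking the models on a column other than MMLU Pro works best if those entries are dropped first rather than sorted to the bottom as zeros. The Python sketch below does this for a handful of SWE-bench Verified scores copied from the table above. `ROWS` and `rank_on` are hypothetical names, the 0.00-means-missing reading is an assumption, and the tie handling (standard competition ranking, 1, 2, 2, 4) is a choice made for the sketch, not how this page numbers its rows.

```python
# Subset of the SWE-bench Verified column copied from the table above (scores in %).
# Assumption: 0.00 means "no score recorded", not a real result of zero.
ROWS: list[tuple[str, float]] = [
    ("M2.1", 74.00),
    ("Claude Sonnet 4.5", 82.00),
    ("GLM-4.7", 73.80),
    ("Qwen3-235B-A22B-2507", 0.00),
    ("Step 3.5 Flash", 74.40),
    ("GPT-5.2", 80.00),
    ("Grok 4 Heavy", 73.50),
]


def rank_on(rows: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Rank models by score, highest first, skipping 0.00 entries.

    Ties share a rank and the following rank is skipped (competition ranking).
    """
    # sorted() is stable, so tied models keep their input order.
    scored = sorted((r for r in rows if r[1] > 0), key=lambda r: r[1], reverse=True)
    ranked: list[tuple[int, str, float]] = []
    for position, (model, score) in enumerate(scored, start=1):
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]  # same score as the previous model -> same rank
        else:
            rank = position
        ranked.append((rank, model, score))
    return ranked


if __name__ == "__main__":
    for rank, model, score in rank_on(ROWS):
        print(f"{rank:>2}  {model:<22} {score:6.2f}")
```

On this subset, Qwen3-235B-A22B-2507 drops out of the ranking instead of appearing last with a score of zero.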