A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


LLM Benchmark Performance Comparison

Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

LLM Performance Results

Data source: DataLearnerAI

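Scores are percentages. Parameter counts are in billions. A score of 0.00 appears to mark a benchmark with no recorded result rather than a measured score.

| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed source |
| 2 | M2.1 | 88.00 | 81.00 | 74.00 | 0.00 | 0.00 | 0.00 | 230 | Free commercial |
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 | — | Closed source |
| 4 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 1000 | Closed source |
| 5 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 671 | Free commercial |
| 6 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | 0.00 | 93.10 | 74.80 | 671 | Free commercial |
| 7 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 80.00 | 671 | Free commercial |
| 8 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 | 671 | Free commercial |
| 9 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed source |
| 10 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 355 | Free commercial |
| 11 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 | 1040 | Free commercial |
| 12 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 235 | Free commercial |
| 13 | GLM-4.7 | 84.30 | 85.70 | 73.80 | 0.00 | 0.00 | 84.90 | 358 | Free commercial |
| 14 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 671 | Free commercial |
| 15 | Intern-S1 | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 | 241 | Free commercial |
| 16 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 | 235 | Free commercial |
| 17 | GLM-4.6 | 83.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 | 355 | Free commercial |
| 18 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 | 2000 | Free commercial |
| 19 | MiniMax M2 | 82.00 | 78.00 | 69.40 | 0.00 | 0.00 | 83.00 | 230 | Free commercial |
| 20 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 106 | Free commercial |
| 21 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 | 671 | Free commercial |
| 22 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 | 1000 | Free commercial |
| 23 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 456 | Free commercial |
| 24 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 | — | Closed source |
| 25 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 456 | Free commercial |
| 26 | Llama 4 Maverick Instruct | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 | 400 | Free commercial |
| 27 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 | — | Closed source |
| 28 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | — | Closed source |
| 29 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 | — | Closed source |
| 30 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | — | Closed source |
| 31 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | 0.00 | 0.00 | 85.00 | 1000 | Free commercial |
| 32 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 | 300 | Free commercial |
| 33 | GPT-4o(2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 34 | Claude 3.5 Sonnet | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 35 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 | — | Closed source |
| 36 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 37 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 | 681 | Free commercial |
| 38 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 | 269 | Free commercial |
| 39 | Llama 4 Scout Instruct | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 | 109 | Free commercial |
| 40 | Llama3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 | 405 | Free commercial |
| 41 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 | 235 | Free commercial |
| 42 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 | — | Closed source |
| 43 | Llama 4 Maverick | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 400 | Free commercial |
| 44 | Llama3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 405 | Free commercial |
| 45 | Llama 4 Scout | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 109 | Free commercial |
| 46 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 141 | Free commercial |
| 47 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 48 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 | — | Closed source |
| 49 | Codestral 25.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 | — | Closed source |
| 50 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 | — | Closed source |
| 51 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 | — | Closed source |
| 52 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 53 | Step 3.5 Flash | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 | 196 | Free commercial |
| 54 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 | 1000 | Free commercial |
| 55 | Qwen3-Coder-480B-A35B | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 | 480 | Free commercial |
| 56 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | — | Closed source |
| 57 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | — | Closed source |
| 58 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | — | Closed source |
| 59 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | — | Closed source |
| 60 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | — | Closed source |
| 61 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | — | Closed source |
| 62 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | — | Free commercial |
| 63 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | — | Closed source |
| 64 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 | — | Closed source |
| 65 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | — | Closed source |
| 66 | Step3 | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 | 321 | Free commercial |
| 67 | ERNIE-4.5-VL-424B-A47B-Base | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 | 424 | Free commercial |
| 68 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed source |
| 69 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 | — | Closed source |
| 70 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 | 671 | Free commercial |
| 71 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 | — | Closed source |
| 72 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 73 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 | — | Closed source |
| 74 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | — | Closed source |
| 75 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | — | Closed source |
| 76 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | — | Closed source |
| 77 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |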
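
The page describes picking a benchmark and re-ranking the models by it. A minimal Python sketch of that re-ranking follows, using a handful of rows copied from the results above. It assumes that a score of 0.00 means "no result recorded" and should be excluded from a ranking; the page does not state this, so treat it as an assumption. The names `BENCHMARKS`, `ROWS`, and `rank_by` are illustrative and not part of any DataLearnerAI API.

```python
from typing import List, Tuple

# Column order used on the leaderboard.
BENCHMARKS = [
    "MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
    "MATH-500", "AIME 2024", "LiveCodeBench",
]

# A small sample of rows copied from the results; scores follow BENCHMARKS order.
ROWS: List[Tuple[str, List[float]]] = [
    ("OpenAI o1", [91.04, 77.30, 48.90, 96.40, 79.20, 71.00]),
    ("DeepSeek-R1-0528", [85.00, 81.00, 57.60, 98.00, 91.40, 73.30]),
    ("Kimi K2.5", [78.50, 87.60, 76.80, 0.00, 0.00, 85.00]),
    ("Gemini-2.5-Pro-Preview-05-06", [0.00, 83.00, 63.20, 98.80, 92.00, 77.10]),
    ("Amazon Nova Pro", [0.00, 0.00, 0.00, 0.00, 0.00, 0.00]),
]


def rank_by(rows: List[Tuple[str, List[float]]], benchmark: str) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted by descending score on one benchmark.

    Assumption: a score of exactly 0.0 marks a missing result, so such
    rows are dropped instead of being ranked last.
    """
    column = BENCHMARKS.index(benchmark)  # raises ValueError for unknown names
    scored = [(name, scores[column]) for name, scores in rows if scores[column] != 0.0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (name, score) in enumerate(rank_by(ROWS, "MATH-500"), start=1):
        print(f"{position}. {name}: {score:.2f}")
```

Dropping the 0.00 entries keeps models without a reported score out of the ranking, rather than listing them as if they had scored zero.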