LLM Benchmark Performance Comparison

Compare model performance across MMLU Pro, HLE, SWE-Bench and more. Select benchmarks to view rankings.

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

LLM Performance Results

Data source: DataLearnerAI
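| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | - | Closed source |
| 2 | Claude Opus 4.5 | 90.00 | 87.00 | 80.90 | 0.00 | 0.00 | 87.00 | - | Closed source |
| 3 | Claude Opus 4.1 | 88.00 | 81.00 | 79.40 | 0.00 | 0.00 | 65.00 | - | Closed source |
| 4 | Hunyuan-T1 | 87.20 | 69.30 | 0.00 | 96.20 | 78.20 | 64.90 | - | Closed source |
| 5 | Grok 4 | 87.00 | 87.00 | 58.60 | 0.00 | 0.00 | 82.00 | - | Closed source |
| 6 | Qwen3.5-27B | 86.10 | 85.50 | 72.40 | 0.00 | 0.00 | 0.00 | 270B | Free commercial |
| 7 | Gemini 2.5-Pro | 86.00 | 86.40 | 67.20 | 98.80 | 92.00 | 77.10 | - | Closed source |
| 8 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 10000B | Closed source |
| 9 | OpenAI o3 | 85.60 | 83.30 | 69.10 | 98.10 | 91.60 | 75.80 | - | Closed source |
| 10 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | - | Closed source |
| 11 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710B | Free commercial |
| 12 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 | 6710B | Free commercial |
| 13 | Grok 4.1 Fast | 85.00 | 85.00 | 0.00 | 0.00 | 0.00 | 82.00 | - | Closed source |
| 14 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550B | Free commercial |
| 15 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 | 10400B | Free commercial |
| 16 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 2350B | Free commercial |
| 17 | Qwen3-235B-A22B-Thinking | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 305B | Free commercial |
| 18 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710B | Free commercial |
| 19 | Claude Sonnet 4 | 84.00 | 83.80 | 80.20 | 0.00 | 43.40 | 66.00 | - | Closed source |
| 20 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060B | Free commercial |
| 21 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560B | Free commercial |
| 22 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 | - | Closed source |
| 23 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560B | Free commercial |
| 24 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | - | Closed source |
| 25 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | - | Closed source |
| 26 | GPT OSS 120B | 79.00 | 80.10 | 60.10 | 0.00 | 96.60 | 0.00 | 117B | Free commercial |
| 27 | QwQ-32B | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 | 325B | Free commercial |
| 28 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | 0.00 | 96.00 | 0.00 | 210B | Free commercial |
| 29 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 | 2350B | Free commercial |
| 30 | Qwen3-8B | 72.50 | 62.00 | 0.00 | 97.40 | 79.40 | 61.80 | 80B | Free commercial |
| 31 | QwQ-32B-Preview | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 | 320B | Free commercial |
| 32 | Qwen3-30B-A3B | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 | 305B | Free commercial |
| 33 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | - | Closed source |
| 34 | OpenAI o3-mini | 0.00 | 70.60 | 40.80 | 95.80 | 60.00 | 0.00 | - | Closed source |
| 35 | Qwen3-32B | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 65.70 | 320B | Free commercial |
| 36 | Magistral-Small-2506 | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 | 240B | Free commercial |
| 37 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | - | Closed source |
| 38 | DeepSeek-R1-Distill-Llama-70B | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 | 700B | Free commercial |
| 39 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 | 70B | Free commercial |
| 40 | Phi-4-instruct (reasoning-trained) | 0.00 | 49.00 | 0.00 | 90.40 | 50.00 | 0.00 | 38B | Closed source |
| 41 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | - | Closed source |
| 42 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | - | Closed source |
| 43 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | - | Closed source |
| 44 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | - | Closed source |
| 45 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | - | Closed source |
| 46 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | - | Closed source |
| 47 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | - | Closed source |
| 48 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | - | Free commercial |
| 49 | GLM-4.7-Flash | 0.00 | 75.20 | 59.20 | 0.00 | 0.00 | 0.00 | 310B | Free commercial |
| 50 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | - | Closed source |
| 51 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 | 6710B | Free commercial |
| 52 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 | - | Closed source |
| 53 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | - | Closed source |
| 54 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 | - | Closed source |
| 55 | Gemini 2.5 Pro Experimental 03-25 | 0.00 | 84.00 | 63.80 | 0.00 | 92.00 | 70.40 | - | Closed source |
| 56 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | - | Closed source |
| 57 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | - | Closed source |
| 58 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | - | Closed source |
| 59 | MiniMax M2.5 | 0.00 | 85.20 | 80.20 | 0.00 | 0.00 | 0.00 | 2290B | Free commercial |
| 60 | GPT-5.4 mini | 0.00 | 88.00 | 0.00 | 0.00 | 0.00 | 0.00 | - | Closed source |
| 61 | GPT-5.1 | 0.00 | 88.10 | 76.30 | 0.00 | 0.00 | 0.00 | - | Closed source |
| 62 | GPT-5-Pro | 0.00 | 89.40 | 0.00 | 0.00 | 0.00 | 0.00 | - | Closed source |
| 63 | Claude Opus 4.6 | 0.00 | 91.31 | 80.84 | 97.60 | 0.00 | 76.00 | - | Closed source |
| 64 | GPT-5.2 Pro | 0.00 | 93.20 | 0.00 | 0.00 | 0.00 | 0.00 | - | Closed source |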