A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


LLM Benchmark Performance Comparison

Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24


LLM Performance Results

Data source: DataLearnerAI

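| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o1 | 91.04 | 77.30 | 48.90 | 96.40 | 79.20 | 71.00 | — | Closed source |
| 2 | Claude Opus 4.5 | 90.00 | 87.00 | 80.90 | 0.00 | 0.00 | 87.00 | — | Closed source |
| 3 | Claude Opus 4.1 | 88.00 | 81.00 | 79.40 | 0.00 | 0.00 | 65.00 | — | Closed source |
| 4 | Hunyuan-T1 | 87.20 | 69.30 | 0.00 | 96.20 | 78.20 | 64.90 | — | Closed source |
| 5 | Grok 4 | 87.00 | 87.00 | 58.60 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 6 | Gemini 2.5-Pro | 86.00 | 86.40 | 67.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 7 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 | 10000B | Closed source |
| 8 | OpenAI o3 | 85.60 | 83.30 | 69.10 | 98.10 | 91.60 | 75.80 | — | Closed source |
| 9 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 | — | Closed source |
| 10 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 | 6710B | Free commercial |
| 11 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 | 6710B | Free commercial |
| 12 | Grok 4.1 Fast | 85.00 | 85.00 | 0.00 | 0.00 | 0.00 | 82.00 | — | Closed source |
| 13 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 | 3550B | Free commercial |
| 14 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 | 10400B | Free commercial |
| 15 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 2350B | Free commercial |
| 16 | Qwen3-235B-A22B-Thinking | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 | 305B | Free commercial |
| 17 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 | 6710B | Free commercial |
| 18 | Claude Sonnet 4 | 84.00 | 83.80 | 80.20 | 0.00 | 43.40 | 66.00 | — | Closed source |
| 19 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 | 1060B | Free commercial |
| 20 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 | 4560B | Free commercial |
| 21 | OpenAI o4 - mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 | — | Closed source |
| 22 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 | 4560B | Free commercial |
| 23 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 | — | Closed source |
| 24 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 | — | Closed source |
| 25 | GPT OSS 120B | 79.00 | 80.10 | 60.10 | 0.00 | 96.60 | 0.00 | 117B | Free commercial |
| 26 | QwQ-32B | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 | 325B | Free commercial |
| 27 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | 0.00 | 96.00 | 0.00 | 210B | Free commercial |
| 28 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 | 2350B | Free commercial |
| 29 | Qwen3-8B | 72.50 | 62.00 | 0.00 | 97.40 | 79.40 | 61.80 | 80B | Free commercial |
| 30 | QwQ-32B-Preview | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 | 320B | Free commercial |
| 31 | Qwen3-30B-A3B | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 | 305B | Free commercial |
| 32 | OpenAI o3-mini | 0.00 | 70.60 | 40.80 | 95.80 | 60.00 | 0.00 | — | Closed source |
| 33 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 | — | Free commercial |
| 34 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 | — | Closed source |
| 35 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 | — | Closed source |
| 36 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 | — | Closed source |
| 37 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 | — | Closed source |
| 38 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 | — | Closed source |
| 39 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 | — | Closed source |
| 40 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 41 | Phi-4-instruct (reasoning-trained) | 0.00 | 49.00 | 0.00 | 90.40 | 50.00 | 0.00 | 38B | Closed source |
| 42 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 | 70B | Free commercial |
| 43 | DeepSeek-R1-Distill-Llama-70B | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 | 700B | Free commercial |
| 44 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 | — | Closed source |
| 45 | Magistral-Small-2506 | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 | 240B | Free commercial |
| 46 | Qwen3-32B | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 65.70 | 320B | Free commercial |
| 47 | GPT-5.2 Pro | 0.00 | 93.20 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 48 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 | — | Closed source |
| 49 | GLM-4.7-Flash | 0.00 | 75.20 | 59.20 | 0.00 | 0.00 | 0.00 | 310B | Free commercial |
| 50 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 | — | Closed source |
| 51 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 | 6710B | Free commercial |
| 52 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 | — | Closed source |
| 53 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 | — | Closed source |
| 54 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 | — | Closed source |
| 55 | Gemini 2.5 Pro Experimental 03-25 | 0.00 | 84.00 | 63.80 | 0.00 | 92.00 | 70.40 | — | Closed source |
| 56 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 | — | Closed source |
| 57 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 | — | Closed source |
| 58 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 | — | Closed source |
| 59 | GPT-5.1 | 0.00 | 88.10 | 76.30 | 0.00 | 0.00 | 0.00 | — | Closed source |
| 60 | GPT-5-Pro | 0.00 | 89.40 | 0.00 | 0.00 | 0.00 | 0.00 | — | Closed source |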
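
To compare a custom selection of models on one benchmark, as the page description suggests, the rows above can be re-ranked offline. The following is a minimal Python sketch, not the site's own logic. It copies four rows from the table and assumes that a score of 0.00 means "not reported" rather than a real zero; the page does not state this. The names `BENCHMARKS`, `ROWS`, `score`, and `rank_by` are hypothetical helpers introduced for illustration.

```python
from typing import List, Optional, Tuple

# Benchmark columns in the same order as the table above.
BENCHMARKS = [
    "MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
    "MATH-500", "AIME 2024", "LiveCodeBench",
]

# A few rows copied verbatim from the table (scores in column order).
ROWS = {
    "OpenAI o1":       [91.04, 77.30, 48.90, 96.40, 79.20, 71.00],
    "Claude Opus 4.5": [90.00, 87.00, 80.90, 0.00, 0.00, 87.00],
    "Gemini 2.5-Pro":  [86.00, 86.40, 67.20, 98.80, 92.00, 77.10],
    "GPT-5.2 Pro":     [0.00, 93.20, 0.00, 0.00, 0.00, 0.00],
}


def score(model: str, benchmark: str) -> Optional[float]:
    """Return the score, or None when the table shows 0.00.

    Assumption: 0.00 marks an unreported result, not a true zero.
    """
    value = ROWS[model][BENCHMARKS.index(benchmark)]
    return None if value == 0.0 else value


def rank_by(benchmark: str) -> List[Tuple[str, float]]:
    """Rank models with a reported score on `benchmark`, best first."""
    reported = [
        (model, s)
        for model in ROWS
        if (s := score(model, benchmark)) is not None
    ]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_by("GPQA Diamond"):
        print(f"{name}: {value:.2f}")
```

Dropping the 0.00 entries before sorting keeps models without a reported score from appearing at the bottom of a ranking as if they had failed the benchmark. If 0.00 is in fact a genuine score for some row, the filter in `score` should be removed.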