Quickly view LLM performance across benchmarks such as MMLU Pro, HLE, and SWE-bench. Compare models on general knowledge, coding, and reasoning, and customize the comparison by selecting specific models and benchmarks.
Detailed benchmark descriptions available at: LLM Benchmark List & Guide
Data source: DataLearnerAI. A score of 0.00 indicates that no result has been reported for that model on that benchmark.
| # | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench |
|---|-------|----------|--------------|--------------------|----------|-----------|---------------|
| 3 | GPT-4.5 | 86.10 | 71.40 | 38.00 | 90.70 | 36.70 | 46.40 |
| 4 | Qwen3-Max-Thinking | 85.70 | 87.40 | 75.30 | 0.00 | 0.00 | 85.90 |
| 5 | DeepSeek-R1-0528 | 85.00 | 81.00 | 57.60 | 98.00 | 91.40 | 73.30 |
| 6 | DeepSeek-V3.1 | 85.00 | 80.10 | 66.00 | 0.00 | 93.10 | 74.80 |
| 7 | DeepSeek-V3.1 Terminus | 85.00 | 80.70 | 68.40 | 0.00 | 0.00 | 80.00 |
| 8 | DeepSeek V3.2-Exp | 85.00 | 79.90 | 67.80 | 0.00 | 0.00 | 74.10 |
| 9 | Claude Opus 4 | 85.00 | 79.60 | 72.50 | 98.20 | 76.00 | 56.60 |
| 10 | GLM-4.5 | 84.60 | 79.10 | 64.20 | 98.20 | 91.00 | 72.90 |
| 11 | Kimi K2 Thinking | 84.60 | 84.50 | 71.30 | 0.00 | 0.00 | 83.10 |
| 12 | Qwen3-235B-A22B-Thinking-2507 | 84.40 | 81.10 | 0.00 | 0.00 | 0.00 | 74.10 |
| 13 | GLM-4.7 | 84.30 | 85.70 | 73.80 | 0.00 | 0.00 | 84.90 |
| 14 | DeepSeek-R1 | 84.00 | 71.50 | 49.20 | 97.30 | 79.80 | 65.90 |
| 15 | Intern-S1 | 83.50 | 77.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 16 | Qwen3-235B-A22B-2507 | 83.00 | 77.50 | 0.00 | 0.00 | 0.00 | 51.80 |
| 17 | GLM-4.6 | 83.00 | 82.90 | 68.00 | 0.00 | 0.00 | 84.50 |
| 18 | Llama 4 Behemoth Instruct | 82.20 | 73.70 | 0.00 | 95.00 | 0.00 | 49.40 |
| 19 | MiniMax M2 | 82.00 | 78.00 | 69.40 | 0.00 | 0.00 | 83.00 |
| 20 | GLM-4.5-Air | 81.40 | 75.00 | 57.60 | 98.10 | 89.40 | 70.70 |
| 21 | DeepSeek-V3-0324 | 81.20 | 68.40 | 38.80 | 94.00 | 59.40 | 49.20 |
| 22 | Kimi K2 | 81.10 | 75.10 | 51.80 | 97.40 | 69.60 | 53.70 |
| 23 | MiniMax-M1-80k | 81.10 | 70.00 | 56.00 | 96.80 | 86.00 | 65.00 |
| 24 | OpenAI o4-mini | 80.60 | 81.40 | 68.10 | 0.00 | 98.70 | 0.00 |
| 25 | MiniMax-M1-40k | 80.60 | 69.20 | 55.60 | 96.00 | 83.30 | 62.30 |
| 26 | Llama 4 Maverick Instruct | 80.50 | 69.80 | 0.00 | 0.00 | 0.00 | 43.40 |
| 27 | GPT-4.1 | 80.50 | 66.30 | 54.60 | 92.80 | 48.10 | 40.50 |
| 28 | OpenAI o1-mini | 80.30 | 60.00 | 0.00 | 90.00 | 63.60 | 52.00 |
| 29 | Gemini 2.0 Pro Experimental | 79.10 | 64.70 | 0.00 | 0.00 | 36.00 | 0.00 |
| 30 | Hunyuan-TurboS | 79.00 | 57.50 | 0.00 | 0.00 | 0.00 | 32.00 |
| 31 | Kimi K2.5 | 78.50 | 87.60 | 76.80 | 0.00 | 0.00 | 85.00 |
| 32 | ERNIE-4.5-300B-A47B | 78.40 | 0.00 | 0.00 | 96.40 | 54.80 | 38.80 |
| 33 | GPT-4o (2024-11-20) | 77.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 34 | Claude 3.5 Sonnet | 77.64 | 59.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 35 | Gemini 2.0 Flash Experimental | 76.24 | 65.20 | 21.40 | 0.00 | 0.00 | 29.10 |
| 36 | Qwen2.5-Max | 76.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 37 | DeepSeek-V3 | 75.90 | 59.10 | 0.00 | 87.80 | 39.00 | 34.60 |
| 38 | Grok 2 | 75.50 | 56.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 39 | Llama 4 Scout Instruct | 74.30 | 57.20 | 0.00 | 0.00 | 0.00 | 32.80 |
| 40 | Llama 3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 |
| 41 | Qwen3-235B-A22B | 72.90 | 71.10 | 34.40 | 98.00 | 85.70 | 70.70 |
| 42 | Gemini 2.0 Flash-Lite | 71.60 | 51.50 | 0.00 | 0.00 | 0.00 | 28.90 |
| 43 | Llama 4 Maverick | 62.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 44 | Llama 3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 45 | Llama 4 Scout | 58.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 46 | Mixtral-8x22B-Instruct-v0.1 | 56.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 47 | Grok-1.5 | 51.00 | 35.90 | 0.00 | 0.00 | 0.00 | 0.00 |
| 48 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 |
| 49 | Codestral 25.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 37.90 |
| 50 | GPT-4.1 mini | 0.00 | 65.00 | 23.60 | 0.00 | 49.60 | 0.00 |
| 51 | GPT-4.1 nano | 0.00 | 50.30 | 0.00 | 0.00 | 29.40 | 0.00 |
| 52 | Grok 3.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 53 | Step 3.5 Flash | 0.00 | 0.00 | 74.40 | 0.00 | 0.00 | 86.40 |
| 54 | Kimi K2 0905 | 0.00 | 0.00 | 69.20 | 0.00 | 0.00 | 0.00 |
| 55 | Qwen3-Coder-480B-A35B | 0.00 | 0.00 | 67.00 | 0.00 | 0.00 | 0.00 |
| 56 | Kimi k1.5 (Long-CoT) | 0.00 | 0.00 | 0.00 | 96.20 | 0.00 | 0.00 |
| 57 | Kimi k1.5 (Short-CoT) | 0.00 | 0.00 | 0.00 | 94.60 | 0.00 | 0.00 |
| 58 | Gemini 2.5 Pro Deep Think | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 80.40 |
| 59 | Kimi-k1.6-IOI-high | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 73.80 |
| 60 | OpenAI o3-mini (medium) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 67.40 |
| 61 | Kimi-k1.6-IOI | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.90 |
| 62 | QwQ-Max-Preview | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 65.60 |
| 63 | Gemini 2.5 Flash-Lite | 0.00 | 66.70 | 27.60 | 0.00 | 0.00 | 34.30 |
| 64 | Claude Sonnet 3.7 | 0.00 | 68.00 | 70.30 | 82.20 | 23.30 | 0.00 |
| 65 | Magistral-Medium-2506 | 0.00 | 70.83 | 0.00 | 0.00 | 73.59 | 59.36 |
| 66 | Step3 | 0.00 | 73.00 | 0.00 | 0.00 | 0.00 | 67.10 |
| 67 | ERNIE-4.5-VL-424B-A47B-Base | 0.00 | 76.80 | 0.00 | 0.00 | 0.00 | 38.80 |
| 68 | OpenAI o3-mini (high) | 0.00 | 79.70 | 49.30 | 97.90 | 87.00 | 69.50 |
| 69 | Grok 3 | 0.00 | 80.40 | 0.00 | 0.00 | 84.20 | 70.60 |
| 70 | DeepSeek V3.2 | 0.00 | 82.40 | 73.10 | 0.00 | 0.00 | 83.30 |
| 71 | Gemini 2.5 Flash | 0.00 | 82.80 | 50.00 | 0.00 | 88.00 | 55.40 |
| 72 | Gemini-2.5-Pro-Preview-05-06 | 0.00 | 83.00 | 63.20 | 98.80 | 92.00 | 77.10 |
| 73 | o3-pro | 0.00 | 84.00 | 75.00 | 0.00 | 93.00 | 0.00 |
| 74 | Grok-3 mini - Reasoning | 0.00 | 84.00 | 0.00 | 0.00 | 96.00 | 0.00 |
| 75 | Grok-3 - Reasoning Beta | 0.00 | 84.60 | 0.00 | 0.00 | 93.30 | 79.40 |
| 76 | Claude Sonnet 3.7-64K Extended Thinking | 0.00 | 84.80 | 0.00 | 96.20 | 80.00 | 0.00 |
| 77 | Amazon Nova Pro | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
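For programmatic comparison, the table reads naturally as a small data frame in which 0.00 marks a missing result. Below is a minimal sketch, not a DataLearnerAI API: the rows are transcribed from the table above, the column names are the benchmark identities from the table header, and the model/benchmark selection is an arbitrary example.

```python
import numpy as np
import pandas as pd

# A few rows transcribed from the leaderboard above; 0.00 means "no reported score".
cols = ["Model", "MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
        "MATH-500", "AIME 2024", "LiveCodeBench"]
rows = [
    ("GPT-4.5",     86.10, 71.40, 38.00, 90.70, 36.70, 46.40),
    ("DeepSeek-R1", 84.00, 71.50, 49.20, 97.30, 79.80, 65.90),
    ("GLM-4.5",     84.60, 79.10, 64.20, 98.20, 91.00, 72.90),
    ("Grok 3",       0.00, 80.40,  0.00,  0.00, 84.20, 70.60),
]
df = pd.DataFrame(rows, columns=cols).set_index("Model")

# Treat 0.00 as missing data rather than a genuine zero score.
df = df.replace(0.00, np.nan)

# "Customize your comparison": pick models and benchmarks, then rank.
subset = df.loc[["DeepSeek-R1", "GLM-4.5", "Grok 3"],
                ["GPQA Diamond", "AIME 2024"]]
print(subset.sort_values("GPQA Diamond", ascending=False))
```

Masking zeros before ranking matters: otherwise a model with no reported score sorts below one that genuinely scored poorly, which misrepresents the leaderboard.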