Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.
Detailed benchmark descriptions are available at: LLM Benchmark List & Guide
Data source: DataLearnerAI
A score of 0.00 indicates that no result is recorded for that model on that benchmark.

| # | Model | Benchmark 1 | Benchmark 2 | Benchmark 3 | Benchmark 4 | Benchmark 5 | Benchmark 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | QwQ-32B | 76.00 | 58.00 | 0.00 | 91.00 | 79.50 | 0.00 |
| 4 | GPT OSS 20B | 74.00 | 71.50 | 34.00 | 0.00 | 96.00 | 0.00 |
| 5 | QwQ-32B-Preview | 70.97 | 0.00 | 0.00 | 90.60 | 50.00 | 0.00 |
| 6 | Qwen2.5-32B | 69.23 | 0.00 | 0.00 | 0.00 | 0.00 | 51.20 |
| 7 | Qwen3-30B-A3B | 69.10 | 54.80 | 0.00 | 0.00 | 0.00 | 29.00 |
| 8 | Mistral-Small-3.2 | 69.06 | 46.13 | 0.00 | 0.00 | 0.00 | 0.00 |
| 9 | Gemma 3 - 27B (IT) | 67.50 | 42.40 | 0.00 | 0.00 | 25.30 | 29.70 |
| 10 | Mistral-Small-3.1-24B-Instruct-2503 | 66.76 | 45.96 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11 | Gemma2-27B | 56.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 12 | C4AI Aya Vision 32B | 47.16 | 33.84 | 0.00 | 0.00 | 0.00 | 0.00 |
| 13 | GLM-4.7-Flash | 0.00 | 75.20 | 59.20 | 0.00 | 0.00 | 0.00 |
| 14 | Qwen3-32B | 0.00 | 68.40 | 0.00 | 97.20 | 81.40 | 65.70 |
| 15 | Magistral-Small-2506 | 0.00 | 68.18 | 0.00 | 0.00 | 70.68 | 55.84 |
| 16 | Devstral Small 1.1 | 0.00 | 0.00 | 53.60 | 0.00 | 0.00 | 0.00 |
| 17 | Qwen3-Coder-Flash | 0.00 | 0.00 | 51.60 | 0.00 | 0.00 | 0.00 |
| 18 | Devstral Small 1.0 | 0.00 | 0.00 | 46.80 | 0.00 | 0.00 | 0.00 |
| 19 | Codestral | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 31.50 |
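For readers who want to run comparisons outside the interactive page, the sketch below shows one way to filter a leaderboard like the table above with pandas. It is illustrative only: the handful of rows are copied from the table, the generic column labels mirror the header used above, and treating 0.00 as a missing score (rather than a true zero) is an assumption about how the source encodes unreported results.

```python
# Minimal sketch of "customize your comparison by selecting specific models
# and benchmarks", done programmatically with pandas.
# Assumption: 0.00 in the source table means "no reported score", not a real zero.
import pandas as pd

cols = ["Model", "Benchmark 1", "Benchmark 2", "Benchmark 3",
        "Benchmark 4", "Benchmark 5", "Benchmark 6"]
rows = [
    ("QwQ-32B",              76.00, 58.00,  0.00, 91.00, 79.50,  0.00),
    ("GPT OSS 20B",          74.00, 71.50, 34.00,  0.00, 96.00,  0.00),
    ("Qwen3-32B",             0.00, 68.40,  0.00, 97.20, 81.40, 65.70),
    ("Magistral-Small-2506",  0.00, 68.18,  0.00,  0.00, 70.68, 55.84),
]
df = pd.DataFrame(rows, columns=cols).set_index("Model")

# Mask unreported scores so they do not distort comparisons or averages.
df = df.mask(df == 0.00)

# Select specific models and benchmarks, as the page's selectors do.
models = ["QwQ-32B", "Qwen3-32B"]
benchmarks = ["Benchmark 2", "Benchmark 5"]
print(df.loc[models, benchmarks])

# Rank the models on a single benchmark, ignoring missing scores.
print(df["Benchmark 2"].dropna().sort_values(ascending=False))
```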