Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.
Detailed benchmark descriptions are available at: LLM Benchmark List & Guide
Data source: DataLearnerAI
Benchmark column labels are not preserved in the source table below; a score of 0.00 indicates no reported result for that benchmark.

| Rank | Model | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Score 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 |
| 4 | Qwen2.5-7B | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 |
| 5 | Gemma 2 - 9B | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 6 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 |
| 7 | Llama3.1-8B | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 |
| 9 | Qwen3-4B-Thinking-2507 | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 |
| 10 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 |
| 11 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 |
| 12 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 |
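Comparing models on scores like those above takes some care, because a 0.00 entry means the model was simply not evaluated on that benchmark, not that it scored zero. A minimal sketch of how such rows might be loaded and compared (the parsing format, function names, and the treatment of 0.00 as "missing" are all illustrative assumptions, not part of DataLearnerAI's tooling):

```python
# Illustrative sketch: parse leaderboard rows and find the top model
# per benchmark column, treating 0.00 as "not reported" rather than zero.

# A few rows from the table above, as rank,model,score1..score6
ROWS = """\
3,GLM-4-9B-Chat,72.40,0.00,0.00,0.00,76.40,51.80
4,Qwen2.5-7B,45.00,36.40,0.00,0.00,0.00,0.00
11,Hunyuan-7B,0.00,60.10,0.00,93.70,81.10,57.00
"""

def parse(rows: str) -> dict:
    """Map model name -> list of scores, with 0.00 mapped to None (missing)."""
    table = {}
    for line in rows.strip().splitlines():
        parts = line.split(",")
        name = parts[1]
        table[name] = [float(s) if float(s) > 0 else None for s in parts[2:]]
    return table

def best_on(table: dict, col: int) -> str:
    """Return the highest-scoring model on benchmark column `col`,
    skipping models with no reported score there."""
    scored = {m: s[col] for m, s in table.items() if s[col] is not None}
    return max(scored, key=scored.get)

table = parse(ROWS)
print(best_on(table, 0))  # GLM-4-9B-Chat: only it and Qwen2.5-7B report column 0
print(best_on(table, 3))  # Hunyuan-7B: the only model reporting column 3 here
```

Filtering out the missing entries before taking the maximum is the key step; comparing raw values would let a real low score lose to an unevaluated 0.00, or vice versa.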