DataLearnerAI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

LLM Benchmark Performance Comparison

Compare model performance across MMLU Pro, HLE, SWE-Bench and more. Select benchmarks to view rankings.

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|------|-------|----------|--------------|--------------------|----------|-----------|---------------|------------|---------|
| 1 | Pangu Embedded | 79.00 | 0.00 | 0.00 | 92.40 | 81.90 | 67.10 | 70B | Free commercial |
| 2 | Qwen3-8B | 72.50 | 62.00 | 0.00 | 97.40 | 79.40 | 61.80 | 80B | Free commercial |
| 3 | GLM-4-9B-Chat | 72.40 | 0.00 | 0.00 | 0.00 | 76.40 | 51.80 | 90B | Free commercial |
| 4 | Qwen2.5-7B | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 | 70B | Free commercial |
| 5 | Gemma 2 - 9B | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 | 90B | Free commercial |
| 6 | Llama3.1-8B-Instruct | 44.00 | 26.30 | 0.00 | 0.00 | 0.00 | 0.00 | 80B | Free commercial |
| 7 | Llama3.1-8B | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 | 80B | Free commercial |
| 8 | Mistral-7B-Instruct-v0.3 | 30.90 | 24.70 | 0.00 | 0.00 | 0.00 | 0.00 | 70B | Free commercial |
| 9 | Qwen3-4B-Thinking-2507 | 0.00 | 65.80 | 0.00 | 0.00 | 0.00 | 55.20 | 40B | Free commercial |
| 10 | Qwen3-4B-2507 | 0.00 | 62.00 | 0.00 | 0.00 | 0.00 | 35.10 | 40B | Free commercial |
| 11 | Hunyuan-7B | 0.00 | 60.10 | 0.00 | 93.70 | 81.10 | 57.00 | 70B | Free commercial |
| 12 | DeepSeek-R1-Distill-Qwen-7B | 0.00 | 49.50 | 0.00 | 91.40 | 53.30 | 0.00 | 70B | Free commercial |
| 13 | Qwen3-Coder-Next | 0.00 | 0.00 | 70.60 | 0.00 | 0.00 | 0.00 | 80B | Free commercial |
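
The rank column above follows the default ordering, which appears to start with MMLU Pro, so models with no MMLU Pro score sink to the bottom regardless of their other results. As a rough sketch of what selecting a different benchmark does, the Python snippet below re-ranks the same scores by any one column. The names `BENCHMARKS`, `SCORES`, and `rank_by` are hypothetical and not part of any DataLearnerAI API, and treating 0.00 as "not reported" is an assumption the page does not state.

```python
# Benchmark columns, in the same order as the table.
BENCHMARKS = ["MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
              "MATH-500", "AIME 2024", "LiveCodeBench"]

# Scores copied from the table, one list per model in BENCHMARKS order.
SCORES = {
    "Pangu Embedded":              [79.00,  0.00,  0.00, 92.40, 81.90, 67.10],
    "Qwen3-8B":                    [72.50, 62.00,  0.00, 97.40, 79.40, 61.80],
    "GLM-4-9B-Chat":               [72.40,  0.00,  0.00,  0.00, 76.40, 51.80],
    "Qwen2.5-7B":                  [45.00, 36.40,  0.00,  0.00,  0.00,  0.00],
    "Gemma 2 - 9B":                [44.70, 32.80,  0.00,  0.00,  0.00,  0.00],
    "Llama3.1-8B-Instruct":        [44.00, 26.30,  0.00,  0.00,  0.00,  0.00],
    "Llama3.1-8B":                 [35.40, 25.80,  0.00,  0.00,  0.00,  0.00],
    "Mistral-7B-Instruct-v0.3":    [30.90, 24.70,  0.00,  0.00,  0.00,  0.00],
    "Qwen3-4B-Thinking-2507":      [ 0.00, 65.80,  0.00,  0.00,  0.00, 55.20],
    "Qwen3-4B-2507":               [ 0.00, 62.00,  0.00,  0.00,  0.00, 35.10],
    "Hunyuan-7B":                  [ 0.00, 60.10,  0.00, 93.70, 81.10, 57.00],
    "DeepSeek-R1-Distill-Qwen-7B": [ 0.00, 49.50,  0.00, 91.40, 53.30,  0.00],
    "Qwen3-Coder-Next":            [ 0.00,  0.00, 70.60,  0.00,  0.00,  0.00],
}


def rank_by(benchmark: str) -> list[tuple[str, float]]:
    """Return (model, score) pairs for one benchmark, highest score first.

    Assumption: a score of 0.00 means the result was not reported,
    so those models are left out instead of being ranked last.
    """
    col = BENCHMARKS.index(benchmark)  # raises ValueError for an unknown name
    reported = [(model, row[col]) for model, row in SCORES.items() if row[col] > 0.0]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (model, score) in enumerate(rank_by("MATH-500"), start=1):
        print(f"{position}. {model}: {score:.2f}")
```

Dropping unreported scores instead of sorting them as zeros keeps a model such as Hunyuan-7B from being placed below weaker models purely because one column is empty.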