
LLM Benchmark Performance Comparison

Compare model performance across MMLU Pro, HLE, SWE-bench, and more. Select a benchmark to view its rankings.

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Pangu Pro MoE | 82.60 | 73.70 | 0.00 | 96.80 | 79.20 | 59.60 | 71.9 | Free commercial |
| 2 | Llama3.3-70B-Instruct | 68.90 | 50.50 | 0.00 | 0.00 | 0.00 | 33.30 | 70.0 | Free commercial |
| 3 | Hunyuan-A13B-Instruct | 67.23 | 71.20 | 0.00 | 0.00 | 87.30 | 63.90 | 80.0 | Free commercial |
| 4 | Llama3.1-70B-Instruct | 66.40 | 48.00 | 0.00 | 0.00 | 0.00 | 33.30 | 70.0 | Free commercial |
| 5 | Qwen3-Next | 66.05 | 0.00 | 0.00 | 0.00 | 0.00 | 56.60 | 80.0 | Free commercial |
| 6 | Qwen2.5-72B | 58.10 | 45.90 | 0.00 | 0.00 | 0.00 | 0.00 | 72.7 | Free commercial |
| 7 | Llama3-70B-Instruct | 56.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70.0 | Free commercial |
| 8 | Llama3-70B | 52.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70.0 | Free commercial |
| 9 | Llama3.1-70B | 52.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70.0 | Free commercial |
| 10 | DeepSeek-R1-Distill-Llama-70B | 0.00 | 65.20 | 0.00 | 94.50 | 0.00 | 0.00 | 70.0 | Free commercial |
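
The rows above are ordered by MMLU Pro score. To see how the same ten models would rank on a different benchmark, the scores can be re-sorted as in the minimal Python sketch below. The names `BENCHMARKS`, `ROWS`, and `rank_by` are hypothetical and not part of any DataLearnerAI tool. The sketch also assumes that a score of 0.00 means "no result reported" rather than a true zero, which the page does not confirm; pass `treat_zero_as_missing=False` to keep those rows.

```python
from typing import List, Tuple

# Benchmark columns, in the same order as the leaderboard table.
BENCHMARKS = [
    "MMLU Pro", "GPQA Diamond", "SWE-bench Verified",
    "MATH-500", "AIME 2024", "LiveCodeBench",
]

# Scores copied from the leaderboard rows, in BENCHMARKS order.
ROWS = {
    "Pangu Pro MoE":                 [82.60, 73.70, 0.00, 96.80, 79.20, 59.60],
    "Llama3.3-70B-Instruct":         [68.90, 50.50, 0.00, 0.00, 0.00, 33.30],
    "Hunyuan-A13B-Instruct":         [67.23, 71.20, 0.00, 0.00, 87.30, 63.90],
    "Llama3.1-70B-Instruct":         [66.40, 48.00, 0.00, 0.00, 0.00, 33.30],
    "Qwen3-Next":                    [66.05, 0.00, 0.00, 0.00, 0.00, 56.60],
    "Qwen2.5-72B":                   [58.10, 45.90, 0.00, 0.00, 0.00, 0.00],
    "Llama3-70B-Instruct":           [56.20, 0.00, 0.00, 0.00, 0.00, 0.00],
    "Llama3-70B":                    [52.78, 0.00, 0.00, 0.00, 0.00, 0.00],
    "Llama3.1-70B":                  [52.47, 0.00, 0.00, 0.00, 0.00, 0.00],
    "DeepSeek-R1-Distill-Llama-70B": [0.00, 65.20, 0.00, 94.50, 0.00, 0.00],
}


def rank_by(benchmark: str, treat_zero_as_missing: bool = True) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score.

    ASSUMPTION: 0.00 is a placeholder for a missing result, so such rows
    are dropped unless treat_zero_as_missing is False.
    """
    idx = BENCHMARKS.index(benchmark)  # raises ValueError for unknown names
    scored = [(name, scores[idx]) for name, scores in ROWS.items()]
    if treat_zero_as_missing:
        scored = [(name, score) for name, score in scored if score > 0.0]
    # sorted() is stable, so tied models keep their table order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_by("AIME 2024")` returns only the two models with a reported AIME 2024 score, with Hunyuan-A13B-Instruct ahead of Pangu Pro MoE, while `rank_by("SWE-bench Verified")` returns an empty list because every entry in that column is 0.00.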