DataLearner logoDataLearnerAI
AI Tech Blogs
Leaderboards
Benchmarks
Models
Resources
Tool Directory

加载中...

DataLearner logoDataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

产品

  • Leaderboards
  • 模型对比
  • Datasets

资源

  • Tutorials
  • Editorial
  • Tool directory

关于

  • 关于我们
  • 隐私政策
  • 数据收集方法
  • 联系我们

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

隐私政策服务条款

LLM Benchmark Performance Comparison

Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.

Detailed benchmark descriptions available at:LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

Benchmark switcher

Pick the leaderboard to sync both chart and table

MMLU ProGPQA DiamondSWE-bench VerifiedMATH-500AIME 2024LiveCodeBench

Filters

Active
All3B and below7B13B34B

LLM Performance Results

Data source: DataLearnerAI

LLM Performance Results

Data source: DataLearnerAI

MMLU Pro
RankModelMMLU ProGPQA DiamondSWE-bench VerifiedMATH-500AIME 2024LiveCodeBenchParams (B)License
1Pangu Pro MoE82.6073.700.0096.8079.2059.60719BFree commercial
2Llama3.3-70B-Instruct68.9050.500.00
65B
100B and above
AllReasoning ModelsFoundation ModelsInstruction/Chat ModelsCoding Models
0.00
0.00
33.30
700B
Free commercial
3Hunyuan-A13B-Instruct67.2371.200.000.0087.3063.90800BFree commercial
4Llama3.1-70B-Instruct66.4048.000.000.000.0033.30700BFree commercial
5Qwen3-Next66.050.000.000.000.0056.60800BFree commercial
6Qwen2.5-72B58.1045.900.000.000.000.00727BFree commercial
7Llama3-70B-Instruct56.200.000.000.000.000.00700BFree commercial
8Llama3-70B52.780.000.000.000.000.00700BFree commercial
9Llama3.1-70B52.470.000.000.000.000.00700BFree commercial
10DeepSeek-R1-Distill-Llama-70B0.0065.200.0094.500.000.00700BFree commercial
1
Pangu Pro MoE
719B
MMLU Pro82.60
GPQA Diamond73.70
SWE-bench Verified0.00
MATH-50096.80
AIME 202479.20
LiveCodeBench59.60
Free commercial
2
Llama3.3-70B-Instruct
700B
MMLU Pro68.90
GPQA Diamond50.50
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
Free commercial
3
Hunyuan-A13B-Instruct
800B
MMLU Pro67.23
GPQA Diamond71.20
SWE-bench Verified0.00
MATH-5000.00
AIME 202487.30
LiveCodeBench63.90
Free commercial
4
Llama3.1-70B-Instruct
700B
MMLU Pro66.40
GPQA Diamond48.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench33.30
Free commercial
5
Qwen3-Next
800B
MMLU Pro66.05
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench56.60
Free commercial
6
Qwen2.5-72B
727B
MMLU Pro58.10
GPQA Diamond45.90
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
7
Llama3-70B-Instruct
700B
MMLU Pro56.20
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
8
Llama3-70B
700B
MMLU Pro52.78
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
9
Llama3.1-70B
700B
MMLU Pro52.47
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
10
DeepSeek-R1-Distill-Llama-70B
700B
MMLU Pro0.00
GPQA Diamond65.20
SWE-bench Verified0.00
MATH-50094.50
AIME 20240.00
LiveCodeBench0.00
Free commercial