DataLearner logoDataLearnerAI
Latest AI Insights
Model Evaluations
Model Directory
Model Comparison
Resource Center
Tools

加载中...

DataLearner logoDataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

产品

  • Leaderboards
  • 模型对比
  • Datasets

资源

  • Tutorials
  • Editorial
  • Tool directory

关于

  • 关于我们
  • 隐私政策
  • 数据收集方法
  • 联系我们

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

隐私政策服务条款

LLM Benchmark Performance Comparison

Compare model performance across MMLU Pro, HLE, SWE-Bench and more. Select benchmarks to view rankings.

Detailed benchmark descriptions available at:LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

Benchmark switcher

Pick the leaderboard to sync both chart and table

MMLU ProGPQA DiamondSWE-bench VerifiedMATH-500AIME 2024LiveCodeBench

More benchmark coverage

Browse the benchmark catalog by category and language

More Benchmarks

Filters

Active
All

LLM Performance Results

Data source: DataLearnerAI
No chart data available
RankModelMMLU ProGPQA DiamondSWE-bench VerifiedMATH-500AIME 2024LiveCodeBenchParams (B)License
1Phi-4-mini-instruct (3.8B)52.8036.000.0071.8010.000.0038BFree commercial
2Qwen2.5-3B34.6024.300.00
3B and below
7B
13B
34B
65B
100B and above
AllReasoning ModelsFoundation ModelsInstruction/Chat ModelsCoding Models
0.00
0.00
0.00
30B
Free commercial
3Llama-3.2-3B25.0026.600.000.000.000.0032BFree commercial
4Phi-4-instruct (reasoning-trained)0.0049.000.0090.4050.000.0038B不开源
1
Phi-4-mini-instruct (3.8B)
38B
MMLU Pro52.80
GPQA Diamond36.00
SWE-bench Verified0.00
MATH-50071.80
AIME 202410.00
LiveCodeBench0.00
Free commercial
2
Qwen2.5-3B
30B
MMLU Pro34.60
GPQA Diamond24.30
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
3
Llama-3.2-3B
32B
MMLU Pro25.00
GPQA Diamond26.60
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
4
Phi-4-instruct (reasoning-trained)
38B
MMLU Pro0.00
GPQA Diamond49.00
SWE-bench Verified0.00
MATH-50090.40
AIME 202450.00
LiveCodeBench0.00
不开源