LLM Benchmark Performance Comparison

Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.

Detailed benchmark descriptions are available at: LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

LLM Performance Results

Data source: DataLearnerAI

AIME 2024
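| Rank | Model | MMLU Pro | GPQA Diamond | SWE-bench Verified | MATH-500 | AIME 2024 | LiveCodeBench | Params (B) | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5-mini | 78.00 | 69.00 | 0.00 | 0.00 | 0.00 | 55.00 | N/A | Closed source |
| 2 | Gemini 1.5 Pro | 76.10 | 53.50 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 3 | Llama3.1-405B Instruct | 73.40 | 49.00 | 0.00 | 0.00 | 0.00 | 30.20 | 405.0 | Free commercial |
| 4 | Phi 4 - 14B | 70.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 14.0 | Non-commercial |
| 5 | Qwen2.5-32B | 69.23 | 0.00 | 0.00 | 0.00 | 0.00 | 51.20 | 32.0 | Free commercial |
| 6 | Hunyuan-A13B-Instruct | 67.23 | 71.20 | 0.00 | 0.00 | 87.30 | 63.90 | 80.0 | Free commercial |
| 7 | Mistral-Small-3.1-24B-Instruct-2503 | 66.76 | 45.96 | 0.00 | 0.00 | 0.00 | 0.00 | 24.0 | Free commercial |
| 8 | Llama3.1-70B-Instruct | 66.40 | 48.00 | 0.00 | 0.00 | 0.00 | 33.30 | 70.0 | Free commercial |
| 9 | Claude 3.5 Haiku | 65.00 | 41.60 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 10 | Qwen2.5-14B | 63.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 14.0 | Free commercial |
| 11 | GPT-4o mini | 61.70 | 41.10 | 0.00 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 12 | Llama3.1-405B | 61.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 405.0 | Free commercial |
| 13 | Gemma 3 - 12B (IT) | 60.60 | 40.90 | 0.00 | 0.00 | 0.00 | 24.60 | 12.0 | Free commercial |
| 14 | Qwen2.5-72B | 58.10 | 45.90 | 0.00 | 0.00 | 0.00 | 0.00 | 72.7 | Free commercial |
| 15 | Gemma2-27B | 56.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 27.0 | Free commercial |
| 16 | Llama3.1-70B | 52.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 70.0 | Free commercial |
| 17 | Qwen2.5-7B | 45.00 | 36.40 | 0.00 | 0.00 | 0.00 | 0.00 | 7.0 | Free commercial |
| 18 | Gemma 2 - 9B | 44.70 | 32.80 | 0.00 | 0.00 | 0.00 | 0.00 | 9.0 | Free commercial |
| 19 | Llama3.1-8B | 35.40 | 25.80 | 0.00 | 0.00 | 0.00 | 0.00 | 8.0 | Free commercial |
| 20 | Qwen2.5-3B | 34.60 | 24.30 | 0.00 | 0.00 | 0.00 | 0.00 | 3.0 | Free commercial |
| 21 | Llama-3.2-3B | 25.00 | 26.60 | 0.00 | 0.00 | 0.00 | 0.00 | 3.2 | Free commercial |
| 22 | GPT-5 | 0.00 | 87.30 | 72.80 | 0.00 | 0.00 | 0.00 | N/A | Closed source |
| 23 | Grok 3 mini | 0.00 | 65.00 | 0.00 | 0.00 | 40.00 | 0.00 | N/A | Closed source |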
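
The results above can be re-ranked by any single benchmark, which is what customizing the comparison amounts to. The following is a minimal Python sketch of that idea, not the site's own implementation. The `ROWS` sample is copied from four rows of the results, while the `rank_by` helper and its `skip_zero` flag are hypothetical names introduced here. Treating a score of 0.00 as "not reported" is an assumption; the page does not say what 0.00 means.

```python
# Four rows copied from the results above (three benchmarks each).
ROWS = [
    {"model": "GPT-5-mini", "MMLU Pro": 78.00, "GPQA Diamond": 69.00, "AIME 2024": 0.00},
    {"model": "Hunyuan-A13B-Instruct", "MMLU Pro": 67.23, "GPQA Diamond": 71.20, "AIME 2024": 87.30},
    {"model": "GPT-5", "MMLU Pro": 0.00, "GPQA Diamond": 87.30, "AIME 2024": 0.00},
    {"model": "Grok 3 mini", "MMLU Pro": 0.00, "GPQA Diamond": 65.00, "AIME 2024": 40.00},
]


def rank_by(rows, benchmark, skip_zero=True):
    """Return (model, score) pairs sorted by descending score on one benchmark.

    skip_zero=True drops rows whose score is exactly 0.00, on the ASSUMPTION
    that 0.00 marks a missing result rather than a real score.
    """
    scored = [
        (row["model"], row[benchmark])
        for row in rows
        if not (skip_zero and row[benchmark] == 0.0)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name in ("MMLU Pro", "GPQA Diamond", "AIME 2024"):
        print(name, rank_by(ROWS, name))
```

Passing `skip_zero=False` keeps every row, so models with a 0.00 entry sort to the bottom instead of being dropped.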