DataLearner logoDataLearnerAI
AI Tech Blogs
Leaderboards
Benchmarks
Models
Resources
Tool Directory

加载中...

DataLearner logoDataLearner AI

A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

产品

  • Leaderboards
  • 模型对比
  • Datasets

资源

  • Tutorials
  • Editorial
  • Tool directory

关于

  • 关于我们
  • 隐私政策
  • 数据收集方法
  • 联系我们

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

隐私政策服务条款

LLM Benchmark Performance Comparison

Quickly view LLM performance across benchmarks like MMLU Pro, HLE, SWE-Bench, and more. Compare models across general knowledge, coding, and reasoning capabilities. Customize your comparison by selecting specific models and benchmarks.

Detailed benchmark descriptions available at:LLM Benchmark List & Guide

Updated on: 2025/11/08 22:10:24

Benchmark switcher

Pick the leaderboard to sync both chart and table

MMLU ProGPQA DiamondSWE-bench VerifiedMATH-500AIME 2024LiveCodeBench

Filters

Active
All3B and below7B13B34B65B100B and above
AllReasoning ModelsFoundation ModelsInstruction/Chat ModelsCoding Models

LLM Performance Results

Data source: DataLearnerAI

LLM Performance Results

Data source: DataLearnerAI

MATH-500
RankModelMMLU ProGPQA DiamondSWE-bench VerifiedMATH-500AIME 2024LiveCodeBenchParams (B)License
1Qwen3-235B-A22B-Thinking84.4081.100.000.000.0074.10305BFree commercial
2Qwen3-30B-A3B-250778.4070.4022.000.000.0043.20305BFree commercial
3QwQ-32B76.0058.000.0091.0079.500.00325BFree commercial
4GPT OSS 20B74.0071.5034.000.0096.000.00210BFree commercial
5QwQ-32B-Preview70.970.000.0090.6050.000.00320BFree commercial
6Qwen2.5-32B69.230.000.000.000.0051.20320BFree commercial
7Qwen3-30B-A3B69.1054.800.000.000.0029.00305BFree commercial
8Mistral-Small-3.269.0646.130.000.000.000.00240BFree commercial
9Gemma 3 - 27B (IT)67.5042.400.000.0025.3029.70270BFree commercial
10Mistral-Small-3.1-24B-Instruct-250366.7645.960.000.000.000.00240BFree commercial
11Gemma2-27B56.540.000.000.000.000.00270BFree commercial
12C4AI Aya Vision 32B47.1633.840.000.000.000.00320BNon-commercial
13GLM-4.7-Flash0.0075.2059.200.000.000.00310BFree commercial
14Qwen3-32B0.0068.400.0097.2081.4065.70320BFree commercial
15Magistral-Small-25060.0068.180.000.0070.6855.84240BFree commercial
16Devstral Small 1.10.000.0053.600.000.000.00240BFree commercial
17Qwen3-Coder-Flash0.000.0051.600.000.000.00305BFree commercial
18Devstral Small 1.00.000.0046.800.000.000.00240BFree commercial
19Codestral0.000.000.000.000.0031.50220BNon-commercial
1
Qwen3-235B-A22B-Thinking
305B
MMLU Pro84.40
GPQA Diamond81.10
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench74.10
Free commercial
2
Qwen3-30B-A3B-2507
305B
MMLU Pro78.40
GPQA Diamond70.40
SWE-bench Verified22.00
MATH-5000.00
AIME 20240.00
LiveCodeBench43.20
Free commercial
3
QwQ-32B
325B
MMLU Pro76.00
GPQA Diamond58.00
SWE-bench Verified0.00
MATH-50091.00
AIME 202479.50
LiveCodeBench0.00
Free commercial
4
GPT OSS 20B
210B
MMLU Pro74.00
GPQA Diamond71.50
SWE-bench Verified34.00
MATH-5000.00
AIME 202496.00
LiveCodeBench0.00
Free commercial
5
QwQ-32B-Preview
320B
MMLU Pro70.97
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-50090.60
AIME 202450.00
LiveCodeBench0.00
Free commercial
6
Qwen2.5-32B
320B
MMLU Pro69.23
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench51.20
Free commercial
7
Qwen3-30B-A3B
305B
MMLU Pro69.10
GPQA Diamond54.80
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench29.00
Free commercial
8
Mistral-Small-3.2
240B
MMLU Pro69.06
GPQA Diamond46.13
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
9
Gemma 3 - 27B (IT)
270B
MMLU Pro67.50
GPQA Diamond42.40
SWE-bench Verified0.00
MATH-5000.00
AIME 202425.30
LiveCodeBench29.70
Free commercial
10
Mistral-Small-3.1-24B-Instruct-2503
240B
MMLU Pro66.76
GPQA Diamond45.96
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
11
Gemma2-27B
270B
MMLU Pro56.54
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
12
C4AI Aya Vision 32B
320B
MMLU Pro47.16
GPQA Diamond33.84
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Non-commercial
13
GLM-4.7-Flash
310B
MMLU Pro0.00
GPQA Diamond75.20
SWE-bench Verified59.20
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
14
Qwen3-32B
320B
MMLU Pro0.00
GPQA Diamond68.40
SWE-bench Verified0.00
MATH-50097.20
AIME 202481.40
LiveCodeBench65.70
Free commercial
15
Magistral-Small-2506
240B
MMLU Pro0.00
GPQA Diamond68.18
SWE-bench Verified0.00
MATH-5000.00
AIME 202470.68
LiveCodeBench55.84
Free commercial
16
Devstral Small 1.1
240B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified53.60
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
17
Qwen3-Coder-Flash
305B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified51.60
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
18
Devstral Small 1.0
240B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified46.80
MATH-5000.00
AIME 20240.00
LiveCodeBench0.00
Free commercial
19
Codestral
220B
MMLU Pro0.00
GPQA Diamond0.00
SWE-bench Verified0.00
MATH-5000.00
AIME 20240.00
LiveCodeBench31.50
Non-commercial