DataLearnerAI

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


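LLM Coding Ability Leaderboard

This page ranks large language models by coding ability. It covers benchmarks including SWE-Bench, LiveCodeBench, and HumanEval, and compares models such as GPT, Claude, Qwen, and DeepSeek.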

Updated on: 2025/10/12 20:54:51


LLM Performance Results

Data source: DataLearnerAI
Rank  Model                       SWE-bench Verified  LiveCodeBench  HumanEval  Params (B)  License
1     Phi-4-mini-instruct (3.8B)  0.00                0.00           74.40      3.8         Free commercial
2     Qwen2.5-3B                  0.00                0.00           42.10      3.0         Free commercial
3     Llama-3.2-3B                0.00                0.00           28.00      3.2         Free commercial