
© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


LLM Coding Benchmark Leaderboard

This page ranks large language models by coding ability on benchmarks including SWE-bench, LiveCodeBench, and HumanEval, comparing models such as GPT, Claude, Qwen, and DeepSeek.

Updated on: 2025/10/12 20:54:51
Benchmarks: SWE-bench Verified, LiveCodeBench, HumanEval

LLM Performance Results

Data source: DataLearnerAI
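| Rank | Model | SWE-bench Verified (%) | LiveCodeBench | HumanEval | Params | License |
|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.5 | 82.00 | — | — | — | Not open source |
| 2 | Claude Sonnet 5 | 82.00 | — | — | — | Not open source |
| 3 | Claude Opus 4.5 | 80.90 | — | — | — | Not open source |
| 4 | Claude Opus 4.6 | 80.84 | — | — | — | Not open source |
| 5 | Gemini 3.1 Pro Preview | 80.60 | 2887.00 † | — | — | Not open source |
| 6 | Claude Sonnet 4 | 80.20 | — | — | — | Not open source |
| 7 | MiniMax M2.5 | 80.20 | — | — | 229B | Free for commercial use |
| 8 | GPT-5.2 | 80.00 | — | — | — | Not open source |
| 9 | Claude Sonnet 4.6 | 79.60 | — | — | — | Not open source |
| 10 | Claude Opus 4.1 | 79.40 | — | — | — | Not open source |
| 11 | Qwen 3.6 Plus Preview | 78.80 | — | — | — | Not open source |
| 12 | GLM-5 | 77.80 | — | — | 744B | Free for commercial use |
| 13 | Claude Sonnet 4.5 | 77.20 | — | — | — | Not open source |
| 14 | GPT-5.1-Codex-Max | 76.80 | — | — | — | Not open source |
| 15 | Kimi K2.5 | 76.80 | — | — | 1000B | Free for commercial use |
| 16 | Qwen3.5-397B-A17B | 76.40 | — | — | 397B | Free for commercial use |
| 17 | GPT-5.1 | 76.30 | — | — | — | Not open source |
| 18 | GPT-5.1 | 76.30 | — | — | — | Not open source |
| 19 | Gemini 3.0 Pro (Preview 11-2025) | 76.20 | 92.00 | — | — | Not open source |
| 20 | Qwen3-Max-Thinking | 75.30 | 85.90 | — | 1000B | Not open source |
| 21 | o3-pro | 75.00 | — | — | — | Not open source |
| 22 | M2.1 | 74.80 | — | — | 230B | Free for commercial use |
| 23 | Claude Opus 4.1 | 74.50 | — | — | — | Not open source |
| 24 | Claude Opus 4.1 | 74.50 | 65.00 | — | — | Not open source |
| 25 | GPT-5 Codex | 74.50 | — | — | — | Not open source |
| 26 | Step 3.5 Flash | 74.40 | 86.40 | — | 196B | Free for commercial use |
| 27 | GLM-4.7 | 73.80 | — | — | 358B | Free for commercial use |
| 28 | Grok 4 Heavy | 73.50 | — | — | — | Not open source |
| 29 | Haiku 4.5 | 73.30 | — | — | — | Not open source |
| 30 | DeepSeek V3.2 | 73.10 | — | — | 671B | Free for commercial use |

— indicates that no score or parameter count is recorded for that entry.

† This LiveCodeBench value exceeds 100, so it is not on the same percentage scale as the other LiveCodeBench entries.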
Showing top 30 of 191 models