A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


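LLM Coding Ability Benchmark Leaderboard

This page provides a leaderboard of LLM coding benchmark results. It covers datasets such as SWE-Bench, LiveCodeBench, and HumanEval, and compares models such as GPT, Claude, Qwen, and DeepSeek.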

Updated on: 2025/10/12 20:54:51

LLM Performance Results

Data source: DataLearnerAI
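| Rank | Model | SWE-bench Verified | LiveCodeBench | HumanEval | Params | License |
|---|---|---|---|---|---|---|
| 1 | MiniMax M2.5 | 80.20 | 0.00 | 0.00 | 2290B | Free commercial |
| 2 | GLM-5 | 77.80 | 0.00 | 0.00 | 7440B | Free commercial |
| 3 | Kimi K2.5 | 76.80 | 85.00 | 0.00 | 10000B | Free commercial |
| 4 | Qwen3-Max-Thinking | 75.30 | 85.90 | 0.00 | 10000B | Closed source |
| 5 | o3-pro | 75.00 | 0.00 | 0.00 | — | Closed source |
| 6 | M2.1 | 74.80 | 0.00 | 0.00 | 2300B | Free commercial |
| 7 | Step 3.5 Flash | 74.40 | 86.40 | 0.00 | 1960B | Free commercial |
| 8 | GLM-4.7 | 73.80 | 84.90 | 0.00 | 3580B | Free commercial |
| 9 | DeepSeek V3.2 | 73.10 | 83.30 | 0.00 | 6710B | Free commercial |
| 10 | Claude Opus 4 | 72.50 | 56.60 | 0.00 | — | Closed source |
| 11 | Kimi K2 Thinking | 71.30 | 83.10 | 0.00 | 10400B | Free commercial |
| 12 | Claude Sonnet 3.7 | 70.30 | 0.00 | 0.00 | — | Closed source |
| 13 | MiniMax M2 | 69.40 | 83.00 | 0.00 | 2300B | Free commercial |
| 14 | Kimi K2 0905 | 69.20 | 0.00 | 0.00 | 10000B | Free commercial |
| 15 | DeepSeek-V3.1 Terminus | 68.40 | 80.00 | 0.00 | 6710B | Free commercial |
| 16 | OpenAI o4-mini | 68.10 | 0.00 | 0.00 | — | Closed source |
| 17 | GLM-4.6 | 68.00 | 84.50 | 0.00 | 3550B | Free commercial |
| 18 | DeepSeek V3.2-Exp | 67.80 | 74.10 | 0.00 | 6710B | Free commercial |
| 19 | Qwen3-Coder-480B-A35B | 67.00 | 0.00 | 0.00 | 4800B | Free commercial |
| 20 | DeepSeek-V3.1 | 66.00 | 74.80 | 0.00 | 6710B | Free commercial |
| 21 | GLM-4.5 | 64.20 | 72.90 | 0.00 | 3550B | Free commercial |
| 22 | Gemini-2.5-Pro-Preview-05-06 | 63.20 | 77.10 | 0.00 | — | Closed source |
| 23 | DeepSeek-R1-0528 | 57.60 | 73.30 | 0.00 | 6710B | Free commercial |
| 24 | GLM-4.5-Air | 57.60 | 70.70 | 0.00 | 1060B | Free commercial |
| 25 | MiniMax-M1-80k | 56.00 | 65.00 | 0.00 | 4560B | Free commercial |
| 26 | MiniMax-M1-40k | 55.60 | 62.30 | 0.00 | 4560B | Free commercial |
| 27 | GPT-4.1 | 54.60 | 40.50 | 0.00 | — | Closed source |
| 28 | Kimi K2 | 51.80 | 53.70 | 0.00 | 10000B | Free commercial |
| 29 | Gemini 2.5 Flash | 50.00 | 55.40 | 0.00 | — | Closed source |
| 30 | OpenAI o3-mini (high) | 49.30 | 69.50 | 97.60 | — | Closed source |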
Showing top 30 of 67 models
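
The leaderboard is ordered by SWE-bench Verified, so re-ranking by another column may be useful. The following is a minimal sketch, not the site's own method. It assumes that a score of 0.00 means "not reported" rather than a true zero, which the page does not state. The `Row` type and the `reported` and `rank_by` helpers are hypothetical names introduced for illustration, and `ROWS` copies only a handful of rows from the results above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    model: str
    swe_bench_verified: float
    livecodebench: float
    humaneval: float


# A few rows copied from the leaderboard (not the full list).
ROWS = [
    Row("MiniMax M2.5", 80.20, 0.00, 0.00),
    Row("Kimi K2.5", 76.80, 85.00, 0.00),
    Row("Qwen3-Max-Thinking", 75.30, 85.90, 0.00),
    Row("Step 3.5 Flash", 74.40, 86.40, 0.00),
    Row("Claude Opus 4", 72.50, 56.60, 0.00),
    Row("OpenAI o3-mini (high)", 49.30, 69.50, 97.60),
]


def reported(score: float) -> Optional[float]:
    """ASSUMPTION: 0.00 marks a missing score, so map it to None."""
    return None if score == 0.0 else score


def rank_by(rows: List[Row], column: str) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted high to low on `column`,
    skipping models whose score is treated as unreported."""
    scored = [(r.model, reported(getattr(r, column))) for r in rows]
    kept = [(m, s) for m, s in scored if s is not None]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, score in rank_by(ROWS, "livecodebench"):
        print(f"{model}: {score:.2f}")
```

Under this assumption, models without a LiveCodeBench or HumanEval score drop out of the re-ranked list instead of sinking to the bottom with an apparent score of zero.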