
LLM Coding Ability Benchmark Leaderboard

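This page presents a leaderboard of large language model coding ability, covering benchmarks such as SWE-bench, LiveCodeBench, and HumanEval, and compares models such as GPT, Claude, Qwen, and DeepSeek.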

Data updated: 2025/10/12 20:54:51


Model Evaluation Results

Data source: DataLearnerAI

Rank	Model	SWE-bench Verified	LiveCodeBench	HumanEval	Parameters (100M)	Open-source status
1	Claude Opus 4.5	80.90	87.00	0.00	-	Closed source
2	Claude Opus 4.6	80.84	76.00	95.00	-	Closed source
3	Claude Sonnet 4	80.20	66.00	0.00	-	Closed source
4	[name missing]	80.20	0.00	0.00	2290	Free for commercial use
5	Claude Opus 4.1	79.40	65.00	0.00	-	Closed source
6	[name missing]	76.30	0.00	0.00	-	Closed source
7	Qwen3-Max-Thinking	75.30	85.90	0.00	10000	Closed source
8	[name missing]	75.00	0.00	0.00	-	Closed source
9	[name missing]	73.10	83.30	0.00	6710	Free for commercial use
10	[name missing]	72.50	56.60	0.00	-	Closed source
11	[name missing]	72.40	80.70	0.00	270	Free for commercial use
12	Kimi K2 Thinking	71.30	83.10	0.00	10400	Free for commercial use
13	[name missing]	69.10	75.80	0.00	-	Closed source
14	OpenAI o4-mini	68.10	0.00	0.00	-	Closed source
15	DeepSeek V3.2-Exp	67.80	74.10	0.00	6710	Free for commercial use
16	[name missing]	67.20	77.10	0.00	-	Closed source
17	[name missing]	64.20	72.90	0.00	3550	Free for commercial use
18	Gemini 2.5 Pro Experimental 03-25	63.80	70.40	0.00	-	Closed source
19	Gemini-2.5-Pro-Preview-05-06	63.20	77.10	0.00	-	Closed source
20	[name missing]	60.10	0.00	0.00	117	Free for commercial use
21	[name missing]	59.20	0.00	0.00	310	Free for commercial use
22	[name missing]	58.60	82.00	0.00	-	Closed source
23	DeepSeek-R1-0528	57.60	73.30	0.00	6710	Free for commercial use
24	[name missing]	57.60	70.70	0.00	1060	Free for commercial use
25	[name missing]	56.00	65.00	0.00	4560	Free for commercial use
26	[name missing]	55.60	62.30	0.00	4560	Free for commercial use
27	[name missing]	54.60	0.00	0.00	-	Closed source
28	Gemini 2.5 Flash	50.00	55.40	0.00	-	Closed source
29	OpenAI o3-mini (high)	49.30	69.50	97.60	-	Closed source
30	[name missing]	49.20	65.90	0.00	6710	Free for commercial use
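
The table header gives parameter counts in units of 亿 (100 million), so the 6710 listed for DeepSeek V3.2-Exp corresponds to 671 billion parameters. The snippet below is a minimal sketch of that conversion; the helper name `to_billions` is hypothetical and not part of the source page, and it assumes every listed value follows the header's unit.

```python
# Assumption: parameter counts use units of 亿 (1e8), as the table header states.
UNITS_PER_BILLION = 10  # 1 billion = 10 x 100 million


def to_billions(params_in_100m: float) -> float:
    """Convert a parameter count given in units of 100 million to billions."""
    return params_in_100m / UNITS_PER_BILLION


# Example values taken from the leaderboard rows.
rows = [("DeepSeek V3.2-Exp", 6710), ("Kimi K2 Thinking", 10400)]
for name, params in rows:
    print(f"{name}: {to_billions(params):g}B parameters")
```

Dividing by 10 rather than multiplying by 1e8 and dividing by 1e9 keeps the arithmetic simple and avoids needless floating-point rounding.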