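LLM Coding Ability Benchmark Leaderboard

This page presents a leaderboard of large language models' coding ability. It covers benchmarks such as SWE-Bench, LiveCodeBench, and HumanEval, and compares models including GPT, Claude, Qwen, and DeepSeek.

Data updated: 2025/10/12 20:54:51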

LLM performance evaluation results

Data source: DataLearnerAI

Rank	Model	SWE-bench Verified	LiveCodeBench	HumanEval	Parameters (100M)	License
1	Claude Sonnet 5	82.00	0.00	0.00	—	Closed source
2	Claude Sonnet 4.5	82.00	0.00	0.00	—	Closed source
3	Claude Opus 4.5	80.90	0.00	0.00	—	Closed source
4	Claude Opus 4.6	80.80	0.00	0.00	—	Closed source
5	Gemini 3.1 Pro Preview	80.60	0.00	0.00	—	Closed source
6	Claude Sonnet 4	80.20	0.00	0.00	—	Closed source
7	MiniMax M2.5	80.20	0.00	0.00	2290B	Free for commercial use
8	GPT-5.2	80.00	0.00	0.00	—	Closed source
9	Claude Sonnet 4.6	79.60	0.00	0.00	—	Closed source
10	Claude Opus 4.1	79.40	0.00	0.00	—	Closed source
11	GLM-5	77.80	0.00	0.00	7440B	Free for commercial use
12	Claude Sonnet 4.5	77.20	0.00	0.00	—	Closed source
13	GPT-5.1-Codex-Max	76.80	0.00	0.00	—	Closed source
14	Kimi K2.5	76.80	85.00	0.00	10000B	Free for commercial use
15	Qwen3.5-397B-A17B	76.40	0.00	0.00	397B	Free for commercial use
16	GPT-5.1	76.30	0.00	0.00	—	Closed source
17	Gemini 3.0 Pro (Preview 11-2025)	76.20	92.00	0.00	—	Closed source
18	Qwen3-Max-Thinking	75.30	85.90	0.00	10000B	Closed source
19	o3-pro	75.00	0.00	0.00	—	Closed source
20	M2.1	74.80	0.00	0.00	2300B	Free for commercial use
21	Claude Opus 4.1	74.50	0.00	0.00	—	Closed source
22	Claude Opus 4.1	74.50	65.00	0.00	—	Closed source
23	GPT-5 Codex	74.50	0.00	0.00	—	Closed source
24	Step 3.5 Flash	74.40	86.40	0.00	1960B	Free for commercial use
25	GLM-4.7	73.80	0.00	0.00	3580B	Free for commercial use
26	Grok 4 Heavy	73.50	0.00	0.00	—	Closed source
27	Haiku 4.5	73.30	0.00	0.00	—	Closed source
28	DeepSeek V3.2	73.10	0.00	0.00	6710B	Free for commercial use
29	GPT-5	72.80	0.00	0.00	—	Closed source
30	Claude Sonnet 4	72.70	0.00	0.00	—	Closed source
31	Claude Opus 4	72.50	56.60	0.00	—	Closed source
32	Grok 4 Code	72.00	0.00	0.00	—	Closed source
33	Kimi K2 Thinking	71.30	0.00	0.00	10400B	Free for commercial use
34	Grok Code Fast 1	70.80	0.00	0.00	—	Closed source
35	Qwen3-Coder-Next	70.60	0.00	0.00	80B	Free for commercial use
36	GPT-5.1 Codex	70.40	85.50	0.00	—	Closed source
37	Claude Sonnet 3.7	70.30	0.00	0.00	—	Closed source
38	DeepSeek V3.2	70.20	83.30	0.00	6710B	Free for commercial use
39	Qwen3 Max (Preview)	69.60	57.50	0.00	—	Closed source
40	MiniMax M2	69.40	0.00	0.00	2300B	Free for commercial use
41	Kimi K2 0905	69.20	0.00	0.00	10000B	Free for commercial use
42	Kimi K2 0905	69.20	0.00	0.00	10000B	Free for commercial use
43	OpenAI o3	69.10	0.00	0.00	—	Closed source
44	Gemini 3.0 Flash	68.70	0.00	0.00	—	Closed source
45	DeepSeek-V3.1 Terminus	68.40	74.90	0.00	6710B	Free for commercial use
46	OpenAI o4 - mini	68.10	0.00	0.00	—	Closed source
47	GLM-4.6	68.00	56.00	0.00	3550B	Free for commercial use
48	GLM-4.6	68.00	84.50	0.00	3550B	Free for commercial use
49	DeepSeek V3.2-Exp	67.80	0.00	0.00	6710B	Free for commercial use
50	Gemini 2.5-Pro	67.20	0.00	0.00	—	Closed source
51	Qwen3-Coder-480B-A35B	67.00	0.00	0.00	4800B	Free for commercial use
52	DeepSeek-V3.1	66.00	56.40	0.00	6710B	Free for commercial use
53	GLM-4.5	64.20	72.90	0.00	3550B	Free for commercial use
54	Gemini 2.5 Pro Experimental 03-25	63.80	70.40	0.00	—	Closed source
55	Gemini-2.5-Pro-Preview-05-06	63.20	77.10	0.00	—	Closed source
56	Claude Sonnet 3.7	62.30	0.00	0.00	—	Closed source
57	Devstral Medium	61.60	0.00	0.00	—	Closed source
58	Haiku 4.5	60.60	51.00	0.00	—	Closed source
59	GPT OSS 120B	60.10	0.00	0.00	117B	Free for commercial use
60	GLM-4.7-Flash	59.20	0.00	0.00	310B	Free for commercial use
61	Grok 4	58.60	82.00	0.00	—	Closed source
62	DeepSeek-R1-0528	57.60	73.30	0.00	6710B	Free for commercial use
63	GLM-4.5-Air	57.60	70.70	0.00	1060B	Free for commercial use
64	MiniMax-M1-80k	56.00	65.00	0.00	4560B	Free for commercial use
65	MiniMax-M1-40k	55.60	62.30	0.00	4560B	Free for commercial use
66	GPT-4.1	54.60	40.50	0.00	—	Closed source
67	Gemini 2.5 Flash-Preview-09-2025	54.00	0.00	0.00	—	Closed source
68	Devstral Small 1.1	53.60	0.00	0.00	240B	Free for commercial use
69	Kimi K2	51.80	53.70	0.00	10000B	Free for commercial use
70	Qwen3-Coder-Flash	51.60	0.00	0.00	305B	Free for commercial use
71	Gemini 2.5 Flash	50.00	41.10	0.00	—	Closed source
72	OpenAI o3-mini (high)	49.30	69.50	97.60	—	Closed source
73	DeepSeek-R1	49.20	65.90	0.00	6710B	Free for commercial use
74	Claude 3.5 Sonnet New	49.00	38.70	93.70	—	Closed source
75	OpenAI o1	48.90	71.00	0.00	—	Closed source
76	Gemini 2.5 Flash	48.90	55.40	0.00	—	Closed source
77	Devstral Small 1.0	46.80	0.00	0.00	240B	Free for commercial use
78	OpenAI o3-mini	40.80	0.00	0.00	—	Closed source
79	DeepSeek-V3-0324	38.80	49.20	0.00	6710B	Free for commercial use
80	GPT-4.5	38.00	46.40	0.00	—	Closed source
81	Qwen3-235B-A22B	34.40	70.70	0.00	2350B	Free for commercial use
82	GPT OSS 20B	34.00	0.00	0.00	210B	Free for commercial use
83	GPT-4o	31.00	35.10	90.00	—	Closed source
84	Gemini 2.5 Flash-Lite	27.60	34.30	0.00	—	Closed source
85	GPT-4.1 mini	23.60	0.00	0.00	—	Closed source
86	Qwen3-30B-A3B-2507	22.00	0.00	0.00	305B	Free for commercial use
87	Gemini 2.0 Flash Experimental	21.40	29.10	0.00	—	Closed source
88	Kimi-k1.6-IOI-high	0.00	73.80	0.00	—	Closed source
89	MiniMax M2	0.00	83.00	0.00	2300B	Free for commercial use
90	QwQ-Max-Preview	0.00	65.60	0.00	—	Free for commercial use
91	Qwen3-32B	0.00	65.70	0.00	320B	Free for commercial use
92	Kimi-k1.6-IOI	0.00	65.90	0.00	—	Closed source
93	Claude Sonnet 4	0.00	66.00	0.00	—	Closed source
94	DeepSeek-V3.1	0.00	74.80	0.00	6710B	Free for commercial use
95	Qwen3-235B-A22B-Thinking-2507	0.00	74.10	0.00	2350B	Free for commercial use
96	Qwen3-235B-A22B-Thinking	0.00	74.10	0.00	305B	Free for commercial use
97	DeepSeek V3.2-Exp	0.00	74.10	0.00	6710B	Free for commercial use
98	Pangu Embedded	0.00	67.10	0.00	70B	Free for commercial use
99	Claude Sonnet 4.5	0.00	71.00	0.00	—	Closed source
100	Qwen3-235B-A22B	0.00	70.70	0.00	2350B	Free for commercial use
101	Grok 3	0.00	70.60	0.00	—	Closed source
102	OpenAI o3-mini (medium)	0.00	67.40	0.00	—	Closed source
103	OpenAI o3	0.00	75.80	0.00	—	Closed source
104	Step3	0.00	67.10	0.00	3210B	Free for commercial use
105	GLM-4-9B-Chat	0.00	51.80	0.00	90B	Free for commercial use
106	Gemma 3 - 12B (IT)	0.00	24.60	0.00	120B	Free for commercial use
107	Gemini 2.0 Flash-Lite	0.00	28.90	0.00	—	Closed source
108	Qwen3-30B-A3B	0.00	29.00	0.00	305B	Free for commercial use
109	Llama 4 Scout Instruct	0.00	32.80	0.00	1090B	Free for commercial use
110	Qwen3-4B-2507	0.00	35.10	0.00	40B	Free for commercial use
111	GPT-4o(2025-03-27)	0.00	35.80	0.00	—	Closed source
112	ERNIE-4.5-300B-A47B	0.00	38.80	0.00	3000B	Free for commercial use
113	ERNIE-4.5-VL-424B-A47B-Base	0.00	38.80	0.00	4240B	Free for commercial use
114	Qwen3-30B-A3B-2507	0.00	43.20	0.00	305B	Free for commercial use
115	Llama 4 Maverick Instruct	0.00	43.40	0.00	4000B	Free for commercial use
116	Claude Sonnet 4	0.00	48.50	0.00	—	Closed source
117	Llama 4 Behemoth Instruct	0.00	49.40	0.00	20000B	Free for commercial use
118	Qwen3-235B-A22B-2507	0.00	51.80	0.00	2350B	Free for commercial use
119	Hunyuan-T1	0.00	64.90	0.00	—	Closed source
120	DeepSeek V3.2-Exp	0.00	55.00	0.00	6710B	Free for commercial use
121	GPT-5-mini	0.00	55.00	0.00	—	Closed source
122	Qwen3-4B-Thinking-2507	0.00	55.20	0.00	40B	Free for commercial use
123	Magistral-Small-2506	0.00	55.84	0.00	240B	Free for commercial use
124	Qwen3-Next	0.00	56.60	0.00	800B	Free for commercial use
125	Hunyuan-7B	0.00	57.00	0.00	70B	Free for commercial use
126	Qwen3-8B	0.00	57.50	0.00	80B	Free for commercial use
127	Claude Sonnet 4.5	0.00	59.00	0.00	—	Closed source
128	Magistral-Medium-2506	0.00	59.36	0.00	—	Closed source
129	Pangu Pro MoE	0.00	59.60	0.00	719B	Free for commercial use
130	Qwen3-8B	0.00	61.80	0.00	80B	Free for commercial use
131	Haiku 4.5	0.00	62.00	0.00	—	Closed source
132	Hunyuan-A13B-Instruct	0.00	63.90	0.00	800B	Free for commercial use
133	Qwen2.5-Max	0.00	0.00	73.20	—	Closed source
134	Llama3.3-70B-Instruct	0.00	33.30	88.40	700B	Free for commercial use
135	Grok 2	0.00	0.00	88.40	2690B	Free for commercial use
136	Claude 3.5 Haiku	0.00	0.00	88.10	—	Closed source
137	Gemma 3 - 27B (IT)	0.00	29.70	87.80	270B	Free for commercial use
138	GPT-4o mini	0.00	0.00	87.20	—	Closed source
139	Codestral 25.01	0.00	37.90	86.60	—	Closed source
140	Claude3-Opus	0.00	0.00	84.90	—	Closed source
141	Codestral	0.00	31.50	81.10	220B	No commercial use
142	Llama3.1-70B-Instruct	0.00	33.30	80.50	700B	Free for commercial use
143	Phi-4-mini-instruct (3.8B)	0.00	0.00	74.40	38B	Free for commercial use
144	Grok-1.5	0.00	0.00	74.10	—	Closed source
145	Qwen2.5-32B	0.00	51.20	88.40	320B	Free for commercial use
146	Llama3.1-8B-Instruct	0.00	0.00	66.50	80B	Free for commercial use
147	C4AI Aya Vision 32B	0.00	0.00	62.20	320B	No commercial use
148	Qwen2.5-72B	0.00	0.00	59.10	727B	Free for commercial use
149	Qwen2.5-7B	0.00	0.00	57.90	70B	Free for commercial use
150	Moonlight-16B-A3B-Instruct	0.00	0.00	48.10	160B	Free for commercial use
151	Qwen2.5-3B	0.00	0.00	42.10	30B	Free for commercial use
152	Gemma 2 - 9B	0.00	0.00	37.80	90B	Free for commercial use
153	Llama3.1-8B	0.00	0.00	33.50	80B	Free for commercial use
154	Mistral-7B-Instruct-v0.3	0.00	0.00	29.30	70B	Free for commercial use
155	Llama-3.2-3B	0.00	0.00	28.00	32B	Free for commercial use
156	Claude Opus 4.5	0.00	87.00	0.00	—	Closed source
157	Grok-3 - Reasoning Beta	0.00	79.40	0.00	—	Closed source
158	DeepSeek-V3.1 Terminus	0.00	80.00	0.00	6710B	Free for commercial use
159	Grok 4 Fast	0.00	80.00	0.00	—	Closed source
160	Gemini 2.5 Pro Deep Think	0.00	80.40	0.00	—	Closed source
161	Grok 4.1 Fast	0.00	82.00	0.00	—	Closed source
162	GLM-4.6	0.00	82.80	0.00	3550B	Free for commercial use
163	QwQ-32B	0.00	0.00	19.00	325B	Free for commercial use
164	Kimi K2 Thinking	0.00	83.10	0.00	10400B	Free for commercial use
165	Qwen3.5-397B-A17B	0.00	83.60	0.00	397B	Free for commercial use
166	GLM-4.7	0.00	84.90	0.00	3580B	Free for commercial use
167	Gemini 2.5-Pro	0.00	77.10	0.00	—	Closed source
168	Gemini 2.5 Deep Think	0.00	87.60	0.00	—	Closed source
169	OpenAI o1-mini	0.00	52.00	92.40	—	Closed source
170	Claude 3.5 Sonnet	0.00	0.00	92.00	—	Closed source
171	Hunyuan-TurboS	0.00	32.00	91.00	—	Closed source
172	GPT-4o(2024-11-20)	0.00	0.00	90.20	—	Closed source
173	Gemini 1.5 Pro	0.00	0.00	89.00	—	Closed source
174	Llama3.1-405B Instruct	0.00	30.20	89.00	4050B	Free for commercial use
175	Amazon Nova Pro	0.00	0.00	89.00	—	Closed source
176	DeepSeek-V3	0.00	34.60	89.00	6810B	Free for commercial use
177	Mistral-Small-3.1-24B-Instruct-2503	0.00	0.00	88.41	240B	Free for commercial use
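
The table is ordered mainly by SWE-bench Verified, so the strongest LiveCodeBench or HumanEval results are scattered through it. The sketch below shows one way to re-rank rows by a chosen benchmark. It rests on two assumptions that the page does not state: that a score of 0.00 means no result was reported rather than a true zero, and that the rows are available as tab-separated text with English column names. The `rank_by` helper and the `SAMPLE_TSV` excerpt are hypothetical names used only for illustration; the five sample rows are copied from the table.

```python
import csv
import io

# Five rows copied from the leaderboard table (tab-separated, scores only).
SAMPLE_TSV = """\
Rank\tModel\tSWE-bench Verified\tLiveCodeBench\tHumanEval
1\tClaude Sonnet 5\t82.00\t0.00\t0.00
14\tKimi K2.5\t76.80\t85.00\t0.00
17\tGemini 3.0 Pro (Preview 11-2025)\t76.20\t92.00\t0.00
72\tOpenAI o3-mini (high)\t49.30\t69.50\t97.60
74\tClaude 3.5 Sonnet New\t49.00\t38.70\t93.70
"""


def rank_by(tsv_text, benchmark):
    """Return (model, score) pairs sorted by `benchmark`, highest first.

    Assumption: a score of 0.00 marks a missing result, so such rows
    are skipped instead of being ranked last.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    scored = []
    for row in reader:
        score = float(row[benchmark])
        if score == 0.0:
            continue  # treated as "not reported" (assumption)
        scored.append((row["Model"], score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for model, score in rank_by(SAMPLE_TSV, "LiveCodeBench"):
        print(f"{score:6.2f}  {model}")
```

Skipping the 0.00 entries keeps models without a reported score from being read as having failed every task; if the zeros turn out to be genuine scores, removing the `continue` branch restores them to the ranking.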


LiveCodeBench0.00

HumanEval0.00

不开源

Haiku 4.5

SWE-bench Verified73.30

LiveCodeBench0.00

HumanEval0.00

不开源

DeepSeek V3.2

6710B

SWE-bench Verified73.10

LiveCodeBench0.00

HumanEval0.00

免费商用

GPT-5

SWE-bench Verified72.80

LiveCodeBench0.00

HumanEval0.00

不开源

Claude Sonnet 4

SWE-bench Verified72.70

LiveCodeBench0.00

HumanEval0.00

不开源

Claude Opus 4

SWE-bench Verified72.50

LiveCodeBench56.60

HumanEval0.00

不开源

Grok 4 Code

SWE-bench Verified72.00

LiveCodeBench0.00

HumanEval0.00

不开源

Kimi K2 Thinking

10400B

SWE-bench Verified71.30

LiveCodeBench0.00

HumanEval0.00

免费商用

Grok Code Fast 1

SWE-bench Verified70.80

LiveCodeBench0.00

HumanEval0.00

不开源

Qwen3-Coder-Next

80B

SWE-bench Verified70.60

LiveCodeBench0.00

HumanEval0.00

免费商用

GPT-5.1 Codex

SWE-bench Verified70.40

LiveCodeBench85.50

HumanEval0.00

不开源

Claude Sonnet 3.7

SWE-bench Verified70.30

LiveCodeBench0.00

HumanEval0.00

不开源

DeepSeek V3.2

6710B

SWE-bench Verified70.20

LiveCodeBench83.30

HumanEval0.00

免费商用

Qwen3 Max (Preview)

SWE-bench Verified69.60

LiveCodeBench57.50

HumanEval0.00

不开源

MiniMax M2

2300B

SWE-bench Verified69.40

LiveCodeBench0.00

HumanEval0.00

免费商用

Kimi K2 0905

10000B

SWE-bench Verified69.20

LiveCodeBench0.00

HumanEval0.00

免费商用

Kimi K2 0905

10000B

SWE-bench Verified69.20

LiveCodeBench0.00

HumanEval0.00

免费商用

OpenAI o3

SWE-bench Verified69.10

LiveCodeBench0.00

HumanEval0.00

不开源

Gemini 3.0 Flash

SWE-bench Verified68.70

LiveCodeBench0.00

HumanEval0.00

不开源

DeepSeek-V3.1 Terminus

6710B

SWE-bench Verified68.40

LiveCodeBench74.90

HumanEval0.00

免费商用

OpenAI o4 - mini

SWE-bench Verified68.10

LiveCodeBench0.00

HumanEval0.00

不开源

GLM-4.6

3550B

SWE-bench Verified68.00

LiveCodeBench56.00

HumanEval0.00

免费商用

GLM-4.6

3550B

SWE-bench Verified68.00

LiveCodeBench84.50

HumanEval0.00

免费商用

DeepSeek V3.2-Exp

6710B

SWE-bench Verified67.80

LiveCodeBench0.00

HumanEval0.00

免费商用

Gemini 2.5-Pro

SWE-bench Verified67.20

LiveCodeBench0.00

HumanEval0.00

不开源

Qwen3-Coder-480B-A35B

4800B

SWE-bench Verified67.00

LiveCodeBench0.00

HumanEval0.00

免费商用

DeepSeek-V3.1

6710B

SWE-bench Verified66.00

LiveCodeBench56.40

HumanEval0.00

免费商用

GLM-4.5

3550B

SWE-bench Verified64.20

LiveCodeBench72.90

HumanEval0.00

免费商用

Gemini 2.5 Pro Experimental 03-25

SWE-bench Verified63.80

LiveCodeBench70.40

HumanEval0.00

不开源

Gemini-2.5-Pro-Preview-05-06

SWE-bench Verified63.20

LiveCodeBench77.10

HumanEval0.00

不开源

Claude Sonnet 3.7

SWE-bench Verified62.30

LiveCodeBench0.00

HumanEval0.00

不开源

Devstral Medium

SWE-bench Verified61.60

LiveCodeBench0.00

HumanEval0.00

不开源

Haiku 4.5

SWE-bench Verified60.60

LiveCodeBench51.00

HumanEval0.00

不开源

GPT OSS 120B

117B

SWE-bench Verified60.10

LiveCodeBench0.00

HumanEval0.00

免费商用

GLM-4.7-Flash

310B

SWE-bench Verified59.20

LiveCodeBench0.00

HumanEval0.00

免费商用

Grok 4

SWE-bench Verified58.60

LiveCodeBench82.00

HumanEval0.00

不开源

DeepSeek-R1-0528

6710B

SWE-bench Verified57.60

LiveCodeBench73.30

HumanEval0.00

免费商用

GLM-4.5-Air

1060B

SWE-bench Verified57.60

LiveCodeBench70.70

HumanEval0.00

免费商用

MiniMax-M1-80k

4560B

SWE-bench Verified56.00

LiveCodeBench65.00

HumanEval0.00

免费商用

MiniMax-M1-40k

4560B

SWE-bench Verified55.60

LiveCodeBench62.30

HumanEval0.00

免费商用

GPT-4.1

SWE-bench Verified54.60

LiveCodeBench40.50

HumanEval0.00

不开源

Gemini 2.5 Flash-Preview-09-2025

SWE-bench Verified54.00

LiveCodeBench0.00

HumanEval0.00

不开源

Devstral Small 1.1

240B

SWE-bench Verified53.60

LiveCodeBench0.00

HumanEval0.00

免费商用

Kimi K2

10000B

SWE-bench Verified51.80

LiveCodeBench53.70

HumanEval0.00

免费商用

Qwen3-Coder-Flash

305B

SWE-bench Verified51.60

LiveCodeBench0.00

HumanEval0.00

免费商用

Gemini 2.5 Flash

SWE-bench Verified50.00

LiveCodeBench41.10

HumanEval0.00

不开源

OpenAI o3-mini (high)

SWE-bench Verified49.30

LiveCodeBench69.50

HumanEval97.60

不开源

DeepSeek-R1

6710B

SWE-bench Verified49.20

LiveCodeBench65.90

HumanEval0.00

免费商用

Claude 3.5 Sonnet New

SWE-bench Verified49.00

LiveCodeBench38.70

HumanEval93.70

不开源

OpenAI o1

SWE-bench Verified48.90

LiveCodeBench71.00

HumanEval0.00

不开源

Gemini 2.5 Flash

SWE-bench Verified48.90

LiveCodeBench55.40

HumanEval0.00

不开源

Devstral Small 1.0

240B

SWE-bench Verified46.80

LiveCodeBench0.00

HumanEval0.00

免费商用

OpenAI o3-mini

SWE-bench Verified40.80

LiveCodeBench0.00

HumanEval0.00

不开源

DeepSeek-V3-0324

6710B

SWE-bench Verified38.80

LiveCodeBench49.20

HumanEval0.00

免费商用

GPT-4.5

SWE-bench Verified38.00

LiveCodeBench46.40

HumanEval0.00

不开源

Qwen3-235B-A22B

2350B

SWE-bench Verified34.40

LiveCodeBench70.70

HumanEval0.00

免费商用

GPT OSS 20B

210B

SWE-bench Verified34.00

LiveCodeBench0.00

HumanEval0.00

免费商用

GPT-4o

SWE-bench Verified31.00

LiveCodeBench35.10

HumanEval90.00

不开源

Gemini 2.5 Flash-Lite

SWE-bench Verified27.60

LiveCodeBench34.30

HumanEval0.00

不开源

GPT-4.1 mini

SWE-bench Verified23.60

LiveCodeBench0.00

HumanEval0.00

不开源

Qwen3-30B-A3B-2507

305B

SWE-bench Verified22.00

LiveCodeBench0.00

HumanEval0.00

免费商用

Gemini 2.0 Flash Experimental

SWE-bench Verified21.40

LiveCodeBench29.10

HumanEval0.00

不开源

Kimi-k1.6-IOI-high

SWE-bench Verified0.00

LiveCodeBench73.80

HumanEval0.00

不开源

MiniMax M2

2300B

SWE-bench Verified0.00

LiveCodeBench83.00

HumanEval0.00

免费商用

QwQ-Max-Preview

SWE-bench Verified0.00

LiveCodeBench65.60

HumanEval0.00

免费商用

Qwen3-32B

320B

SWE-bench Verified0.00

LiveCodeBench65.70

HumanEval0.00

免费商用

Kimi-k1.6-IOI

SWE-bench Verified0.00

LiveCodeBench65.90

HumanEval0.00

不开源

Claude Sonnet 4

SWE-bench Verified0.00

LiveCodeBench66.00

HumanEval0.00

不开源

DeepSeek-V3.1

6710B

SWE-bench Verified0.00

LiveCodeBench74.80

HumanEval0.00

免费商用

Qwen3-235B-A22B-Thinking-2507

2350B

SWE-bench Verified0.00

LiveCodeBench74.10

HumanEval0.00

免费商用

Qwen3-235B-A22B-Thinking

305B

SWE-bench Verified0.00

LiveCodeBench74.10

HumanEval0.00

免费商用

DeepSeek V3.2-Exp

6710B

SWE-bench Verified0.00

LiveCodeBench74.10

HumanEval0.00

免费商用

Pangu Embedded

70B

SWE-bench Verified0.00

LiveCodeBench67.10

HumanEval0.00

免费商用

Claude Sonnet 4.5

SWE-bench Verified0.00

LiveCodeBench71.00

HumanEval0.00

不开源

100

Qwen3-235B-A22B

2350B

SWE-bench Verified0.00

LiveCodeBench70.70

HumanEval0.00

免费商用

101

Grok 3

SWE-bench Verified0.00

LiveCodeBench70.60

HumanEval0.00

不开源

102

OpenAI o3-mini (medium)

SWE-bench Verified0.00

LiveCodeBench67.40

HumanEval0.00

不开源

103

OpenAI o3

SWE-bench Verified0.00

LiveCodeBench75.80

HumanEval0.00

不开源

104

Step3

3210B

SWE-bench Verified0.00

LiveCodeBench67.10

HumanEval0.00

免费商用

105

GLM-4-9B-Chat

90B

SWE-bench Verified0.00

LiveCodeBench51.80

HumanEval0.00

免费商用

106

Gemma 3 - 12B (IT)

120B

SWE-bench Verified0.00

LiveCodeBench24.60

HumanEval0.00

免费商用

107

Gemini 2.0 Flash-Lite

SWE-bench Verified0.00

LiveCodeBench28.90

HumanEval0.00

不开源

108

Qwen3-30B-A3B

305B

SWE-bench Verified0.00

LiveCodeBench29.00

HumanEval0.00

免费商用

109

Llama 4 Scout Instruct

1090B

SWE-bench Verified0.00

LiveCodeBench32.80

HumanEval0.00

免费商用

110

Qwen3-4B-2507

40B

SWE-bench Verified0.00

LiveCodeBench35.10

HumanEval0.00

免费商用

111

GPT-4o(2025-03-27)

SWE-bench Verified0.00

LiveCodeBench35.80

HumanEval0.00

不开源

112

ERNIE-4.5-300B-A47B

3000B

SWE-bench Verified0.00

LiveCodeBench38.80

HumanEval0.00

免费商用

113

ERNIE-4.5-VL-424B-A47B-Base

4240B

SWE-bench Verified0.00

LiveCodeBench38.80

HumanEval0.00

免费商用

114

Qwen3-30B-A3B-2507

305B

SWE-bench Verified0.00

LiveCodeBench43.20

HumanEval0.00

免费商用

115

Llama 4 Maverick Instruct

4000B

SWE-bench Verified0.00

LiveCodeBench43.40

HumanEval0.00

免费商用

116

Claude Sonnet 4

SWE-bench Verified0.00

LiveCodeBench48.50

HumanEval0.00

不开源

117

Llama 4 Behemoth Instruct

20000B

SWE-bench Verified0.00

LiveCodeBench49.40

HumanEval0.00

免费商用

118

Qwen3-235B-A22B-2507

2350B

SWE-bench Verified0.00

LiveCodeBench51.80

HumanEval0.00

免费商用

119

Hunyuan-T1

SWE-bench Verified0.00

LiveCodeBench64.90

HumanEval0.00

不开源

120

DeepSeek V3.2-Exp

6710B

SWE-bench Verified0.00

LiveCodeBench55.00

HumanEval0.00

免费商用

121

GPT-5-mini

SWE-bench Verified0.00

LiveCodeBench55.00

HumanEval0.00

不开源

122

Qwen3-4B-Thinking-2507

40B

SWE-bench Verified0.00

LiveCodeBench55.20

HumanEval0.00

免费商用

123

Magistral-Small-2506

240B

SWE-bench Verified0.00

LiveCodeBench55.84

HumanEval0.00

免费商用

124

Qwen3-Next

800B

SWE-bench Verified0.00

LiveCodeBench56.60

HumanEval0.00

免费商用

125

Hunyuan-7B

70B

SWE-bench Verified0.00

LiveCodeBench57.00

HumanEval0.00

免费商用

126

Qwen3-8B

80B

SWE-bench Verified0.00

LiveCodeBench57.50

HumanEval0.00

免费商用

127

Claude Sonnet 4.5

SWE-bench Verified0.00

LiveCodeBench59.00

HumanEval0.00

不开源

128

Magistral-Medium-2506

SWE-bench Verified0.00

LiveCodeBench59.36

HumanEval0.00

不开源

129

Pangu Pro MoE

719B

SWE-bench Verified0.00

LiveCodeBench59.60

HumanEval0.00

免费商用

130

Qwen3-8B

80B

SWE-bench Verified0.00

LiveCodeBench61.80

HumanEval0.00

免费商用

131

Haiku 4.5

SWE-bench Verified0.00

LiveCodeBench62.00

HumanEval0.00

不开源

132

Hunyuan-A13B-Instruct

800B

SWE-bench Verified0.00

LiveCodeBench63.90

HumanEval0.00

免费商用

133

Qwen2.5-Max

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval73.20

不开源

134

Llama3.3-70B-Instruct

700B

SWE-bench Verified0.00

LiveCodeBench33.30

HumanEval88.40

免费商用

135

Grok 2

2690B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval88.40

免费商用

136

Claude 3.5 Haiku

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval88.10

不开源

137

Gemma 3 - 27B (IT)

270B

SWE-bench Verified0.00

LiveCodeBench29.70

HumanEval87.80

免费商用

138

GPT-4o mini

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval87.20

不开源

139

Codestral 25.01

SWE-bench Verified0.00

LiveCodeBench37.90

HumanEval86.60

不开源

140

Claude3-Opus

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval84.90

不开源

141

Codestral

220B

SWE-bench Verified0.00

LiveCodeBench31.50

HumanEval81.10

不可商用

142

Llama3.1-70B-Instruct

700B

SWE-bench Verified0.00

LiveCodeBench33.30

HumanEval80.50

免费商用

143

Phi-4-mini-instruct (3.8B)

38B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval74.40

免费商用

144

Grok-1.5

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval74.10

不开源

145

Qwen2.5-32B

320B

SWE-bench Verified0.00

LiveCodeBench51.20

HumanEval88.40

免费商用

146

Llama3.1-8B-Instruct

80B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval66.50

免费商用

147

C4AI Aya Vision 32B

320B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval62.20

不可商用

148

Qwen2.5-72B

727B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval59.10

免费商用

149

Qwen2.5-7B

70B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval57.90

免费商用

150

Moonlight-16B-A3B-Instruct

160B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval48.10

免费商用

151

Qwen2.5-3B

30B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval42.10

免费商用

152

Gemma 2 - 9B

90B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval37.80

免费商用

153

Llama3.1-8B

80B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval33.50

免费商用

154

Mistral-7B-Instruct-v0.3

70B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval29.30

免费商用

155

Llama-3.2-3B

32B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval28.00

免费商用

156

Claude Opus 4.5

SWE-bench Verified0.00

LiveCodeBench87.00

HumanEval0.00

不开源

157

Grok-3 - Reasoning Beta

SWE-bench Verified0.00

LiveCodeBench79.40

HumanEval0.00

不开源

158

DeepSeek-V3.1 Terminus

6710B

SWE-bench Verified0.00

LiveCodeBench80.00

HumanEval0.00

免费商用

159

Grok 4 Fast

SWE-bench Verified0.00

LiveCodeBench80.00

HumanEval0.00

不开源

160

Gemini 2.5 Pro Deep Think

SWE-bench Verified0.00

LiveCodeBench80.40

HumanEval0.00

不开源

161

Grok 4.1 Fast

SWE-bench Verified0.00

LiveCodeBench82.00

HumanEval0.00

不开源

162

GLM-4.6

3550B

SWE-bench Verified0.00

LiveCodeBench82.80

HumanEval0.00

免费商用

163

QwQ-32B

325B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval19.00

免费商用

164

Kimi K2 Thinking

10400B

SWE-bench Verified0.00

LiveCodeBench83.10

HumanEval0.00

免费商用

165

Qwen3.5-397B-A17B

397B

SWE-bench Verified0.00

LiveCodeBench83.60

HumanEval0.00

免费商用

166

GLM-4.7

3580B

SWE-bench Verified0.00

LiveCodeBench84.90

HumanEval0.00

免费商用

167

Gemini 2.5-Pro

SWE-bench Verified0.00

LiveCodeBench77.10

HumanEval0.00

不开源

168

Gemini 2.5 Deep Think

SWE-bench Verified0.00

LiveCodeBench87.60

HumanEval0.00

不开源

169

OpenAI o1-mini

SWE-bench Verified0.00

LiveCodeBench52.00

HumanEval92.40

不开源

170

Claude 3.5 Sonnet

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval92.00

不开源

171

Hunyuan-TurboS

SWE-bench Verified0.00

LiveCodeBench32.00

HumanEval91.00

不开源

172

GPT-4o(2024-11-20)

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval90.20

不开源

173

Gemini 1.5 Pro

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval89.00

不开源

174

Llama3.1-405B Instruct

4050B

SWE-bench Verified0.00

LiveCodeBench30.20

HumanEval89.00

免费商用

175

Amazon Nova Pro

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval89.00

不开源

176

DeepSeek-V3

6810B

SWE-bench Verified0.00

LiveCodeBench34.60

HumanEval89.00

免费商用

177

Mistral-Small-3.1-24B-Instruct-2503

240B

SWE-bench Verified0.00

LiveCodeBench0.00

HumanEval88.41

免费商用