LLM Coding Benchmark Leaderboard

Name: LLM Coding Benchmark Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page presents an LLM coding benchmark leaderboard covering the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual datasets, and compares GPT, Claude, Qwen, and DeepSeek models.

Updated on 2026-05-02 07:10:24

As of 2026-05, this page covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, making it straightforward to compare models within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena


Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-07

#	Model	Organization	Elo
1	Opus 4.7 (thinking)	Anthropic	1569
2	Claude Opus 4.6 (thinking)	Anthropic	1553
3	Opus 4.7	Anthropic	1550
4	Claude Opus 4.6	Anthropic	1550
5	Claude Opus 4 (thinking-32k)	Anthropic	1531
6	Muse Spark	Facebook AI Research Lab	1530
7	Gemini 3.1 Pro Preview	Google DeepMind	1529
8	gpt-5.4-high	OpenAI	1528
9	GLM 5.1	Zhipu AI	1525
10	gpt-5.5-high	OpenAI	1524

Source: LMArena
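
To put the Elo gaps above in perspective, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the classic Elo logistic formula with a 400-point scale; LMArena's actual fitting procedure and scale may differ, so treat the numbers as a rough guide only. The function name `elo_expected_score` is hypothetical and chosen for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model.

    Assumption: a 400-point logistic scale, which is the textbook
    convention and not necessarily the one LMArena uses.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from the LMArena Coding Arena table above.
first_place = 1569   # Opus 4.7 (thinking)
tenth_place = 1524   # gpt-5.5-high

p_first_beats_tenth = elo_expected_score(first_place, tenth_place)
print(f"Expected win rate, 1st vs 10th: {p_first_beats_tenth:.3f}")
```

Under this assumption, the 45-point gap between first and tenth place corresponds to an expected win rate of roughly 56%, so the top ten models are close to one another in head-to-head preference.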

DesignArena Code Category


Elo ratings from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz) by Arcada Labs.

Updated 2026-05-10

#	Model	Organization	Elo
1	Claude Opus 4.7 (Thinking)	Anthropic	1350
2	Claude Opus 4.6	Anthropic	1346
3	Claude Opus 4.6 (Thinking)	Anthropic	1344
4	Kimi K2.6	Moonshot AI	1343
5	GLM 5.1	Zhipu AI	1341
6	Opus 4.7	Anthropic	1338
7	GLM 5 Turbo	Zhipu AI	1336
8	Claude Sonnet 4.6	Anthropic	1331
9	GPT-5.5	OpenAI	1314
10	DeepSeek-V4-Pro	DeepSeek-AI	1313

Source: DesignArena

Benchmarks: SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual


LLM Performance Results

Data source: DataLearnerAI

Rank	Model	Organization	SWE-bench Verified	LiveCodeBench	SWE-Bench Pro - Public	SWE-bench Multilingual	License
1	DeepSeek-V4-Pro (Thinking Level · Extra High, Tools)	DeepSeek-AI	80.60	—	55.40	76.20	Free commercial
2	MiniMax M2.5 (Thinking Enabled, Tools)	MiniMaxAI	80.20	—	55.40	—	Free commercial
3	Kimi K2.6 (Thinking Enabled, Tools)	Moonshot AI	80.20	—	58.60	76.70	Free commercial
4	DeepSeek-V4-Pro (Thinking Level · High, Tools)	DeepSeek-AI	79.40	—	54.40	74.10	Free commercial
5	DeepSeek-V4-Flash (Thinking Level · Extra High, Tools)	DeepSeek-AI	79.00	—	52.60	73.30	Free commercial
6	DeepSeek-V4-Flash (Thinking Level · High, Tools)	DeepSeek-AI	78.60	—	52.30	70.20	Free commercial
7	GLM-5 (Thinking Enabled)	Zhipu AI	77.80	—	—	—	Free commercial
8	Qwen3.6-27B (Thinking Enabled, Tools)	Alibaba	77.20	—	53.50	71.30	Free commercial
9	Kimi K2.5 (Thinking Enabled, Tools)	Moonshot AI	76.80	—	50.70	—	Free commercial
10	Qwen3.5-397B-A17B (Thinking Enabled, Tools)	Alibaba	76.40	—	—	—	Free commercial
11	M2.1 (Thinking Enabled)	MiniMaxAI	74.80	—	—	—	Free commercial
12	Step 3.5 Flash (Thinking Enabled)	StepFunAI	74.40	86.40	—	—	Free commercial
13	GLM-4.7 (Thinking Enabled, Tools)	Zhipu AI	73.80	—	40.60	—	Free commercial
14	DeepSeek-V4-Flash (Standard Mode, Tools)	DeepSeek-AI	73.70	—	49.10	69.70	Free commercial
15	DeepSeek-V4-Pro (Standard Mode, Tools)	DeepSeek-AI	73.60	—	52.10	69.80	Free commercial
16	Qwen3.6-35B-A3B (Thinking Enabled)	Alibaba	73.40	80.40	49.50	67.20	Free commercial
17	DeepSeek V3.2 (Thinking Enabled, Tools)	DeepSeek-AI	73.10	—	—	—	Free commercial
18	Qwen3.5-27B (Thinking Enabled)	Alibaba	72.40	—	—	—	Free commercial
19	Kimi K2 Thinking (Thinking Enabled, Tools)	Moonshot AI	71.30	—	—	—	Free commercial
20	Qwen3-Coder-Next (Standard Mode, Tools)	Alibaba	70.60	—	44.30	—	Free commercial
21	DeepSeek V3.2 (Thinking Enabled)	DeepSeek-AI	70.20	83.30	40.90	—	Free commercial
22	MiniMax M2 (Thinking Enabled, Tools)	MiniMaxAI	69.40	—	—	—	Free commercial
23	Kimi K2 0905	Moonshot AI	69.20	—	27.67	—	Free commercial
24	Kimi K2 0905 (Thinking Enabled, Tools)	Moonshot AI	69.20	—	—	—	Free commercial
25	DeepSeek-V3.1 Terminus	DeepSeek-AI	68.40	74.90	—	—	Free commercial
26	GLM-4.6	Zhipu AI	68.00	56.00	—	—	Free commercial
27	GLM-4.6 (Thinking Enabled, Tools)	Zhipu AI	68.00	84.50	—	—	Free commercial
28	DeepSeek V3.2-Exp (Thinking Enabled, Tools)	DeepSeek-AI	67.80	—	—	—	Free commercial
29	Qwen3-Coder-480B-A35B	Alibaba	67.00	—	—	—	Free commercial
30	DeepSeek-V3.1	DeepSeek-AI	66.00	56.40	—	—	Free commercial
31	GLM-4.5 (Thinking Enabled)	Zhipu AI	64.20	72.90	—	—	Free commercial
32	GPT OSS 120B (Thinking Enabled)	OpenAI	60.10	—	—	—	Free commercial
33	GLM-4.7-Flash (Thinking Enabled)	Zhipu AI	59.20	—	—	—	Free commercial
34	DeepSeek-R1-0528 (Thinking Enabled)	DeepSeek-AI	57.60	73.30	—	—	Free commercial
35	GLM-4.5-Air (Thinking Enabled)	Zhipu AI	57.60	70.70	—	—	Free commercial
36	MiniMax-M1-80k	MiniMaxAI	56.00	65.00	—	—	Free commercial
37	MiniMax-M1-40k	MiniMaxAI	55.60	62.30	—	—	Free commercial
38	Devstral Small 1.1	MistralAI	53.60	—	—	—	Free commercial
39	Kimi K2	Moonshot AI	51.80	53.70	—	—	Free commercial
40	Qwen3-Coder-Flash	Alibaba	51.60	—	—	—	Free commercial
41	DeepSeek-R1	DeepSeek-AI	49.20	65.90	—	—	Free commercial
42	Devstral Small 1.0	MistralAI	46.80	—	—	—	Free commercial
43	DeepSeek-V3-0324	DeepSeek-AI	38.80	49.20	—	—	Free commercial
44	Qwen3-235B-A22B	Alibaba	34.40	70.70	—	—	Free commercial
45	GPT OSS 20B (Thinking Enabled)	OpenAI	34.00	—	—	—	Free commercial
46	Qwen3-30B-A3B-2507 (Thinking Enabled)	Alibaba	22.00	—	—	—	Free commercial
47	DeepSeek-V3	DeepSeek-AI	—	34.60	—	—	Free commercial
48	Hunyuan-7B	Tencent ARC	—	57.00	—	—	Free commercial
49	Qwen3-4B-2507	Alibaba	—	35.10	—	—	Free commercial
50	ERNIE-4.5-300B-A47B	Baidu	—	38.80	—	—	Free commercial
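
Because many models in the table above lack a score on one or more benchmarks, re-ranking by a different column needs a rule for missing values. The following is a minimal sketch, assuming missing scores are represented as `None` and should sort last; the helper `rank_by` and the tuple layout are hypothetical, and only a handful of rows are copied from the table for illustration.

```python
from typing import List, Optional, Tuple

# (model, SWE-bench Verified, LiveCodeBench); None marks a missing score.
Row = Tuple[str, Optional[float], Optional[float]]

rows: List[Row] = [
    ("DeepSeek-V4-Pro (Thinking Level · Extra High, Tools)", 80.60, None),
    ("Step 3.5 Flash (Thinking Enabled)", 74.40, 86.40),
    ("DeepSeek V3.2 (Thinking Enabled)", 70.20, 83.30),
    ("GLM-4.6 (Thinking Enabled, Tools)", 68.00, 84.50),
    ("Hunyuan-7B", None, 57.00),
]


def rank_by(table: List[Row], column: int) -> List[Row]:
    """Sort rows by the given score column, highest first, missing scores last."""
    scored = [r for r in table if r[column] is not None]
    missing = [r for r in table if r[column] is None]
    return sorted(scored, key=lambda r: r[column], reverse=True) + missing


# Column 2 is LiveCodeBench in this illustrative layout.
for position, row in enumerate(rank_by(rows, 2), start=1):
    print(position, row[0], row[2])
```

Sorting by LiveCodeBench instead of SWE-bench Verified reorders these rows considerably, which is one reason to compare models within a single benchmark rather than across columns.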


Showing 50 of 104 models.
