LLM Coding Benchmark Leaderboard

This page presents the LLM coding benchmark leaderboard. It covers the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual datasets and compares GPT, Claude, Qwen, and DeepSeek models.

Updated on 2026-05-02 07:10:24

As of 2026-05, this page covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, so that models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-07

| # | Model | Organization | Elo |
|---|---|---|---|
| 1 | Opus 4.7 (thinking) | Anthropic | 1569 |
| 2 | Claude Opus 4.6 (thinking) | Anthropic | 1553 |
| 3 | Opus 4.7 | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1550 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research Lab | 1530 |
| 7 | Gemini 3.1 Pro Preview | Google DeepMind | 1529 |
| 8 | gpt-5.4-high | OpenAI | 1528 |
| 9 | GLM 5.1 | Zhipu AI | 1525 |
| 10 | gpt-5.5-high | OpenAI | 1524 |

Source: LMArena
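
To get a feel for what an Elo gap on these arena leaderboards means, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the classic Elo logistic curve (base 10, scale 400); this page does not state how either arena fits its ratings, so treat the numbers as a rough guide rather than the leaderboards' own method. The function name `expected_score` is illustrative.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of votes for model A over model B.

    Assumption: the standard Elo logistic with base 10 and a 400-point
    scale. The arenas may fit their ratings differently.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Top two entries of the LMArena Coding Arena table: 1569 vs 1553.
p = expected_score(1569, 1553)
print(f"Expected preference for the higher-rated model: {p:.1%}")
```

Under that assumption, the 16-point gap between the first and second entries corresponds to the higher-rated model being preferred in roughly 52% of pairwise votes, so small Elo differences near the top of the list indicate close contests.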

DesignArena Code Category

Elo ratings from Arcada Labs, based on anonymous voting on visual front-end code tasks (websites, UI components, games, data visualization).

Updated 2026-05-10

| # | Model | Organization | Elo |
|---|---|---|---|
| 1 | Claude Opus 4.7 (Thinking) | Anthropic | 1350 |
| 2 | Claude Opus 4.6 | Anthropic | 1346 |
| 3 | Claude Opus 4.6 (Thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1341 |
| 6 | Opus 4.7 | Anthropic | 1338 |
| 7 | GLM 5 Turbo | Zhipu AI | 1336 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | GPT-5.5 | OpenAI | 1314 |
| 10 | DeepSeek-V4-Pro | DeepSeek-AI | 1313 |

Source: DesignArena

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Organization | Mode | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro - Public | SWE-bench Multilingual | License |
|---|---|---|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Pro | DeepSeek-AI | Thinking Level · Extra High, Tools | 80.60 | — | 55.40 | 76.20 | Free commercial |
| 2 | MiniMax M2.5 | MiniMaxAI | Thinking Enabled, Tools | 80.20 | — | 55.40 | — | Free commercial |
| 3 | Kimi K2.6 | Moonshot AI | Thinking Enabled, Tools | 80.20 | — | 58.60 | 76.70 | Free commercial |
| 4 | DeepSeek-V4-Pro | DeepSeek-AI | Thinking Level · High, Tools | 79.40 | — | 54.40 | 74.10 | Free commercial |
| 5 | DeepSeek-V4-Flash | DeepSeek-AI | Thinking Level · Extra High, Tools | 79.00 | — | 52.60 | 73.30 | Free commercial |
| 6 | DeepSeek-V4-Flash | DeepSeek-AI | Thinking Level · High, Tools | 78.60 | — | 52.30 | 70.20 | Free commercial |
| 7 | GLM-5 | Zhipu AI | Thinking Enabled | 77.80 | — | — | — | Free commercial |
| 8 | Qwen3.6-27B | Alibaba | Thinking Enabled, Tools | 77.20 | — | 53.50 | 71.30 | Free commercial |
| 9 | Kimi K2.5 | Moonshot AI | Thinking Enabled, Tools | 76.80 | — | 50.70 | — | Free commercial |
| 10 | Qwen3.5-397B-A17B | Alibaba | Thinking Enabled, Tools | 76.40 | — | — | — | Free commercial |
| 11 | M2.1 | MiniMaxAI | Thinking Enabled | 74.80 | — | — | — | Free commercial |
| 12 | Step 3.5 Flash | StepFunAI | Thinking Enabled | 74.40 | 86.40 | — | — | Free commercial |
| 13 | GLM-4.7 | Zhipu AI | Thinking Enabled, Tools | 73.80 | — | 40.60 | — | Free commercial |
| 14 | DeepSeek-V4-Flash | DeepSeek-AI | Standard Mode, Tools | 73.70 | — | 49.10 | 69.70 | Free commercial |
| 15 | DeepSeek-V4-Pro | DeepSeek-AI | Standard Mode, Tools | 73.60 | — | 52.10 | 69.80 | Free commercial |
| 16 | Qwen3.6-35B-A3B | Alibaba | Thinking Enabled | 73.40 | 80.40 | 49.50 | 67.20 | Free commercial |
| 17 | DeepSeek V3.2 | DeepSeek-AI | Thinking Enabled, Tools | 73.10 | — | — | — | Free commercial |
| 18 | Qwen3.5-27B | Alibaba | Thinking Enabled | 72.40 | — | — | — | Free commercial |
| 19 | Kimi K2 Thinking | Moonshot AI | Thinking Enabled, Tools | 71.30 | — | — | — | Free commercial |
| 20 | Qwen3-Coder-Next | Alibaba | Standard Mode, Tools | 70.60 | — | 44.30 | — | Free commercial |
| 21 | DeepSeek V3.2 | DeepSeek-AI | Thinking Enabled | 70.20 | 83.30 | 40.90 | — | Free commercial |
| 22 | MiniMax M2 | MiniMaxAI | Thinking Enabled, Tools | 69.40 | — | — | — | Free commercial |
| 23 | Kimi K2 0905 | Moonshot AI |  | 69.20 | — | 27.67 | — | Free commercial |
| 24 | Kimi K2 0905 | Moonshot AI | Thinking Enabled, Tools | 69.20 | — | — | — | Free commercial |
| 25 | DeepSeek-V3.1 Terminus | DeepSeek-AI |  | 68.40 | 74.90 | — | — | Free commercial |
| 26 | GLM-4.6 | Zhipu AI |  | 68.00 | 56.00 | — | — | Free commercial |
| 27 | GLM-4.6 | Zhipu AI | Thinking Enabled, Tools | 68.00 | 84.50 | — | — | Free commercial |
| 28 | DeepSeek V3.2-Exp | DeepSeek-AI | Thinking Enabled, Tools | 67.80 | — | — | — | Free commercial |
| 29 | Qwen3-Coder-480B-A35B | Alibaba |  | 67.00 | — | — | — | Free commercial |
| 30 | DeepSeek-V3.1 | DeepSeek-AI |  | 66.00 | 56.40 | — | — | Free commercial |
| 31 | GLM-4.5 | Zhipu AI | Thinking Enabled | 64.20 | 72.90 | — | — | Free commercial |
| 32 | GPT OSS 120B | OpenAI | Thinking Enabled | 60.10 | — | — | — | Free commercial |
| 33 | GLM-4.7-Flash | Zhipu AI | Thinking Enabled | 59.20 | — | — | — | Free commercial |
| 34 | DeepSeek-R1-0528 | DeepSeek-AI | Thinking Enabled | 57.60 | 73.30 | — | — | Free commercial |
| 35 | GLM-4.5-Air | Zhipu AI | Thinking Enabled | 57.60 | 70.70 | — | — | Free commercial |
| 36 | MiniMax-M1-80k | MiniMaxAI |  | 56.00 | 65.00 | — | — | Free commercial |
| 37 | MiniMax-M1-40k | MiniMaxAI |  | 55.60 | 62.30 | — | — | Free commercial |
| 38 | Devstral Small 1.1 | MistralAI |  | 53.60 | — | — | — | Free commercial |
| 39 | Kimi K2 | Moonshot AI |  | 51.80 | 53.70 | — | — | Free commercial |
| 40 | Qwen3-Coder-Flash | Alibaba |  | 51.60 | — | — | — | Free commercial |
| 41 | DeepSeek-R1 | DeepSeek-AI |  | 49.20 | 65.90 | — | — | Free commercial |
| 42 | Devstral Small 1.0 | MistralAI |  | 46.80 | — | — | — | Free commercial |
| 43 | DeepSeek-V3-0324 | DeepSeek-AI |  | 38.80 | 49.20 | — | — | Free commercial |
| 44 | Qwen3-235B-A22B | Alibaba |  | 34.40 | 70.70 | — | — | Free commercial |
| 45 | GPT OSS 20B | OpenAI | Thinking Enabled | 34.00 | — | — | — | Free commercial |
| 46 | Qwen3-30B-A3B-2507 | Alibaba | Thinking Enabled | 22.00 | — | — | — | Free commercial |
| 47 | DeepSeek-V3 | DeepSeek-AI |  | — | 34.60 | — | — | Free commercial |
| 48 | Hunyuan-7B | Tencent ARC |  | — | 57.00 | — | — | Free commercial |
| 49 | Qwen3-4B-2507 | Alibaba |  | — | 35.10 | — | — | Free commercial |
| 50 | ERNIE-4.5-300B-A47B | Baidu |  | — | 38.80 | — | — | Free commercial |

Showing 50 of 104 models.
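
The results above appear to be ordered by SWE-bench Verified, with models that lack that score listed last. To compare models on a different benchmark, the same rows can be re-ranked on another column. The sketch below does this for a handful of rows copied from the results; the tuple layout and the `rank_by` helper are illustrative assumptions, not part of the site.

```python
# A few rows copied from the results. None marks a score the page does not report.
# Tuple layout (illustrative): (model, mode, swe_bench_verified, swe_bench_pro_public)
ROWS = [
    ("DeepSeek-V4-Pro", "Thinking Level · Extra High, Tools", 80.60, 55.40),
    ("MiniMax M2.5", "Thinking Enabled, Tools", 80.20, 55.40),
    ("Kimi K2.6", "Thinking Enabled, Tools", 80.20, 58.60),
    ("GLM-5", "Thinking Enabled", 77.80, None),
    ("Qwen3.6-27B", "Thinking Enabled, Tools", 77.20, 53.50),
]


def rank_by(rows, column):
    """Sort rows by the score at `column`, highest first, missing scores last."""
    def key(row):
        score = row[column]
        # False sorts before True, so rows with a score come first;
        # negating the score gives descending order among them.
        return (score is None, -(score or 0.0))
    return sorted(rows, key=key)


for model, mode, verified, pro in rank_by(ROWS, 3):
    print(model, mode, pro)
```

Because Python's `sorted` is stable, rows that tie on the chosen benchmark keep their original relative order, so ties on SWE-Bench Pro - Public fall back to the SWE-bench Verified ordering of the input.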