A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

LLM Coding Benchmark Leaderboard

This page provides the LLM coding benchmark leaderboard, covering the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual datasets and comparing GPT, Claude, Qwen, and DeepSeek models.

Updated on 2026-05-02 07:10:24

As of 2026-05, this page covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, making it straightforward to compare models within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-07

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Opus 4.7 (thinking) | Anthropic | 1569 |
| 2 | Claude Opus 4.6 (thinking) | Anthropic | 1553 |
| 3 | Opus 4.7 | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1550 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research | 1530 |
| 7 | Gemini 3.1 Pro Preview | Google DeepMind | 1529 |
| 8 | gpt-5.4-high | OpenAI | 1528 |
| 9 | GLM 5.1 | Zhipu AI | 1525 |
| 10 | gpt-5.5-high | OpenAI | 1524 |
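
The Elo gaps in the ranking above can be read as approximate head-to-head preference rates. The sketch below assumes the conventional Elo logistic curve with a 400-point scale; this page does not state how LMArena fits its ratings, so the function name `expected_score` and the `scale` parameter are illustrative assumptions rather than the leaderboard's own method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B.

    Assumes the conventional Elo logistic model (base 10, 400-point scale).
    The leaderboard's actual fitting procedure may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the LMArena Coding Arena ranking above.
first = 1569    # Opus 4.7 (thinking)
second = 1553   # Claude Opus 4.6 (thinking)
tenth = 1524    # gpt-5.5-high

print(f"#1 vs #2:  {expected_score(first, second):.3f}")
print(f"#1 vs #10: {expected_score(first, tenth):.3f}")
```

Under this assumption, the 16-point gap between first and second place corresponds to only slightly better than even odds in a blind vote, so adjacent positions in the top ten should be treated as close.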

LLM Performance Results

Data source: DataLearnerAI

| Rank | Model | Organization | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro - Public | SWE-bench Multilingual | License |
|------|-------|--------------|--------------------|---------------|------------------------|------------------------|---------|
| 1 | Claude Mythos Preview | Anthropic | 93.90 | — | 77.80 | 87.30 | Proprietary |
| 2 | Claude Sonnet 4.5 | Anthropic | 82.00 | 71.00 | 43.60 | — | Proprietary |
| 3 | GPT-5.2 | OpenAI | 80.00 | — | 55.60 | — | Proprietary |
| 4 | Claude Sonnet 4.6 | Anthropic | 79.60 | — | — | — | Proprietary |
| 5 | Qwen 3.6 Plus Preview | Alibaba | 78.80 | 87.10 | 56.60 | 73.80 | Proprietary |
| 6 | GLM-5 | Zhipu AI | 77.80 | — | — | — | Free commercial |
| 7 | M2.1 | MiniMaxAI | 74.80 | — | 32.60 | — | Free commercial |
| 8 | Step 3.5 Flash | StepFunAI | 74.40 | 86.40 | — | — | Free commercial |
| 9 | GLM-4.7 | Zhipu AI | 73.80 | 84.90 | 40.60 | — | Free commercial |
| 10 | Grok 4 Heavy | xAI | 73.50 | — | — | — | Proprietary |
| 11 | Claude Sonnet 3.7 | Anthropic | 70.30 | — | — | — | Proprietary |
| 12 | Qwen3 Max (Preview) | Alibaba | 69.60 | 57.50 | — | — | Proprietary |
| 13 | MiniMax M2 | MiniMaxAI | 69.40 | 83.00 | — | — | Free commercial |
| 14 | Kimi K2 0905 | Moonshot AI | 69.20 | — | 27.67 | — | Free commercial |
| 15 | Gemini 3.0 Flash | Google DeepMind | 68.70 | — | — | — | Proprietary |
| 16 | DeepSeek-V3.1 Terminus | DeepSeek-AI | 68.40 | 80.00 | — | — | Free commercial |
| 17 | GLM-4.6 | Zhipu AI | 68.00 | 84.50 | — | — | Free commercial |
| 18 | DeepSeek-V3.1 | DeepSeek-AI | 66.00 | 74.80 | — | — | Free commercial |
| 19 | GPT-4.1 | OpenAI | 54.60 | 40.50 | — | — | Proprietary |
| 20 | Gemini 2.5 Flash-Preview-09-2025 | Google DeepMind | 54.00 | — | — | — | Proprietary |
| 21 | Kimi K2 | Moonshot AI | 51.80 | 53.70 | — | — | Free commercial |
| 22 | Claude 3.5 Sonnet New | Anthropic | 49.00 | 38.70 | — | — | Proprietary |
| 23 | DeepSeek-V3-0324 | DeepSeek-AI | 38.80 | 49.20 | — | — | Free commercial |
| 24 | GPT-4.5 | OpenAI | 38.00 | 46.40 | — | — | Proprietary |
| 25 | GPT-4o (2024-11-20) | OpenAI | 31.00 | — | — | — | Proprietary |
| 26 | GPT-4.1 mini | OpenAI | 23.60 | — | — | — | Proprietary |
| 27 | Qwen3-30B-A3B-2507 | Alibaba | 22.00 | 43.20 | — | — | Free commercial |
| 28 | Qwen3.6-Max-Preview | Alibaba | — | — | 57.30 | — | Proprietary |
| 29 | Gemini 2.0 Flash-Lite | DeepMind | — | 28.90 | — | — | Proprietary |
| 30 | Gemma 3 - 27B (IT) | Google DeepMind | — | 29.70 | — | — | Free commercial |
| 31 | Llama3.3-70B-Instruct | Facebook AI Research | — | 33.30 | — | — | Free commercial |
| 32 | DeepSeek-V3 | DeepSeek-AI | — | 34.60 | — | — | Free commercial |
| 33 | Qwen3-4B-2507 | Alibaba | — | 35.10 | — | — | Free commercial |
| 34 | GPT-4o (2025-03-27) | OpenAI | — | 35.80 | — | — | Proprietary |
| 35 | ERNIE-4.5-300B-A47B | Baidu | — | 38.80 | — | — | Free commercial |
| 36 | Qwen3-235B-A22B-2507 | Alibaba | — | 51.80 | — | — | Free commercial |
| 37 | GLM-4-9B-Chat | Zhipu AI | — | 51.80 | — | — | Free commercial |
| 38 | Qwen3-4B-Thinking-2507 | Alibaba | — | 55.20 | — | — | Free commercial |
| 39 | Qwen3-Next | Alibaba | — | 56.60 | — | — | Free commercial |
| 40 | Hunyuan-7B | Tencent ARC | — | 57.00 | — | — | Free commercial |
| 41 | Pangu Pro MoE | Huawei | — | 59.60 | — | — | Free commercial |
| 42 | Pangu Embedded | Huawei | — | 67.10 | — | — | Free commercial |
| 43 | Grok 3 | xAI | — | 70.60 | — | — | Proprietary |
| 44 | Gemma 4 26B A4B | DeepMind | — | 77.10 | — | — | Free commercial |
| 45 | Gemma 4 31B | DeepMind | — | 80.00 | — | — | Free commercial |
| 46 | Grok 4 Fast | xAI | — | 80.00 | — | — | Proprietary |
| 47 | Gemini 2.5 Deep Think | Google DeepMind | — | 87.60 | — | — | Proprietary |
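
Because many models report only some of the four benchmarks, a comparison is only meaningful within one benchmark column at a time. The following is a minimal sketch of re-ranking by a chosen benchmark while skipping models with no reported score. The field names (`swe_verified`, `livecodebench`) and the helper `rank_by` are hypothetical, and only a handful of rows from the results above are included.

```python
# A few rows copied from the results above.
# None marks a score that is not reported (shown as a dash on the page).
ROWS = [
    {"model": "Claude Mythos Preview", "swe_verified": 93.90, "livecodebench": None},
    {"model": "Claude Sonnet 4.5", "swe_verified": 82.00, "livecodebench": 71.00},
    {"model": "Qwen 3.6 Plus Preview", "swe_verified": 78.80, "livecodebench": 87.10},
    {"model": "Step 3.5 Flash", "swe_verified": 74.40, "livecodebench": 86.40},
    {"model": "Gemini 2.5 Deep Think", "swe_verified": None, "livecodebench": 87.60},
]


def rank_by(rows, benchmark):
    """Return (model, score) pairs for one benchmark, best first.

    Models without a reported score on that benchmark are left out
    rather than being treated as zero.
    """
    scored = [(r["model"], r[benchmark]) for r in rows if r[benchmark] is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for model, score in rank_by(ROWS, "livecodebench"):
    print(f"{score:6.2f}  {model}")
```

Dropping missing scores instead of substituting a default keeps a model that simply was not evaluated from appearing at the bottom of a ranking.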
Source (LMArena Coding Arena ranking): LMArena

DesignArena Code Category

Elo ratings, published by Arcada Labs, from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz).

Updated 2026-05-10

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Claude Opus 4.7 (Thinking) | Anthropic | 1350 |
| 2 | Claude Opus 4.6 | Anthropic | 1346 |
| 3 | Claude Opus 4.6 (Thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1341 |
| 6 | Opus 4.7 | Anthropic | 1338 |
| 7 | GLM 5 Turbo | Zhipu AI | 1336 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | GPT-5.5 | OpenAI | 1314 |
| 10 | DeepSeek-V4-Pro | DeepSeek-AI | 1313 |
Source: DesignArena
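
The two arenas report Elo on different scales (top scores of 1569 and 1350), so raw ratings should not be compared across them. One way to read both together is to compare ranks for models that appear in both top-ten lists, sketched below. The `normalize` rule is an assumption: it treats "Opus 4.7" and "Claude Opus 4.7" as the same model and ignores letter case, while "gpt-5.5-high" and "GPT-5.5" are left as distinct entries.

```python
# Top-ten model names, in rank order, copied from the two rankings on this page.
LMARENA = [
    "Opus 4.7 (thinking)", "Claude Opus 4.6 (thinking)", "Opus 4.7",
    "Claude Opus 4.6", "Claude Opus 4 (thinking-32k)", "Muse Spark",
    "Gemini 3.1 Pro Preview", "gpt-5.4-high", "GLM 5.1", "gpt-5.5-high",
]
DESIGNARENA = [
    "Claude Opus 4.7 (Thinking)", "Claude Opus 4.6", "Claude Opus 4.6 (Thinking)",
    "Kimi K2.6", "GLM 5.1", "Opus 4.7", "GLM 5 Turbo",
    "Claude Sonnet 4.6", "GPT-5.5", "DeepSeek-V4-Pro",
]


def normalize(name: str) -> str:
    """Hypothetical matching rule: lowercase and drop a leading 'claude '."""
    key = name.lower().strip()
    if key.startswith("claude "):
        key = key[len("claude "):]
    return key


def shared_ranks(list_a, list_b):
    """Return (name, rank_in_a, rank_in_b) for models present in both lists,
    ordered by the sum of their two ranks (lower is better)."""
    rank_a = {normalize(n): i for i, n in enumerate(list_a, start=1)}
    rank_b = {normalize(n): i for i, n in enumerate(list_b, start=1)}
    common = sorted(rank_a.keys() & rank_b.keys(),
                    key=lambda k: rank_a[k] + rank_b[k])
    return [(k, rank_a[k], rank_b[k]) for k in common]


for name, general_rank, frontend_rank in shared_ranks(LMARENA, DESIGNARENA):
    print(f"{name:24s} LMArena #{general_rank:<2d} DesignArena #{frontend_rank}")
```

With this matching rule, five models appear in both lists. Entries that differ only by a reasoning-effort suffix would need a stricter or looser rule depending on whether they should count as the same model.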