
LLM Coding Benchmark Leaderboard

This page provides the LLM coding benchmark leaderboard, covering the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual datasets, and comparing models such as GPT, Claude, Qwen, and DeepSeek.

Updated on 2026-05-02 07:10:24

As of 2026-05, the leaderboard covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro (Public), SWE-bench Multilingual, and related coding benchmarks, making it straightforward to compare models within the same task family.

Click any model name to see context length, licensing, and pricing on its detail page. See the Data Methodology page for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.
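To make the Elo numbers below concrete, here is a minimal sketch of how a rating moves after a single anonymous A/B vote. This is illustrative only: LMArena fits ratings with a Bradley-Terry-style model over all votes rather than simple online updates, and the K-factor here is an arbitrary assumption.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one pairwise vote.

    k is an assumed update step; arena leaderboards tune or avoid it
    by fitting all votes jointly.
    """
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta
```

For example, two models rated 1500 each have an expected score of 0.5, so a single win moves the winner up by 16 points (with k=32) and the loser down by the same amount. A ~40-point gap, like the one between the top two entries below, implies roughly a 56% expected win rate for the leader.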

LMArena Coding Arena

Full ranking

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-07

| # | Model | Organization | Elo |
|---|---|---|---|
| 1 | Opus 4.7 (thinking) | Anthropic | 1569 |
| 2 | Claude Opus 4.6 (thinking) | Anthropic | 1553 |
| 3 | Opus 4.7 | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1550 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research Lab | 1530 |
| 7 | Gemini 3.1 Pro Preview | Google DeepMind | 1529 |
| 8 | gpt-5.4-high | OpenAI | 1528 |
| 9 | GLM 5.1 | Zhipu AI | 1525 |
| 10 | gpt-5.5-high | OpenAI | 1524 |

LLM Performance Results

Data source: DataLearnerAI
| Rank | Model | Organization | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro (Public) | SWE-bench Multilingual | License |
|---|---|---|---|---|---|---|---|
| 1 | Opus 4.7 | Anthropic | 87.60 | — | 64.30 | — | Proprietary |
| 2 | Opus 4.5 | Anthropic | 80.90 | 87.00 | — | — | Proprietary |
| 3 | Claude Opus 4.6 | Anthropic | 80.84 | 76.00 | — | 72.00 | Proprietary |
| 4 | DeepSeek-V4-Pro | DeepSeek-AI | 80.60 | 93.50 | 55.40 | 76.20 | Free commercial |
| 5 | Claude Sonnet 4 | Anthropic | 80.20 | 66.00 | 42.70 | — | Proprietary |
| 6 | MiniMax M2.5 | MiniMaxAI | 80.20 | — | 55.40 | — | Free commercial |
| 7 | Kimi K2.6 | Moonshot AI | 80.20 | 89.60 | 58.60 | 76.70 | Free commercial |
| 8 | DeepSeek-V4-Flash | DeepSeek-AI | 79.00 | 91.60 | 52.60 | 73.30 | Free commercial |
| 9 | Muse Spark | Facebook AI Research Lab | 77.40 | — | — | — | Proprietary |
| 10 | Qwen3.6-27B | Alibaba | 77.20 | 83.90 | 53.50 | 71.30 | Free commercial |
| 11 | GPT-5.1 | OpenAI | 76.30 | — | 50.80 | — | Proprietary |
| 12 | Qwen3-Max-Thinking | Alibaba | 75.30 | 85.90 | — | — | Proprietary |
| 13 | o3-pro | OpenAI | 75.00 | — | — | — | Proprietary |
| 14 | Opus 4.1 | Anthropic | 74.50 | — | — | — | Proprietary |
| 15 | Qwen3.6-35B-A3B | Alibaba | 73.40 | 80.40 | 49.50 | 67.20 | Free commercial |
| 16 | DeepSeek V3.2 | DeepSeek-AI | 73.10 | 83.30 | 40.90 | — | Free commercial |
| 17 | Claude Opus 4 | Anthropic | 72.50 | 56.60 | — | — | Proprietary |
| 18 | Qwen3.5-27B | Alibaba | 72.40 | 80.70 | — | — | Free commercial |
| 19 | Kimi K2 Thinking | Moonshot AI | 71.30 | 83.10 | — | — | Free commercial |
| 20 | OpenAI o3 | OpenAI | 69.10 | 75.80 | — | — | Proprietary |
| 21 | OpenAI o4-mini | OpenAI | 68.10 | — | — | — | Proprietary |
| 22 | DeepSeek V3.2-Exp | DeepSeek-AI | 67.80 | 74.10 | — | — | Free commercial |
| 23 | Gemini 2.5-Pro | Google DeepMind | 67.20 | 77.10 | — | — | Proprietary |
| 24 | GLM-4.5 | Zhipu AI | 64.20 | 72.90 | — | — | Free commercial |
| 25 | Gemini 2.5 Pro Experimental 03-25 | Google DeepMind | 63.80 | 70.40 | — | — | Proprietary |
| 26 | Gemini-2.5-Pro-Preview-05-06 | Google DeepMind | 63.20 | 77.10 | — | — | Proprietary |
| 27 | GPT OSS 120B | OpenAI | 60.10 | — | — | — | Free commercial |
| 28 | GLM-4.7-Flash | Zhipu AI | 59.20 | — | — | — | Free commercial |
| 29 | Grok 4 | xAI | 58.60 | 82.00 | — | — | Proprietary |
| 30 | DeepSeek-R1-0528 | DeepSeek-AI | 57.60 | 73.30 | — | — | Free commercial |
| 31 | GLM-4.5-Air | Zhipu AI | 57.60 | 70.70 | — | — | Free commercial |
| 32 | MiniMax-M1-80k | MiniMaxAI | 56.00 | 65.00 | — | — | Free commercial |
| 33 | MiniMax-M1-40k | MiniMaxAI | 55.60 | 62.30 | — | — | Free commercial |
| 34 | Grok 4.1 | xAI | 54.60 | — | — | — | Proprietary |
| 35 | Gemini 2.5 Flash | Google DeepMind | 50.00 | 55.40 | — | — | Proprietary |
| 36 | OpenAI o3-mini (high) | OpenAI | 49.30 | 69.50 | — | — | Proprietary |
| 37 | DeepSeek-R1 | DeepSeek-AI | 49.20 | 65.90 | — | — | Free commercial |
| 38 | OpenAI o1 | OpenAI | 48.90 | 71.00 | — | — | Proprietary |
| 39 | OpenAI o3-mini | OpenAI | 40.80 | — | — | — | Proprietary |
| 40 | Qwen3-235B-A22B | Alibaba | 34.40 | 70.70 | — | — | Free commercial |
| 41 | GPT OSS 20B | OpenAI | 34.00 | — | — | — | Free commercial |
| 42 | Gemini 2.5 Flash-Lite | Google DeepMind | 27.60 | 34.30 | — | — | Proprietary |
| 43 | Qwen3-8B | Alibaba | — | 61.80 | — | — | Free commercial |
| 44 | Composer 1.5 | Cursor | — | — | — | 65.90 | Proprietary |
| 45 | Magistral-Medium-2506 | MistralAI | — | 59.36 | — | — | Proprietary |
| 46 | Magistral-Small-2506 | MistralAI | — | 55.84 | — | — | Free commercial |
| 47 | OpenAI o1-mini | OpenAI | — | 52.00 | — | — | Proprietary |
| 48 | Hunyuan-TurboS | Tencent AI Lab | — | 32.00 | — | — | Proprietary |
| 49 | Qwen3-30B-A3B | Alibaba | — | 29.00 | — | — | Free commercial |
| 50 | GPT-5.5 | OpenAI | — | — | 58.60 | — | Proprietary |
Showing 50 of 64 models. See the SWE-bench Verified benchmark page for the full list.
Source: LMArena

DesignArena Code Category

Full ranking

Elo ratings from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz) by Arcada Labs.

Updated 2026-05-10

| # | Model | Organization | Elo |
|---|---|---|---|
| 1 | Claude Opus 4.7 (Thinking) | Anthropic | 1350 |
| 2 | Claude Opus 4.6 | Anthropic | 1346 |
| 3 | Claude Opus 4.6 (Thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1341 |
| 6 | Opus 4.7 | Anthropic | 1338 |
| 7 | GLM 5 Turbo | Zhipu AI | 1336 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | GPT-5.5 | OpenAI | 1314 |
| 10 | DeepSeek-V4-Pro | DeepSeek-AI | 1313 |
Source: DesignArena