LLM Coding Benchmark Leaderboard

This page provides the LLM coding benchmark leaderboard. It covers the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual benchmarks and compares models such as GPT, Claude, Qwen, and DeepSeek.

Updated on 2026-05-02 07:10:24

As of 2026-05, the results table reports SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, and SWE-bench Multilingual scores side by side, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks such as SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We therefore reference two complementary human-preference leaderboards. LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting. DesignArena Code Category focuses on visual front-end code generation (websites, UI components, games) and uses the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-07

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Opus 4.7 (thinking) | Anthropic | 1569 |
| 2 | Claude Opus 4.6 (thinking) | Anthropic | 1553 |
| 3 | Opus 4.7 | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1550 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research Lab | 1530 |
| 7 | Gemini 3.1 Pro Preview | Google DeepMind | 1529 |
| 8 | gpt-5.4-high | OpenAI | 1528 |
| 9 | GLM 5.1 | Zhipu AI | 1525 |
| 10 | gpt-5.5-high | OpenAI | 1524 |
Source: LMArena
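
The Elo gaps in this ranking can be read as approximate head-to-head vote probabilities. The sketch below uses the classic Elo expected-score formula with a 400-point scale. That choice is an assumption for illustration, since this page does not state the exact rating model LMArena fits, and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Classic Elo expected score: probability that A is preferred over B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the LMArena Coding Arena ranking on this page.
top = 1569        # Opus 4.7 (thinking)
runner_up = 1553  # Claude Opus 4.6 (thinking)
tenth = 1524      # gpt-5.5-high

# Assumption: the published ratings behave like classic Elo ratings.
p_top_vs_second = expected_score(top, runner_up)
p_top_vs_tenth = expected_score(top, tenth)

print(f"top vs. second: {p_top_vs_second:.3f}")
print(f"top vs. tenth:  {p_top_vs_tenth:.3f}")
```

Under this assumption, the 16-point lead of first place over second corresponds to an expected preference rate of roughly 52%, and the 45-point lead over tenth place to roughly 56%, so the top ten are closely matched.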

DesignArena Code Category

Elo ratings compiled by Arcada Labs from anonymous voting on visual front-end code tasks (websites, UI components, games, data visualizations).

Updated 2026-05-10

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Claude Opus 4.7 (Thinking) | Anthropic | 1350 |
| 2 | Claude Opus 4.6 | Anthropic | 1346 |
| 3 | Claude Opus 4.6 (Thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1341 |
| 6 | Opus 4.7 | Anthropic | 1338 |
| 7 | GLM 5 Turbo | Zhipu AI | 1336 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | GPT-5.5 | OpenAI | 1314 |
| 10 | DeepSeek-V4-Pro | DeepSeek-AI | 1313 |
Source: DesignArena
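
The two arenas report Elo on different scales (the top LMArena rating is 1569, the top DesignArena rating is 1350), so raw ratings are not directly comparable across them. One simple way to read both together, offered as an illustrative sketch rather than this site's method, is to average each model's rank across the two top-10 lists. The cross-site name matching (for example, treating "Opus 4.7 (thinking)" and "Claude Opus 4.7 (Thinking)" as the same model) and the helper name `mean_rank` are assumptions.

```python
# Top-10 ranks copied from the two rankings on this page, restricted to
# models that appear in both lists. Name matching across sites is an assumption.
lmarena_rank = {
    "Opus 4.7 (thinking)": 1,
    "Claude Opus 4.6 (thinking)": 2,
    "Opus 4.7": 3,
    "Claude Opus 4.6": 4,
    "GLM 5.1": 9,
}
designarena_rank = {
    "Opus 4.7 (thinking)": 1,
    "Claude Opus 4.6": 2,
    "Claude Opus 4.6 (thinking)": 3,
    "GLM 5.1": 5,
    "Opus 4.7": 6,
}


def mean_rank(a: dict[str, int], b: dict[str, int]) -> list[tuple[str, float]]:
    """Average rank over models present in both leaderboards, best first."""
    shared = a.keys() & b.keys()
    scored = [(name, (a[name] + b[name]) / 2) for name in shared]
    # Sort by mean rank, breaking ties alphabetically for a stable order.
    return sorted(scored, key=lambda item: (item[1], item[0]))


combined = mean_rank(lmarena_rank, designarena_rank)
for name, rank in combined:
    print(f"{rank:>4.1f}  {name}")
```

Averaging ranks discards the size of the Elo gaps, so it is best treated as a rough ordering; models that appear in only one top-10 list are left out entirely.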
LLM Performance Results

Data source: DataLearnerAI
Scores are percentages; "-" means no score is reported.

| Rank | Model | Mode | Organization | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro - Public | SWE-bench Multilingual | License |
|------|-------|------|--------------|--------------------|---------------|------------------------|------------------------|---------|
| 1 | Claude Mythos Preview | Extended Thinking, Tools | Anthropic | 93.90 | - | 77.80 | 87.30 | Proprietary |
| 2 | Opus 4.7 | Extended Thinking, Tools | Anthropic | 87.60 | - | 64.30 | - | Proprietary |
| 3 | Claude Sonnet 5 | Parallel · Thinking Enabled | Anthropic | 82.00 | - | - | - | Proprietary |
| 4 | Claude Sonnet 4.5 | Parallel · Thinking Enabled, Tools | Anthropic | 82.00 | - | - | - | Proprietary |
| 5 | Opus 4.5 | Extended Thinking, Tools | Anthropic | 80.90 | 87.00 | - | - | Proprietary |
| 6 | Claude Opus 4.6 | Extended Thinking, Tools | Anthropic | 80.84 | - | - | 72.00 | Proprietary |
| 7 | Gemini 3.1 Pro Preview | Thinking Level · High, Tools | Google DeepMind | 80.60 | 91.70 | 54.20 | - | Proprietary |
| 8 | Claude Sonnet 4 | Parallel · Thinking Enabled, Tools | Anthropic | 80.20 | - | - | - | Proprietary |
| 9 | GPT-5.2 | Thinking Level · Extra High, Tools | OpenAI | 80.00 | - | 55.60 | - | Proprietary |
| 10 | Claude Sonnet 4.6 | Thinking Enabled | Anthropic | 79.60 | - | - | - | Proprietary |
| 11 | Qwen 3.6 Plus Preview | Thinking Enabled, Tools | Alibaba | 78.80 | - | 56.60 | - | Proprietary |
| 12 | Muse Spark | Thinking Enabled, Tools | Facebook AI Research Lab | 77.40 | - | - | - | Proprietary |
| 13 | Claude Sonnet 4.5 | Thinking Enabled, Tools | Anthropic | 77.20 | - | - | - | Proprietary |
| 14 | GPT-5.1-Codex-Max | Thinking Level · High, Tools | OpenAI | 76.80 | - | - | - | Proprietary |
| 15 | GPT-5.1 | Thinking Level · High | OpenAI | 76.30 | - | - | - | Proprietary |
| 16 | GPT-5.1 | Thinking Level · High, Tools | OpenAI | 76.30 | - | - | - | Proprietary |
| 17 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | Google DeepMind | 76.20 | 92.00 | - | - | Proprietary |
| 18 | Qwen3-Max-Thinking | Thinking Enabled | Alibaba | 75.30 | 85.90 | - | - | Proprietary |
| 19 | o3-pro | Thinking Level · High | OpenAI | 75.00 | - | - | - | Proprietary |
| 20 | Opus 4.1 | Extended Thinking, Tools | Anthropic | 74.50 | - | - | - | Proprietary |
| 21 | GPT-5 Codex | Thinking Level · High | OpenAI | 74.50 | - | - | - | Proprietary |
| 22 | Grok 4 Heavy | Parallel · Thinking Enabled, Tools | xAI | 73.50 | - | - | - | Proprietary |
| 23 | Haiku 4.5 | Thinking Enabled, Tools | Anthropic | 73.30 | - | - | - | Proprietary |
| 24 | GPT-5 | Thinking Level · High | OpenAI | 72.80 | - | 36.30 | - | Proprietary |
| 25 | Claude Sonnet 4 | Thinking Enabled, Tools | Anthropic | 72.70 | - | - | - | Proprietary |
| 26 | Claude Opus 4 | - | Anthropic | 72.50 | 56.60 | - | - | Proprietary |
| 27 | Grok 4 Code | - | xAI | 72.00 | - | - | - | Proprietary |
| 28 | Grok Code Fast 1 | Thinking Enabled | xAI | 70.80 | - | - | - | Proprietary |
| 29 | GPT-5.1 Codex | Thinking Level · High, Tools | OpenAI | 70.40 | 85.50 | - | - | Proprietary |
| 30 | Claude Sonnet 3.7 | Thinking Enabled, Tools | Anthropic | 70.30 | - | - | - | Proprietary |
| 31 | Qwen3 Max (Preview) | - | Alibaba | 69.60 | 57.50 | - | - | Proprietary |
| 32 | OpenAI o3 | Thinking Enabled | OpenAI | 69.10 | - | - | - | Proprietary |
| 33 | Gemini 3.0 Flash | Thinking Enabled | Google DeepMind | 68.70 | - | - | - | Proprietary |
| 34 | OpenAI o4 - mini | Thinking Enabled | OpenAI | 68.10 | - | - | - | Proprietary |
| 35 | Gemini 2.5-Pro | Thinking Enabled | Google DeepMind | 67.20 | - | - | - | Proprietary |
| 36 | Gemini 2.5 Pro Experimental 03-25 | - | Google DeepMind | 63.80 | 70.40 | - | - | Proprietary |
| 37 | Gemini-2.5-Pro-Preview-05-06 | - | Google DeepMind | 63.20 | 77.10 | - | - | Proprietary |
| 38 | Claude Sonnet 3.7 | Standard Mode, Tools | Anthropic | 62.30 | - | - | - | Proprietary |
| 39 | Devstral Medium | - | MistralAI | 61.60 | - | - | - | Proprietary |
| 40 | Haiku 4.5 | Standard Mode, Tools | Anthropic | 60.60 | - | - | - | Proprietary |
| 41 | Grok 4 | Thinking Enabled | xAI | 58.60 | 82.00 | - | - | Proprietary |
| 42 | GPT-4.1 | - | OpenAI | 54.60 | 40.50 | - | - | Proprietary |
| 43 | Grok 4.1 | - | xAI | 54.60 | - | - | - | Proprietary |
| 44 | Gemini 2.5 Flash-Preview-09-2025 | Thinking Enabled | Google DeepMind | 54.00 | - | - | - | Proprietary |
| 45 | Gemini 2.5 Flash | - | Google DeepMind | 50.00 | 41.10 | - | - | Proprietary |
| 46 | OpenAI o3-mini (high) | - | OpenAI | 49.30 | 69.50 | - | - | Proprietary |
| 47 | Claude 3.5 Sonnet New | - | Anthropic | 49.00 | 38.70 | - | - | Proprietary |
| 48 | OpenAI o1 | - | OpenAI | 48.90 | 71.00 | - | - | Proprietary |
| 49 | Gemini 2.5 Flash | Thinking Enabled | Google DeepMind | 48.90 | 55.40 | - | - | Proprietary |
| 50 | OpenAI o1 | Thinking Level · High | OpenAI | 41.00 | - | - | - | Proprietary |
Showing 50 of 95 models.