A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.


LLM Coding Benchmark Leaderboard

This page provides the LLM coding benchmark leaderboard. It covers the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual datasets and compares GPT, Claude, Qwen, DeepSeek, and other models.

Updated on 2026-05-21 22:14:17

As of 2026-05, this leaderboard covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-14

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Opus 4.7 (thinking) | Anthropic | 1563 |
| 2 | Opus 4.7 | Anthropic | 1551 |
| 3 | Claude Opus 4.6 (thinking) | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1549 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research | 1530 |
| 7 | GPT-5.4 (high) | OpenAI | 1527 |
| 8 | GLM 5.1 | Zhipu AI | 1527 |
| 9 | Gemini 3.1 Pro Preview | Google DeepMind | 1526 |
| 10 | Claude Sonnet 4.6 | Anthropic | 1522 |

Source: LMArena
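
The Elo gaps in the LMArena ranking above can be read as rough head-to-head win probabilities. The sketch below uses the textbook Elo expected-score formula with a 400-point scale. That is an assumption for illustration only: this page does not say how LMArena fits its ratings, and the function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score of A against B (assumed rating model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings copied from the LMArena Coding Arena ranking above.
first_place = 1563   # Opus 4.7 (thinking)
tenth_place = 1522   # Claude Sonnet 4.6

print(f"{expected_win_rate(first_place, tenth_place):.3f}")
```

Under this model the 41-point gap between first and tenth place works out to roughly a 56% expected win rate, which suggests the top ten are closely matched in blind votes.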

DesignArena Code Category

Elo ratings from Arcada Labs, based on anonymous voting on visual front-end code tasks (websites, UI components, games, data visualizations).

Updated 2026-05-17

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Claude Opus 4.6 | Anthropic | 1348 |
| 2 | Opus 4.7 (thinking) | Anthropic | 1345 |
| 3 | Claude Opus 4.6 (thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1338 |
| 6 | Opus 4.7 | Anthropic | 1335 |
| 7 | GLM-5-Turbo | Zhipu AI | 1334 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | MiMo-V2.5-Pro | Xiaomi | 1329 |
| 10 | GPT-5.5 | OpenAI | 1320 |

Source: DesignArena

Top picks

Ranked by SWE-bench Verified

| Category | Model | Organization | SWE-bench Verified | Difference from SOTA |
|----------|-------|--------------|--------------------|----------------------|
| Current SOTA | Claude Mythos Preview | Anthropic | 93.90 | n/a |
| Best Open-Source | DeepSeek-V4-Pro | DeepSeek-AI | 80.60 | −13.30 |
| Best China-Made | Qwen3.7-Max-Preview | Alibaba | 80.40 | −13.50 |
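
The signed deltas shown next to the open-source and China-made picks appear to be differences from the current SOTA score. A minimal sketch of that arithmetic, using only the three scores listed above (the variable names are illustrative, not part of the site):

```python
# SWE-bench Verified scores copied from the Top picks entries above.
scores = {
    "Claude Mythos Preview": 93.90,   # Current SOTA
    "DeepSeek-V4-Pro": 80.60,         # Best Open-Source
    "Qwen3.7-Max-Preview": 80.40,     # Best China-Made
}

sota = max(scores.values())

# Signed gap to the best score, rounded to the two decimals the page displays.
gaps = {name: round(score - sota, 2) for name, score in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f}")
```

The computed gaps of 13.30 and 13.50 points match the deltas displayed on the page, which supports reading them as distance from the top score.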

LLM Performance Results

Data source: DataLearnerAI


| Rank | Model | Configuration | Organization | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro - Public | SWE-bench Multilingual | License |
|------|-------|---------------|--------------|--------------------|---------------|------------------------|------------------------|---------|
| 1 | Claude Mythos Preview | Extended Thinking, Tools | Anthropic | 93.90 | — | 77.80 | 87.30 | Proprietary |
| 2 | Opus 4.7 | Extended Thinking, Tools | Anthropic | 87.60 | — | 64.30 | — | Proprietary |
| 3 | Claude Sonnet 5 | Parallel · Thinking Enabled | Anthropic | 82.00 | — | — | — | Proprietary |
| 4 | Claude Sonnet 4.5 | Parallel · Thinking Enabled, Tools | Anthropic | 82.00 | — | — | — | Proprietary |
| 5 | Opus 4.5 | Extended Thinking, Tools | Anthropic | 80.90 | 87.00 | — | — | Proprietary |
| 6 | Claude Opus 4.6 | Extended Thinking, Tools | Anthropic | 80.84 | — | — | 72.00 | Proprietary |
| 7 | DeepSeek-V4-Pro | Thinking Level · Extra High, Tools | DeepSeek-AI | 80.60 | — | 55.40 | 76.20 | Free commercial |
| 8 | Gemini 3.1 Pro Preview | Thinking Level · High, Tools | Google DeepMind | 80.60 | 91.70 | 54.20 | — | Proprietary |
| 9 | Qwen3.7-Max-Preview | Thinking Enabled, Tools | Alibaba | 80.40 | — | 60.60 | 78.30 | Proprietary |
| 10 | Claude Sonnet 4 | Parallel · Thinking Enabled, Tools | Anthropic | 80.20 | — | — | — | Proprietary |
| 11 | MiniMax M2.5 | Thinking Enabled, Tools | MiniMaxAI | 80.20 | — | 55.40 | — | Free commercial |
| 12 | Kimi K2.6 | Thinking Enabled, Tools | Moonshot AI | 80.20 | — | 58.60 | 76.70 | Free commercial |
| 13 | GPT-5.2 | Thinking Level · Extra High, Tools | OpenAI | 80.00 | — | 55.60 | — | Proprietary |
| 14 | Claude Sonnet 4.6 | Thinking Enabled | Anthropic | 79.60 | — | — | — | Proprietary |
| 15 | DeepSeek-V4-Pro | Thinking Level · High, Tools | DeepSeek-AI | 79.40 | — | 54.40 | 74.10 | Free commercial |
| 16 | DeepSeek-V4-Flash | Thinking Level · Extra High, Tools | DeepSeek-AI | 79.00 | — | 52.60 | 73.30 | Free commercial |
| 17 | Qwen 3.6 Plus Preview | Thinking Enabled, Tools | Alibaba | 78.80 | — | 56.60 | — | Proprietary |
| 18 | Qwen3.6-Max-Preview | Thinking Enabled, Tools | Alibaba | 78.80 | — | 56.60 | 73.80 | Proprietary |
| 19 | DeepSeek-V4-Flash | Thinking Level · High, Tools | DeepSeek-AI | 78.60 | — | 52.30 | 70.20 | Free commercial |
| 20 | GLM-5 | Thinking Enabled | Zhipu AI | 77.80 | — | — | — | Free commercial |
| 21 | Muse Spark | Thinking Enabled, Tools | Facebook AI Research | 77.40 | — | — | — | Proprietary |
| 22 | Claude Sonnet 4.5 | Thinking Enabled, Tools | Anthropic | 77.20 | — | — | — | Proprietary |
| 23 | Qwen3.6-27B | Thinking Enabled, Tools | Alibaba | 77.20 | — | 53.50 | 71.30 | Free commercial |
| 24 | GPT-5.1-Codex-Max | Thinking Level · High, Tools | OpenAI | 76.80 | — | — | — | Proprietary |
| 25 | Kimi K2.5 | Thinking Enabled, Tools | Moonshot AI | 76.80 | — | 50.70 | — | Free commercial |
| 26 | Qwen3.5-397B-A17B | Thinking Enabled, Tools | Alibaba | 76.40 | — | — | — | Free commercial |
| 27 | GPT-5.1 | Thinking Level · High | OpenAI | 76.30 | — | — | — | Proprietary |
| 28 | GPT-5.1 | Thinking Level · High, Tools | OpenAI | 76.30 | — | — | — | Proprietary |
| 29 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | Google DeepMind | 76.20 | 92.00 | — | — | Proprietary |
| 30 | Qwen3-Max-Thinking | Thinking Enabled | Alibaba | 75.30 | 85.90 | — | — | Proprietary |
| 31 | o3-pro | Thinking Level · High | OpenAI | 75.00 | — | — | — | Proprietary |
| 32 | M2.1 | Thinking Enabled | MiniMaxAI | 74.80 | — | — | — | Free commercial |
| 33 | Opus 4.1 | Extended Thinking, Tools | Anthropic | 74.50 | — | — | — | Proprietary |
| 34 | GPT-5 Codex | Thinking Level · High | OpenAI | 74.50 | — | — | — | Proprietary |
| 35 | Step 3.5 Flash | Thinking Enabled | StepFunAI | 74.40 | 86.40 | — | — | Free commercial |
| 36 | GLM-4.7 | Thinking Enabled, Tools | Zhipu AI | 73.80 | — | 40.60 | — | Free commercial |
| 37 | DeepSeek-V4-Flash | Standard Mode, Tools | DeepSeek-AI | 73.70 | — | 49.10 | 69.70 | Free commercial |
| 38 | DeepSeek-V4-Pro | Standard Mode, Tools | DeepSeek-AI | 73.60 | — | 52.10 | 69.80 | Free commercial |
| 39 | Grok 4 Heavy | Parallel · Thinking Enabled, Tools | xAI | 73.50 | — | — | — | Proprietary |
| 40 | Qwen3.6-35B-A3B | Thinking Enabled | Alibaba | 73.40 | 80.40 | 49.50 | 67.20 | Free commercial |
| 41 | Haiku 4.5 | Thinking Enabled, Tools | Anthropic | 73.30 | — | — | — | Proprietary |
| 42 | DeepSeek V3.2 | Thinking Enabled, Tools | DeepSeek-AI | 73.10 | — | — | — | Free commercial |
| 43 | GPT-5 | Thinking Level · High | OpenAI | 72.80 | — | 36.30 | — | Proprietary |
| 44 | Claude Sonnet 4 | Thinking Enabled, Tools | Anthropic | 72.70 | — | — | — | Proprietary |
| 45 | Claude Opus 4 | — | Anthropic | 72.50 | 56.60 | — | — | Proprietary |
| 46 | Qwen3.5-27B | Thinking Enabled | Alibaba | 72.40 | — | — | — | Free commercial |
| 47 | Grok 4 Code | — | xAI | 72.00 | — | — | — | Proprietary |
| 48 | Kimi K2 Thinking | Thinking Enabled, Tools | Moonshot AI | 71.30 | — | — | — | Free commercial |
| 49 | Grok Code Fast 1 | Thinking Enabled | xAI | 70.80 | — | — | — | Proprietary |
| 50 | Qwen3-Coder-Next | Standard Mode, Tools | Alibaba | 70.60 | — | 44.30 | — | Free commercial |

A "—" in a score column means no score is listed for that benchmark.
Showing 50 of 206 models.
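
Because many cells in the leaderboard are "—", re-ranking by a benchmark other than SWE-bench Verified has to decide what to do with missing scores. The sketch below is one possible approach, not the site's own sorting logic: it copies a handful of rows from the leaderboard, represents "—" as `None`, and places unscored models last. The helper name `rank_by_pro` and the tuple layout are illustrative assumptions.

```python
from typing import Optional

# (model, SWE-bench Verified, SWE-Bench Pro - Public); None stands for a "—" cell.
Row = tuple[str, float, Optional[float]]

rows: list[Row] = [
    ("Claude Mythos Preview", 93.90, 77.80),
    ("Opus 4.7", 87.60, 64.30),
    ("Claude Sonnet 5", 82.00, None),
    ("DeepSeek-V4-Pro (Extra High)", 80.60, 55.40),
    ("Qwen3.7-Max-Preview", 80.40, 60.60),
    ("Kimi K2.6", 80.20, 58.60),
]


def rank_by_pro(table: list[Row]) -> list[Row]:
    """Sort by SWE-Bench Pro descending; rows without a score go last."""
    return sorted(table, key=lambda r: (r[2] is None, -(r[2] or 0.0)))


for model, verified, pro in rank_by_pro(rows):
    shown = f"{pro:.2f}" if pro is not None else "—"
    print(f"{model}: Pro {shown}, Verified {verified:.2f}")
```

On this sample the order differs from the SWE-bench Verified ranking: Qwen3.7-Max-Preview and Kimi K2.6 move ahead of DeepSeek-V4-Pro, and Claude Sonnet 5 drops to the end because it has no SWE-Bench Pro score listed.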