
LLM Coding Benchmark Leaderboard

This leaderboard compares the coding performance of LLMs, including GPT, Claude, Qwen, and DeepSeek models, on four benchmarks: SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual.

Updated on 2026-05-21 22:14:17

As of 2026-05, this page covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. As a complementary reference, we selected two human-preference leaderboards. LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting. DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) and uses the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-14

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Opus 4.7 (thinking) | Anthropic | 1563 |
| 2 | Opus 4.7 | Anthropic | 1551 |
| 3 | Claude Opus 4.6 (thinking) | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1549 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research Lab | 1530 |
| 7 | GPT-5.4 (high) | OpenAI | 1527 |
| 8 | GLM 5.1 | Zhipu AI | 1527 |
| 9 | Gemini 3.1 Pro Preview | Google Deep Mind | 1526 |
| 10 | Claude Sonnet 4.6 | Anthropic | 1522 |

Source: LMArena
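
To give a feel for what the Elo gaps above mean, here is a minimal sketch that converts a rating difference into an expected head-to-head score. It assumes the classic Elo logistic curve with a scale of 400; that formula and the function name `expected_score` are assumptions for illustration, and LMArena's actual rating fit may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model.

    A tie counts as half a win. The 400-point scale is the conventional
    Elo choice and is an assumption here, not something the page states.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from the LMArena Coding Arena ranking (rank 1 and rank 10).
opus_47_thinking = 1563
claude_sonnet_46 = 1522

p = expected_score(opus_47_thinking, claude_sonnet_46)
print(f"Expected score of rank 1 vs rank 10: {p:.3f}")
```

Under this assumption, the 41-point gap between rank 1 and rank 10 corresponds to an expected score of roughly 0.56, so the top ten are closer to each other than the ordering alone suggests.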

DesignArena Code Category

Elo ratings from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz) by Arcada Labs.

Updated 2026-05-17

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Claude Opus 4.6 | Anthropic | 1348 |
| 2 | Opus 4.7 (thinking) | Anthropic | 1345 |
| 3 | Claude Opus 4.6 (thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1338 |
| 6 | Opus 4.7 | Anthropic | 1335 |
| 7 | GLM-5-Turbo | Zhipu AI | 1334 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | MiMo-V2.5-Pro | Xiaomi | 1329 |
| 10 | GPT-5.5 | OpenAI | 1320 |

Source: DesignArena

Top picks

Ranked by LiveCodeBench

  • Current SOTA: DeepSeek-V4-Pro (DeepSeek-AI), 93.50 on LiveCodeBench
  • Best Open-Source: DeepSeek-V4-Flash (DeepSeek-AI), 91.60 on LiveCodeBench (−1.90)
  • Best China-Made: Qwen3.7-Max-Preview (Alibaba), 91.60 on LiveCodeBench (−1.90)
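
The −1.90 shown next to two of the picks appears to be the gap to the top LiveCodeBench score (91.60 − 93.50). The following sketch reproduces that arithmetic under that assumption; the helper name `gaps_to_leader` is hypothetical and not part of the site.

```python
# LiveCodeBench scores copied from the top picks.
top_picks = {
    "DeepSeek-V4-Pro": 93.50,
    "DeepSeek-V4-Flash": 91.60,
    "Qwen3.7-Max-Preview": 91.60,
}


def gaps_to_leader(scores: dict[str, float]) -> dict[str, float]:
    """Return each model's score minus the best score (0.0 for the leader)."""
    best = max(scores.values())
    # Round to two decimals to match the page's precision and hide float noise.
    return {name: round(score - best, 2) for name, score in scores.items()}


gaps = gaps_to_leader(top_picks)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f}")
```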

LLM Performance Results

Data source: DataLearnerAI

| Rank | Model | Mode | Organization | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro - Public | SWE-bench Multilingual | License |
|------|-------|------|--------------|--------------------|---------------|------------------------|------------------------|---------|
| 1 | DeepSeek-V4-Pro | Thinking Level · High | DeepSeek-AI | - | 93.50 | - | - | Free commercial |
| 2 | Gemini 3.0 Pro (Preview 11-2025) | Thinking Enabled | Google Deep Mind | 76.20 | 92.00 | - | - | Proprietary |
| 3 | Gemini 3.1 Pro Preview | Thinking Level · High, Tools | Google Deep Mind | 80.60 | 91.70 | 54.20 | - | Proprietary |
| 4 | Qwen3.7-Max-Preview | Thinking Level · High | Alibaba | - | 91.60 | - | - | Proprietary |
| 5 | DeepSeek-V4-Flash | Thinking Level · High | DeepSeek-AI | - | 91.60 | - | - | Free commercial |
| 6 | DeepSeek-V4-Pro | Thinking Level · High | DeepSeek-AI | - | 89.80 | - | - | Free commercial |
| 7 | Kimi K2.6 | Thinking Enabled | Moonshot AI | - | 89.60 | - | - | Free commercial |
| 8 | DeepSeek-V4-Flash | Thinking Level · High | DeepSeek-AI | - | 88.40 | - | - | Free commercial |
| 9 | Gemini 2.5 Deep Think | Deep Thinking Mode | Google Deep Mind | - | 87.60 | - | - | Proprietary |
| 10 | Qwen3.6-Max-Preview | Thinking Level · High | Alibaba | - | 87.10 | - | - | Proprietary |
| 11 | Qwen 3.6 Plus Preview | Thinking Enabled | Alibaba | - | 87.10 | 56.60 | 73.80 | Proprietary |
| 12 | Opus 4.5 | Extended Thinking, Tools | Anthropic | 80.90 | 87.00 | - | - | Proprietary |
| 13 | Step 3.5 Flash | Thinking Enabled | StepFunAI | 74.40 | 86.40 | - | - | Free commercial |
| 14 | Qwen3-Max-Thinking | Thinking Enabled | Alibaba | 75.30 | 85.90 | - | - | Proprietary |
| 15 | GPT-5.1 Codex | Thinking Level · High, Tools | OpenAI | 70.40 | 85.50 | - | - | Proprietary |
| 16 | Kimi K2.5 | Thinking Enabled | Moonshot AI | - | 85.00 | - | 73.00 | Free commercial |
| 17 | GLM-4.7 | Thinking Enabled | Zhipu AI | - | 84.90 | - | - | Free commercial |
| 18 | GLM-4.6 | Thinking Enabled, Tools | Zhipu AI | 68.00 | 84.50 | - | - | Free commercial |
| 19 | Qwen3.6-27B | Thinking Enabled | Alibaba | - | 83.90 | - | - | Free commercial |
| 20 | Qwen3.5-397B-A17B | Thinking Enabled | Alibaba | - | 83.60 | 50.90 | 69.30 | Free commercial |
| 21 | DeepSeek V3.2 | Thinking Enabled | DeepSeek-AI | 70.20 | 83.30 | 40.90 | - | Free commercial |
| 22 | Kimi K2 Thinking | Thinking Enabled | Moonshot AI | - | 83.10 | - | - | Free commercial |
| 23 | MiniMax M2 | Thinking Enabled | MiniMaxAI | - | 83.00 | - | - | Free commercial |
| 24 | GLM-4.6 | Thinking Enabled | Zhipu AI | - | 82.80 | - | - | Free commercial |
| 25 | Grok 4 | Thinking Enabled | xAI | 58.60 | 82.00 | - | - | Proprietary |
| 26 | Grok 4.1 Fast | Thinking Enabled | xAI | - | 82.00 | - | - | Proprietary |
| 27 | Qwen3.5-27B | Thinking Enabled, Tools | Alibaba | - | 80.70 | - | - | Free commercial |
| 28 | Qwen3.6-35B-A3B | Thinking Enabled | Alibaba | 73.40 | 80.40 | 49.50 | 67.20 | Free commercial |
| 29 | Gemini 2.5 Pro Deep Think | - | Google Deep Mind | - | 80.40 | - | - | Proprietary |
| 30 | Gemma 4 31B | Thinking Enabled | DeepMind | - | 80.00 | - | - | Free commercial |
| 31 | DeepSeek-V3.1 Terminus | Thinking Enabled | DeepSeek-AI | - | 80.00 | - | - | Free commercial |
| 32 | Grok 4 Fast | Thinking Enabled | xAI | - | 80.00 | - | - | Proprietary |
| 33 | Grok-3 - Reasoning Beta | - | xAI | - | 79.40 | - | - | Proprietary |
| 34 | Gemini-2.5-Pro-Preview-05-06 | - | Google Deep Mind | 63.20 | 77.10 | - | - | Proprietary |
| 35 | Gemini 2.5-Pro | - | Google Deep Mind | - | 77.10 | - | - | Proprietary |
| 36 | Gemma 4 26B A4B | Thinking Enabled | DeepMind | - | 77.10 | - | - | Free commercial |
| 37 | Claude Opus 4.6 | Extended Thinking | Anthropic | - | 76.00 | - | - | Proprietary |
| 38 | OpenAI o3 | - | OpenAI | - | 75.80 | - | - | Proprietary |
| 39 | DeepSeek-V3.1 Terminus | - | DeepSeek-AI | 68.40 | 74.90 | - | - | Free commercial |
| 40 | DeepSeek-V3.1 | Thinking Enabled | DeepSeek-AI | - | 74.80 | - | - | Free commercial |
| 41 | DeepSeek V3.2-Exp | Thinking Enabled | DeepSeek-AI | - | 74.10 | - | - | Free commercial |
| 42 | Qwen3-235B-A22B-Thinking | Thinking Enabled | Alibaba | - | 74.10 | - | - | Free commercial |
| 43 | Qwen3-235B-A22B-Thinking-2507 | Thinking Enabled | Alibaba | - | 74.10 | - | - | Free commercial |
| 44 | Kimi-k1.6-IOI-high | - | Moonshot AI | - | 73.80 | - | - | Proprietary |
| 45 | DeepSeek-R1-0528 | Thinking Enabled | DeepSeek-AI | 57.60 | 73.30 | - | - | Free commercial |
| 46 | GLM-4.5 | Thinking Enabled | Zhipu AI | 64.20 | 72.90 | - | - | Free commercial |
| 47 | OpenAI o1 | - | OpenAI | 48.90 | 71.00 | - | - | Proprietary |
| 48 | Claude Sonnet 4.5 | Thinking Enabled | Anthropic | - | 71.00 | 43.60 | - | Proprietary |
| 49 | GLM-4.5-Air | Thinking Enabled | Zhipu AI | 57.60 | 70.70 | - | - | Free commercial |
| 50 | Qwen3-235B-A22B | - | Alibaba | 34.40 | 70.70 | - | - | Free commercial |
Showing 50 of 206 models.
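
Because many models report only some of the four benchmarks, a side-by-side comparison is only meaningful on the benchmarks both models share. The sketch below illustrates one way to do that with three rows copied from the results; the names `SCORES` and `score_gaps` are hypothetical, and this is not the site's own comparison tool.

```python
from typing import Dict, List, Optional

BENCHMARKS: List[str] = [
    "SWE-bench Verified",
    "LiveCodeBench",
    "SWE-Bench Pro - Public",
    "SWE-bench Multilingual",
]

# Three rows copied from the results; None marks a score the page leaves blank.
SCORES: Dict[str, List[Optional[float]]] = {
    "Gemini 3.1 Pro Preview": [80.60, 91.70, 54.20, None],
    "Qwen 3.6 Plus Preview": [None, 87.10, 56.60, 73.80],
    "DeepSeek V3.2": [70.20, 83.30, 40.90, None],
}


def score_gaps(model_a: str, model_b: str) -> Dict[str, float]:
    """Return model_a minus model_b on each benchmark where both have a score."""
    gaps: Dict[str, float] = {}
    for name, a, b in zip(BENCHMARKS, SCORES[model_a], SCORES[model_b]):
        if a is not None and b is not None:
            gaps[name] = round(a - b, 2)  # two decimals, matching the page
    return gaps


comparison = score_gaps("Gemini 3.1 Pro Preview", "Qwen 3.6 Plus Preview")
for benchmark, gap in comparison.items():
    print(f"{benchmark}: {gap:+.2f}")
```

In this pairing, Gemini 3.1 Pro Preview is ahead on LiveCodeBench but behind on SWE-Bench Pro - Public, which illustrates why no single benchmark column settles the ranking.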