
LLM Coding Benchmark Leaderboard

This page ranks large language models on coding benchmarks: SWE-bench Verified, SWE-Bench Pro - Public, LiveCodeBench, and SWE-bench Multilingual. It compares models from the GPT, Claude, Qwen, and DeepSeek families, among others.

Updated on 2026-05-21 22:14:17

As of 2026-05, results for these and related benchmarks are shown side by side, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks such as SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. For reference, we selected two complementary human-preference leaderboards. LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) through anonymous, crowd-sourced voting. DesignArena Code Category focuses on visual front-end code generation (websites, UI components, games) and uses the same blind-voting methodology. Reading the two together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-14

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Opus 4.7 (thinking) | Anthropic | 1563 |
| 2 | Opus 4.7 | Anthropic | 1551 |
| 3 | Claude Opus 4.6 (thinking) | Anthropic | 1550 |
| 4 | Claude Opus 4.6 | Anthropic | 1549 |
| 5 | Claude Opus 4 (thinking-32k) | Anthropic | 1531 |
| 6 | Muse Spark | Facebook AI Research Lab | 1530 |
| 7 | GPT-5.4 (high) | OpenAI | 1527 |
| 8 | GLM 5.1 | Zhipu AI | 1527 |
| 9 | Gemini 3.1 Pro Preview | Google DeepMind | 1526 |
| 10 | Claude Sonnet 4.6 | Anthropic | 1522 |
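
For intuition about what Elo gaps of this size mean, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the conventional Elo logistic curve with a 400-point scale. The page does not say how LMArena fits its ratings, so the formula and the function name `expected_win_rate` are illustrative assumptions, not the arena's method.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo model (assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings copied from the LMArena Coding Arena ranking on this page.
top = 1563    # Opus 4.7 (thinking), rank 1
tenth = 1522  # Claude Sonnet 4.6, rank 10

p = expected_win_rate(top, tenth)
print(f"Expected win rate of rank 1 over rank 10: {p:.3f}")
```

Under that assumption, the 41-point gap between rank 1 and rank 10 corresponds to an expected win rate of roughly 56%, so the top ten sit close together.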

Top picks

Ranked by LiveCodeBench

  • Current SOTA: DeepSeek-V4-Pro (DeepSeek-AI), 93.50 on LiveCodeBench
  • Best Open-Source: DeepSeek-V4-Flash (DeepSeek-AI), 91.60 on LiveCodeBench (−1.90 vs. Current SOTA)
  • Best China-Made: Qwen3.7-Max-Preview (Alibaba), 91.60 on LiveCodeBench (−1.90 vs. Current SOTA)

LLM Performance Results

Data source: DataLearnerAI

| Rank | Model | Organization | SWE-bench Verified | LiveCodeBench | SWE-Bench Pro - Public | SWE-bench Multilingual | License |
|------|-------|--------------|--------------------|---------------|------------------------|------------------------|---------|
| 1 | DeepSeek-V4-Pro | DeepSeek-AI | 80.60 | 93.50 | 55.40 | 76.20 | Free commercial |
| 2 | Qwen3.7-Max-Preview | Alibaba | 80.40 | 91.60 | 60.60 | 78.30 | Proprietary |
| 3 | DeepSeek-V4-Flash | DeepSeek-AI | 79.00 | 91.60 | 52.60 | 73.30 | Free commercial |
| 4 | Kimi K2.6 | Moonshot AI | 80.20 | 89.60 | 58.60 | 76.70 | Free commercial |
| 5 | Qwen3.6-Max-Preview | Alibaba | 78.80 | 87.10 | 57.30 | 73.80 | Proprietary |
| 6 | Step 3.5 Flash | StepFunAI | 74.40 | 86.40 | n/a | n/a | Free commercial |
| 7 | Qwen3-Max-Thinking | Alibaba | 75.30 | 85.90 | n/a | n/a | Proprietary |
| 8 | Kimi K2.5 | Moonshot AI | 76.80 | 85.00 | 50.70 | 73.00 | Free commercial |
| 9 | GLM-4.7 | Zhipu AI | 73.80 | 84.90 | 40.60 | n/a | Free commercial |
| 10 | GLM-4.6 | Zhipu AI | 68.00 | 84.50 | n/a | n/a | Free commercial |
| 11 | DeepSeek V3.2 | DeepSeek-AI | 73.10 | 83.30 | 40.90 | n/a | Free commercial |
| 12 | Kimi K2 Thinking | Moonshot AI | 71.30 | 83.10 | n/a | n/a | Free commercial |
| 13 | MiniMax M2 | MiniMaxAI | 69.40 | 83.00 | n/a | n/a | Free commercial |
| 14 | Gemini 2.5 Pro Deep Think | Google DeepMind | n/a | 80.40 | n/a | n/a | Proprietary |
| 15 | DeepSeek-V3.1 Terminus | DeepSeek-AI | 68.40 | 80.00 | n/a | n/a | Free commercial |
| 16 | Grok-3 - Reasoning Beta | xAI | n/a | 79.40 | n/a | n/a | Proprietary |
| 17 | Gemini-2.5-Pro-Preview-05-06 | Google DeepMind | 63.20 | 77.10 | n/a | n/a | Proprietary |
| 18 | DeepSeek-V3.1 | DeepSeek-AI | 66.00 | 74.80 | n/a | n/a | Free commercial |
| 19 | DeepSeek V3.2-Exp | DeepSeek-AI | 67.80 | 74.10 | n/a | n/a | Free commercial |
| 20 | Qwen3-235B-A22B-Thinking-2507 | Alibaba | n/a | 74.10 | n/a | n/a | Free commercial |
| 21 | Kimi-k1.6-IOI-high | Moonshot AI | n/a | 73.80 | n/a | n/a | Proprietary |
| 22 | DeepSeek-R1-0528 | DeepSeek-AI | 57.60 | 73.30 | n/a | n/a | Free commercial |
| 23 | GLM-4.5 | Zhipu AI | 64.20 | 72.90 | n/a | n/a | Free commercial |
| 24 | OpenAI o1 | OpenAI | 48.90 | 71.00 | n/a | n/a | Proprietary |
| 25 | GLM-4.5-Air | Zhipu AI | 57.60 | 70.70 | n/a | n/a | Free commercial |
| 26 | Qwen3-235B-A22B | Alibaba | 34.40 | 70.70 | n/a | n/a | Free commercial |
| 27 | Grok 3 | xAI | n/a | 70.60 | n/a | n/a | Proprietary |
| 28 | OpenAI o3-mini (high) | OpenAI | 49.30 | 69.50 | n/a | n/a | Proprietary |
| 29 | OpenAI o3-mini (medium) | OpenAI | n/a | 67.40 | n/a | n/a | Proprietary |
| 30 | Step3 | StepFunAI | n/a | 67.10 | n/a | n/a | Free commercial |
| 31 | DeepSeek-R1 | DeepSeek-AI | 49.20 | 65.90 | n/a | n/a | Free commercial |
| 32 | Kimi-k1.6-IOI | Moonshot AI | n/a | 65.90 | n/a | n/a | Proprietary |
| 33 | QwQ-Max-Preview | Alibaba | n/a | 65.60 | n/a | n/a | Free commercial |
| 34 | MiniMax-M1-80k | MiniMaxAI | 56.00 | 65.00 | n/a | n/a | Free commercial |
| 35 | MiniMax-M1-40k | MiniMaxAI | 55.60 | 62.30 | n/a | n/a | Free commercial |
| 36 | Magistral-Medium-2506 | MistralAI | n/a | 59.36 | n/a | n/a | Proprietary |
| 37 | Claude Opus 4 | Anthropic | 72.50 | 56.60 | n/a | n/a | Proprietary |
| 38 | Gemini 2.5 Flash | Google DeepMind | 50.00 | 55.40 | n/a | n/a | Proprietary |
| 39 | Kimi K2 | Moonshot AI | 51.80 | 53.70 | n/a | n/a | Free commercial |
| 40 | OpenAI o1-mini | OpenAI | n/a | 52.00 | n/a | n/a | Proprietary |
| 41 | Qwen3-235B-A22B-2507 | Alibaba | n/a | 51.80 | n/a | n/a | Free commercial |
| 42 | Llama 4 Behemoth Instruct | Facebook AI Research Lab | n/a | 49.40 | n/a | n/a | Free commercial |
| 43 | DeepSeek-V3-0324 | DeepSeek-AI | 38.80 | 49.20 | n/a | n/a | Free commercial |
| 44 | GPT-4.5 | OpenAI | 38.00 | 46.40 | n/a | n/a | Proprietary |
| 45 | Llama 4 Maverick Instruct | Facebook AI Research Lab | n/a | 43.40 | n/a | n/a | Free commercial |
| 46 | GPT-4.1 | OpenAI | 54.60 | 40.50 | n/a | n/a | Proprietary |
| 47 | ERNIE-4.5-VL-424B-A47B-Base | Baidu | n/a | 38.80 | n/a | n/a | Free commercial |
| 48 | ERNIE-4.5-300B-A47B | Baidu | n/a | 38.80 | n/a | n/a | Free commercial |
| 49 | Codestral 25.01 | MistralAI | n/a | 37.90 | n/a | n/a | Proprietary |
| 50 | DeepSeek-V3 | DeepSeek-AI | n/a | 34.60 | n/a | n/a | Free commercial |
Showing 50 of 67 models.

Source of the LMArena Coding Arena ranking: LMArena
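
Because the results list is ordered by LiveCodeBench, a model's position can change when a different benchmark is used as the sort key. The sketch below re-ranks a handful of rows copied from the results by SWE-bench Verified. The helper name `rank_by` and the choice to drop models with no reported score, rather than count them as zero, are assumptions made for illustration.

```python
from typing import Optional

# (model, SWE-bench Verified, LiveCodeBench) copied from the results on this page.
# None marks a score the page does not report.
ROWS: list[tuple[str, Optional[float], Optional[float]]] = [
    ("DeepSeek-V4-Pro", 80.60, 93.50),
    ("Qwen3.7-Max-Preview", 80.40, 91.60),
    ("DeepSeek-V4-Flash", 79.00, 91.60),
    ("Kimi K2.6", 80.20, 89.60),
    ("Qwen3.6-Max-Preview", 78.80, 87.10),
    ("Gemini 2.5 Pro Deep Think", None, 80.40),
    ("Claude Opus 4", 72.50, 56.60),
]


def rank_by(rows, column: int) -> list[str]:
    """Return model names sorted by the given score column, highest first.

    Models without a score in that column are left out instead of being
    treated as zero (a choice made for this sketch).
    """
    scored = [r for r in rows if r[column] is not None]
    return [r[0] for r in sorted(scored, key=lambda r: r[column], reverse=True)]


by_lcb = rank_by(ROWS, 2)  # LiveCodeBench order, as on the page
by_swe = rank_by(ROWS, 1)  # SWE-bench Verified order
print("By LiveCodeBench:     ", by_lcb)
print("By SWE-bench Verified:", by_swe)
```

Within this sample, Kimi K2.6 moves ahead of DeepSeek-V4-Flash when sorted by SWE-bench Verified, and Gemini 2.5 Pro Deep Think drops out of that ordering because no SWE-bench Verified score is listed for it.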

DesignArena Code Category

Elo ratings from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz) by Arcada Labs.

Updated 2026-05-17

| # | Model | Organization | Elo |
|---|-------|--------------|-----|
| 1 | Claude Opus 4.6 | Anthropic | 1348 |
| 2 | Opus 4.7 (thinking) | Anthropic | 1345 |
| 3 | Claude Opus 4.6 (thinking) | Anthropic | 1344 |
| 4 | Kimi K2.6 | Moonshot AI | 1343 |
| 5 | GLM 5.1 | Zhipu AI | 1338 |
| 6 | Opus 4.7 | Anthropic | 1335 |
| 7 | GLM-5-Turbo | Zhipu AI | 1334 |
| 8 | Claude Sonnet 4.6 | Anthropic | 1331 |
| 9 | MiMo-V2.5-Pro | Xiaomi | 1329 |
| 10 | GPT-5.5 | OpenAI | 1320 |
Source: DesignArena
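
One way to read the two arena rankings together is to line up the models that appear in both top-ten lists and compare their positions. The sketch below does this with the orderings copied from this page. The helper name `rank_shifts` is hypothetical, and matching models by exact display name is an assumption that only works because both lists use the same labels.

```python
# Top-10 orderings copied from the two arena rankings on this page.
lmarena = [
    "Opus 4.7 (thinking)", "Opus 4.7", "Claude Opus 4.6 (thinking)",
    "Claude Opus 4.6", "Claude Opus 4 (thinking-32k)", "Muse Spark",
    "GPT-5.4 (high)", "GLM 5.1", "Gemini 3.1 Pro Preview", "Claude Sonnet 4.6",
]
designarena = [
    "Claude Opus 4.6", "Opus 4.7 (thinking)", "Claude Opus 4.6 (thinking)",
    "Kimi K2.6", "GLM 5.1", "Opus 4.7", "GLM-5-Turbo",
    "Claude Sonnet 4.6", "MiMo-V2.5-Pro", "GPT-5.5",
]


def rank_shifts(a: list[str], b: list[str]) -> dict[str, tuple[int, int]]:
    """Map each model present in both lists to (rank in a, rank in b), 1-based."""
    pos_b = {name: i for i, name in enumerate(b, start=1)}
    return {name: (i, pos_b[name]) for i, name in enumerate(a, start=1) if name in pos_b}


shifts = rank_shifts(lmarena, designarena)
for name, (r_lm, r_da) in shifts.items():
    print(f"{name}: LMArena #{r_lm}, DesignArena #{r_da}")
```

Six models appear in both lists. Claude Opus 4.6, for example, is fourth on LMArena Coding Arena but first on DesignArena, while Opus 4.7 goes from second to sixth.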