A knowledge platform focused on LLM benchmarking, datasets, and practical instruction with continuously updated capability maps.

© 2026 DataLearner AI. DataLearner curates industry data and case studies so researchers, enterprises, and developers can rely on trustworthy intelligence.

LLM Coding Benchmark Leaderboard

This page presents the LLM coding benchmark leaderboard. It covers the SWE-bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual datasets and compares models from the GPT, Claude, Qwen, and DeepSeek families.

Updated on 2026-05-21 22:14:17

As of 2026-05, the leaderboard covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, so models can be compared within the same task family.

Click any model name to check context length, licensing, and pricing on its detail page. See Data Methodology for scoring details.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-05-14

#    Model                          Provider                    Elo
1    Opus 4.7 (thinking)            Anthropic                   1563
2    Opus 4.7                       Anthropic                   1551
3    Claude Opus 4.6 (thinking)     Anthropic                   1550
4    Claude Opus 4.6                Anthropic                   1549
5    Claude Opus 4 (thinking-32k)   Anthropic                   1531
6    Muse Spark                     Facebook AI Research Lab    1530
7    GPT-5.4 (high)                 OpenAI                      1527
8    GLM 5.1                        Zhipu AI                    1527
9    Gemini 3.1 Pro Preview         Google DeepMind             1526
10   Claude Sonnet 4.6              Anthropic                   1522
Source: LMArena
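
To give a feel for what an Elo gap in the table above means, here is a minimal sketch of the textbook Elo formulas. It assumes the standard logistic curve with a 400-point scale and a hypothetical K-factor of 32; this page does not describe how LMArena actually fits its ratings, so treat it as an illustration rather than the leaderboard's method.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one vote. k=32 is an assumed, illustrative value."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Ratings taken from the table above: rank 1 (1563) versus rank 10 (1522).
p = expected_score(1563, 1522)
print(f"Expected preference rate for the higher-rated model: {p:.3f}")
```

Under these assumptions, the 41-point gap between the first and tenth entries corresponds to the higher-rated model being preferred in roughly 56% of head-to-head votes, which is why small Elo differences near the top should be read with caution.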

DesignArena Code Category

Elo ratings from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz) by Arcada Labs.

Updated 2026-05-17

#    Model                          Provider       Elo
1    Claude Opus 4.6                Anthropic      1348
2    Opus 4.7 (thinking)            Anthropic      1345
3    Claude Opus 4.6 (thinking)     Anthropic      1344
4    Kimi K2.6                      Moonshot AI    1343
5    GLM 5.1                        Zhipu AI       1338
6    Opus 4.7                       Anthropic      1335
7    GLM-5-Turbo                    Zhipu AI       1334
8    Claude Sonnet 4.6              Anthropic      1331
9    MiMo-V2.5-Pro                  Xiaomi         1329
10   GPT-5.5                        OpenAI         1320
Source: DesignArena
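
One way to read the two leaderboards together is to compare rank positions for the models that appear in both top-10 lists. The sketch below does this with the model names exactly as listed above. It compares ranks rather than raw Elo values on the assumption that ratings from two separate voting pools are not on a shared scale; the helper names are hypothetical.

```python
# Top-10 model names in listed order, copied from the two tables above.
lmarena = [
    "Opus 4.7 (thinking)", "Opus 4.7", "Claude Opus 4.6 (thinking)",
    "Claude Opus 4.6", "Claude Opus 4 (thinking-32k)", "Muse Spark",
    "GPT-5.4 (high)", "GLM 5.1", "Gemini 3.1 Pro Preview", "Claude Sonnet 4.6",
]
designarena = [
    "Claude Opus 4.6", "Opus 4.7 (thinking)", "Claude Opus 4.6 (thinking)",
    "Kimi K2.6", "GLM 5.1", "Opus 4.7", "GLM-5-Turbo",
    "Claude Sonnet 4.6", "MiMo-V2.5-Pro", "GPT-5.5",
]


def rank_map(names):
    """Map each model name to its 1-based position in the list."""
    return {name: pos for pos, name in enumerate(names, start=1)}


def shared_ranks(first, second):
    """Return (model, rank_in_first, rank_in_second) for models in both lists."""
    ranks_first, ranks_second = rank_map(first), rank_map(second)
    return [(m, ranks_first[m], ranks_second[m]) for m in first if m in ranks_second]


shared = shared_ranks(lmarena, designarena)
for model, general_rank, design_rank in shared:
    print(f"{model}: LMArena #{general_rank}, DesignArena #{design_rank}")
```

Six models appear in both lists. The listing makes shifts easy to spot, for example Claude Opus 4.6 moving from fourth on general coding to first on front-end tasks.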

Top picks

Ranked by SWE-bench Verified

  • Current SOTA: Qwen3.6-27B (Alibaba), 77.20
  • Best Open-Source: Qwen3.5-397B-A17B (Alibaba), 76.40 (−0.80 vs. current SOTA)
  • Best China-Made: Qwen3.6-35B-A3B (Alibaba), 73.40 (−3.80 vs. current SOTA)
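
The signed figures shown next to the second and third picks match each model's SWE-bench Verified score minus the current SOTA score. That reading is an inference from the numbers, not something the page states. A minimal sketch of the arithmetic, using `Decimal` so the two-decimal scores subtract exactly:

```python
from decimal import Decimal

# SWE-bench Verified scores copied from the top picks above.
sota = Decimal("77.20")  # Qwen3.6-27B
picks = {
    "Qwen3.5-397B-A17B": Decimal("76.40"),
    "Qwen3.6-35B-A3B": Decimal("73.40"),
}

# Gap to the current SOTA; negative means the model scores below it.
gaps = {name: score - sota for name, score in picks.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f}")
```

Plain floats would also work here if the result is rounded to two decimals before display.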

LLM Performance Results

Data source: DataLearnerAI


Scores are listed per benchmark; "-" means no score is reported.

Rank  Model                      Provider         SWE-bench Verified  LiveCodeBench  SWE-Bench Pro - Public  SWE-bench Multilingual  License
1     Qwen3.6-27B                Alibaba          77.20               83.90          53.50                   71.30                   Free commercial
2     Qwen3.5-397B-A17B          Alibaba          76.40               83.60          50.90                   69.30                   Free commercial
3     Qwen3.6-35B-A3B            Alibaba          73.40               80.40          49.50                   67.20                   Free commercial
4     Qwen3.5-27B                Alibaba          72.40               80.70          -                       -                       Free commercial
5     GLM-4.7-Flash              Zhipu AI         59.20               -              -                       -                       Free commercial
6     Devstral Small 1.1         MistralAI        53.60               -              -                       -                       Free commercial
7     Qwen3-Coder-Flash          Alibaba          51.60               -              -                       -                       Free commercial
8     Devstral Small 1.0         MistralAI        46.80               -              -                       -                       Free commercial
9     GPT OSS 20B                OpenAI           34.00               -              -                       -                       Free commercial
10    Qwen3-30B-A3B-2507         Alibaba          22.00               43.20          -                       -                       Free commercial
11    Qwen3-235B-A22B-Thinking   Alibaba          -                   74.10          -                       -                       Free commercial
12    Qwen3-32B                  Alibaba          -                   65.70          -                       -                       Free commercial
13    Magistral-Small-2506       MistralAI        -                   55.84          -                       -                       Free commercial
14    Qwen2.5-32B                Alibaba          -                   51.20          -                       -                       Free commercial
15    Codestral                  MistralAI        -                   31.50          -                       -                       Non-commercial
16    Gemma 3 - 27B (IT)         Google DeepMind  -                   29.70          -                       -                       Free commercial
17    Qwen3-30B-A3B              Alibaba          -                   29.00          -                       -                       Free commercial
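
The row order above appears consistent with sorting by SWE-bench Verified in descending order, placing models without that score afterwards, and ordering those by LiveCodeBench. This is inferred from the rows themselves; the page does not document the sort rule. A minimal sketch of such an ordering, using a handful of rows from the table and a hypothetical `Row` type:

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    swe_verified: Optional[float]   # None when no score is reported
    livecodebench: Optional[float]  # None when no score is reported


# A shuffled subset of rows copied from the table above.
rows = [
    Row("Qwen3-32B", None, 65.70),
    Row("Qwen3.6-27B", 77.20, 83.90),
    Row("Codestral", None, 31.50),
    Row("Qwen3-30B-A3B-2507", 22.00, 43.20),
    Row("Qwen3.5-27B", 72.40, 80.70),
    Row("Qwen3-235B-A22B-Thinking", None, 74.10),
]


def sort_key(row: Row):
    # Rows with a SWE-bench Verified score come first (False sorts before True),
    # then higher scores first; LiveCodeBench breaks ties and orders the rest.
    return (
        row.swe_verified is None,
        -(row.swe_verified or 0.0),
        -(row.livecodebench or 0.0),
    )


ranked = sorted(rows, key=sort_key)
for position, row in enumerate(ranked, start=1):
    print(position, row.model, row.swe_verified, row.livecodebench)
```

A practical consequence of this ordering is that a model with a low SWE-bench Verified score still ranks above every model that lacks one, regardless of LiveCodeBench results, so positions 11 and below are not directly comparable with positions 1 to 10.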