LLM Coding Benchmark Leaderboard

Name: LLM Coding Benchmark Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page presents an LLM coding benchmark leaderboard covering SWE-Bench Verified, SWE-Bench Pro, LiveCodeBench, and SWE-bench Multilingual, and compares models from the GPT, Claude, Qwen, and DeepSeek families.

Updated on 2026-07-18 08:01:52

As of 2026-07, this leaderboard covers SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual, and related benchmarks, so models can be compared within the same task family.

Reference: Composite Coding Rankings

There is no single, universally accepted coding leaderboard. Static benchmarks like SWE-bench and HumanEval measure specific skills but can be gamed through targeted fine-tuning. We selected two complementary human-preference leaderboards: LMArena Coding Arena ranks models on general programming tasks (debugging, algorithms, code generation) via anonymous crowd-sourced voting; DesignArena Code Category focuses specifically on visual, front-end code generation (websites, UI components, games) using the same blind-voting methodology. Reading both together gives a fuller picture of coding capability.

LMArena Coding Arena

Elo ratings from anonymous A/B voting on real general coding tasks (debugging, algorithms, code generation) submitted by developers.

Updated 2026-07-10

#	Model	Developer	Elo
1	Claude Fable 5	Anthropic	1564
2	Opus 4.7 (thinking)	Anthropic	1553
3	Claude Opus 4.6 (thinking)	Anthropic	1550
4	Opus 4.7	Anthropic	1550
5	Claude Opus 4.6	Anthropic	1547
6	Claude Opus 4.8 (thinking)	Anthropic	1537
7	Claude Opus 4.8	Anthropic	1533
8	muse-spark-1.1	Meta	1530
9	Claude Opus 4 (thinking-32k)	Anthropic	1530
10	gpt-5.6-sol-xhigh	OpenAI	1528

Source: LMArena
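
To put the Elo gaps in the table above in perspective, the sketch below converts a rating difference into an expected head-to-head score. It assumes the classic Elo logistic formula with a 400-point scale. The source does not state which rating model LMArena uses, so treat the function name, the scale, and the resulting numbers as illustrative assumptions rather than LMArena's own method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (a win counts 1, a tie 0.5).

    Assumption: classic Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the LMArena Coding Arena table above.
first_place = 1564   # Claude Fable 5
tenth_place = 1528   # gpt-5.6-sol-xhigh

p = elo_expected_score(first_place, tenth_place)
print(f"Expected score, first vs. tenth: {p:.3f}")
```

Under this assumption, the 36-point gap between the first and tenth entries corresponds to an expected score of roughly 0.55, so the top ten are closer in head-to-head preference than the rank order alone suggests.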

DesignArena Code Category

Elo ratings from anonymous voting on visual front-end code tasks (websites, UI components, games, data viz) by Arcada Labs.

Updated 2026-07-12

#	Model	Developer	Elo
1	GLM 5.2	Zhipu AI	1352
2	GPT-5.6 Sol	OpenAI	1350
3	Claude Fable 5	Anthropic	1344
4	Claude Opus 4.6	Anthropic	1336
5	Claude Opus 4.6 (thinking)	Anthropic	1330
6	Opus 4.7	Anthropic	1330
7	Grok 4.5	xAI	1328
8	GLM 5.1	Zhipu AI	1321
9	Kimi K2.6	Moonshot AI	1320
10	Claude Sonnet 4.6	Anthropic	1319

Source: DesignArena
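
One way to read the two arenas together is to compare positions rather than Elo values, since the two leaderboards use separate rating pools. The sketch below lists the models that appear in both top-ten lists with their position in each. Names are matched exactly as printed, so `gpt-5.6-sol-xhigh` and `GPT-5.6 Sol` are deliberately not merged (they may be different configurations). Positions are simply the listing order, and the helper name `shared_positions` is hypothetical.

```python
# Model names in listing order, copied from the two tables above.
lmarena = [
    "Claude Fable 5", "Opus 4.7 (thinking)", "Claude Opus 4.6 (thinking)",
    "Opus 4.7", "Claude Opus 4.6", "Claude Opus 4.8 (thinking)",
    "Claude Opus 4.8", "muse-spark-1.1", "Claude Opus 4 (thinking-32k)",
    "gpt-5.6-sol-xhigh",
]
designarena = [
    "GLM 5.2", "GPT-5.6 Sol", "Claude Fable 5", "Claude Opus 4.6",
    "Claude Opus 4.6 (thinking)", "Opus 4.7", "Grok 4.5", "GLM 5.1",
    "Kimi K2.6", "Claude Sonnet 4.6",
]


def shared_positions(first: list[str], second: list[str]) -> list[tuple[str, int, int]]:
    """Return (name, position in first, position in second) for exact-name matches."""
    position_in_second = {name: i + 1 for i, name in enumerate(second)}
    return [
        (name, i + 1, position_in_second[name])
        for i, name in enumerate(first)
        if name in position_in_second
    ]


shared = shared_positions(lmarena, designarena)
for name, general_pos, frontend_pos in shared:
    print(f"{name}: LMArena #{general_pos}, DesignArena #{frontend_pos}")
```

With exact-name matching, four models appear in both lists. The remaining entries are specific to one arena, which is the kind of difference the two-leaderboard reading is meant to surface.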

Benchmarks: SWE-bench Verified, LiveCodeBench, SWE-Bench Pro - Public, SWE-bench Multilingual

Top picks

Ranked by SWE-bench Verified

Current SOTA: Qwen3.6-27B (Alibaba), 77.20 on SWE-bench Verified
Best Open-Source: Qwen3.6-27B (Alibaba), 77.20 on SWE-bench Verified
Best China-Made: Qwen3.6-27B (Alibaba), 77.20 on SWE-bench Verified

LLM Performance Results

Data source: DataLearnerAI

Rank	Model	Developer	SWE-bench Verified	LiveCodeBench	SWE-Bench Pro - Public	SWE-bench Multilingual	License
1	Qwen3.6-27B	Alibaba	77.20	83.90	53.50	71.30	Free commercial
2	Qwen3.5-397B-A17B	Alibaba	76.40	83.60	50.90	69.30	Free commercial
3	Qwen3.6-35B-A3B	Alibaba	73.40	80.40	49.50	67.20	Free commercial
4	Qwen3.5-27B	Alibaba	72.40	80.70	-	-	Free commercial
5	GLM-4.7-Flash	Zhipu AI	59.20	-	-	-	Free commercial
6	Devstral Small 1.1	MistralAI	53.60	-	-	-	Free commercial
7	Qwen3-Coder-Flash	Alibaba	51.60	-	-	-	Free commercial
8	Devstral Small 1.0	MistralAI	46.80	-	-	-	Free commercial
9	GPT OSS 20B	OpenAI	34.00	-	-	-	Free commercial
10	Qwen3-30B-A3B-2507	Alibaba	22.00	43.20	-	-	Free commercial
11	Gemma 4 31B	DeepMind	-	80.00	-	-	Free commercial
12	Gemma 4 26B A4B	DeepMind	-	77.10	-	-	Free commercial
13	Qwen3-235B-A22B-Thinking	Alibaba	-	74.10	-	-	Free commercial
14	Qwen3-32B	Alibaba	-	65.70	-	-	Free commercial
15	Magistral-Small-2506	MistralAI	-	55.84	-	-	Free commercial
16	Qwen2.5-32B	Alibaba	-	51.20	-	-	Free commercial
17	Codestral	MistralAI	-	31.50	-	-	Non-commercial
18	Gemma 3 - 27B (IT)	Google DeepMind	-	29.70	-	-	Free commercial
19	Qwen3-30B-A3B	Alibaba	-	29.00	-	-	Free commercial
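
The row order above appears consistent with a simple rule: sort by SWE-bench Verified in descending order, place models without that score last, and break ties among them by the second score column. The sketch below reproduces that ordering on a few rows from the table. The rule itself is inferred from the data, not documented by the source, and it assumes the second score column is LiveCodeBench. The `Row` type and `sort_key` helper are hypothetical names.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    swe_verified: Optional[float]    # None where the table shows no score
    livecodebench: Optional[float]   # assumed to be the second score column


def sort_key(row: Row) -> tuple:
    """Order by SWE-bench Verified, then LiveCodeBench, both descending, missing last."""
    def part(score: Optional[float]) -> tuple:
        # False sorts before True, so rows that have a score come first.
        return (score is None, -(score if score is not None else 0.0))

    return part(row.swe_verified) + part(row.livecodebench)


# A few rows copied from the table above, deliberately out of order.
rows = [
    Row("Qwen3-32B", None, 65.70),
    Row("Qwen3.6-27B", 77.20, 83.90),
    Row("Gemma 4 31B", None, 80.00),
    Row("Qwen3-30B-A3B-2507", 22.00, 43.20),
    Row("GPT OSS 20B", 34.00, None),
]

ranked = sorted(rows, key=sort_key)
for position, row in enumerate(ranked, start=1):
    print(position, row.model)
```

The sorted sample keeps the same relative order as the published table (ranks 1, 9, 10, 11, and 14). This also explains why a model with a high LiveCodeBench score but no SWE-bench Verified score sits below every model that has one.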