Qwen3.6-27B vs Qwen3.5-27B

Across 8 shared benchmarks, Qwen3.6-27B leads overall. It wins 6 benchmarks to Qwen3.5-27B's 2, with no ties, and the average score difference is +0.21 in its favor.

Qwen3.6-27B: Alibaba · 2026-04-22 · Reasoning model

Qwen3.5-27B: Alibaba · 2026-02-25 · Reasoning model

Win split: Qwen3.6-27B 6 wins (75%), Qwen3.5-27B 2 wins (25%)

Benchmark scores

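8 shared benchmarks, grouped by capability. Within each group, benchmarks are sorted by the size of the score gap, largest first. Diff is the Qwen3.6-27B score minus the Qwen3.5-27B score; each score cell also gives the model's rank among all models reporting that benchmark and the evaluation setting.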
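The grouping and ordering rule can be sketched in a few lines of Python. This is a minimal illustration, not the page generator's actual code: the record layout and the names `RECORDS`, `group_and_sort` and `group_winner` are hypothetical, the HLE score of 24.00 for Qwen3.6-27B is assumed from the "24" shown on this page, and evaluation settings (with or without tools) are ignored.

```python
from collections import defaultdict

# Hypothetical record layout: (capability group, benchmark, Qwen3.6-27B, Qwen3.5-27B).
# Scores are copied from the tables on this page; evaluation settings are ignored.
RECORDS = [
    ("General Knowledge", "MMLU Pro", 86.20, 86.10),
    ("General Knowledge", "HLE", 24.00, 48.50),
    ("General Knowledge", "C-Eval", 91.40, 90.50),
    ("General Knowledge", "GPQA Diamond", 87.80, 85.50),
    ("Coding and Software Engineering", "LiveCodeBench", 83.90, 80.70),
    ("Coding and Software Engineering", "SWE-bench Verified", 77.20, 72.40),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 59.30, 41.60),
    ("Claw-style Agent Evaluation", "Claw Bench", 72.40, 75.20),
]


def group_and_sort(records):
    """Group rows by capability, then order each group by |diff|, largest first."""
    groups = defaultdict(list)
    for group, bench, new, old in records:
        # Diff = Qwen3.6-27B minus Qwen3.5-27B, rounded to the page's 2 decimals.
        groups[group].append((bench, round(new - old, 2)))
    return {
        group: sorted(rows, key=lambda row: abs(row[1]), reverse=True)
        for group, rows in groups.items()
    }


def group_winner(rows):
    """Return (leading model, its wins, benchmarks in group) for one group."""
    wins_new = sum(1 for _, diff in rows if diff > 0)
    wins_old = sum(1 for _, diff in rows if diff < 0)
    if wins_new > wins_old:
        return ("Qwen3.6-27B", wins_new, len(rows))
    if wins_old > wins_new:
        return ("Qwen3.5-27B", wins_old, len(rows))
    return ("tie", wins_new, len(rows))


if __name__ == "__main__":
    for group, rows in group_and_sort(RECORDS).items():
        print(group, group_winner(rows), rows)
```

Sorting on the absolute difference is what places HLE (a loss of 24.50) ahead of GPQA Diamond (a win of 2.30) in the first group.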

General Knowledge: Qwen3.6-27B wins 3/4

Benchmark	Qwen3.6-27B	Qwen3.5-27B	Diff
HLE	24 (rank 107/172; Thinking, No Tools)	48.50 (rank 33/172; Thinking, With Tools)	-24.50
GPQA Diamond	87.80 (rank 36/187; Thinking, No Tools)	85.50 (rank 52/187; Thinking, No Tools)	+2.30
C-Eval	91.40 (rank 5/10; Thinking, No Tools)	90.50 (rank 6/10; Thinking, No Tools)	+0.90
MMLU Pro	86.20 (rank 17/132; Thinking, No Tools)	86.10 (rank 19/132; Thinking, No Tools)	+0.10

Coding and Software Engineering: Qwen3.6-27B wins 2/2

Benchmark	Qwen3.6-27B	Qwen3.5-27B	Diff
SWE-bench Verified	77.20 (rank 28/112; Thinking, With Tools)	72.40 (rank 53/112; Thinking, No Tools)	+4.80
LiveCodeBench	83.90 (rank 19/123; Thinking, No Tools)	80.70 (rank 27/123; Thinking, With Tools)	+3.20

AI Agent - Tool Usage: Qwen3.6-27B wins 1/1

Benchmark	Qwen3.6-27B	Qwen3.5-27B	Diff
Terminal Bench 2.0	59.30 (rank 20/47; Thinking, With Tools)	41.60 (rank 43/47; Thinking, With Tools)	+17.70

Claw-style Agent Evaluation: Qwen3.5-27B wins 1/1

Benchmark	Qwen3.6-27B	Qwen3.5-27B	Diff
Claw Bench	72.40 (rank 27/29; Thinking, With Tools)	75.20 (rank 26/29; Thinking, With Tools)	-2.80

Qwen3.6-27B leads in: General Knowledge (3/4), Coding and Software Engineering (2/2), AI Agent - Tool Usage (1/1)
Qwen3.5-27B leads in: Claw-style Agent Evaluation (1/1)

On average across the 8 shared benchmarks, Qwen3.6-27B scores 0.21 points higher than Qwen3.5-27B.

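Largest single-benchmark gap: HLE, where Qwen3.6-27B scores 24 (Thinking, No Tools) and Qwen3.5-27B scores 48.50 (Thinking, With Tools), a difference of -24.50. Because the two scores were measured in different settings, this gap is not a like-for-like comparison.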
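The headline figures (win counts, average difference, largest gap) can be rechecked from the Diff column alone. The following is a minimal sketch, assuming the eight differences listed in the tables; `DIFFS` and `summarize` are hypothetical names, and `Decimal` is used so the average is computed exactly rather than with binary floating-point error.

```python
from decimal import Decimal

# Per-benchmark differences (Qwen3.6-27B minus Qwen3.5-27B), copied from the tables.
# Decimal strings keep the two-decimal values exact when summing and averaging.
DIFFS = {
    "HLE": Decimal("-24.50"),
    "GPQA Diamond": Decimal("2.30"),
    "C-Eval": Decimal("0.90"),
    "MMLU Pro": Decimal("0.10"),
    "SWE-bench Verified": Decimal("4.80"),
    "LiveCodeBench": Decimal("3.20"),
    "Terminal Bench 2.0": Decimal("17.70"),
    "Claw Bench": Decimal("-2.80"),
}


def summarize(diffs):
    """Return (wins, losses, ties, exact mean diff, benchmark with the largest |diff|)."""
    wins = sum(1 for d in diffs.values() if d > 0)
    losses = sum(1 for d in diffs.values() if d < 0)
    ties = len(diffs) - wins - losses
    # Start the sum from Decimal(0) so the total stays a Decimal.
    mean_diff = sum(diffs.values(), Decimal(0)) / len(diffs)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return wins, losses, ties, mean_diff, largest


if __name__ == "__main__":
    print(summarize(DIFFS))
```

The differences sum to 1.70, so the exact mean is 0.2125; the page's +0.21 is that value shown to two decimals (the page does not say which rounding rule it uses). The sketch also makes plain that the average is driven mostly by two large, opposite gaps: Terminal Bench 2.0 (+17.70) and HLE (-24.50).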

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.