Qwen3.6-27B vs Haiku 4.5

Across 8 shared benchmarks, Qwen3.6-27B leads overall: Qwen3.6-27B wins 7, Haiku 4.5 wins 1, with 0 ties and an average score difference of +15.50.

Qwen3.6-27B: Alibaba · 2026-04-22 · Reasoning model

Haiku 4.5: Anthropic · 2025-10-15 · Multimodal model

Qwen3.6-27B: 7 wins (88%) · Haiku 4.5: 1 win (13%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.

Qwen3.6-27B wins 4/4

Benchmark	Qwen3.6-27B	Haiku 4.5	Diff
GPQA Diamond	87.80 · rank 36 / 187 · Thinking (No Tools)	60.50 · rank 144 / 187 · Normal (No Tools)	+27.30
LiveBench	65.56 · rank 52 / 115 · Normal (No Tools)	45.33 · rank 103 / 115 · Normal (No Tools)	+20.23
HLE	24 · rank 107 / 172 · Thinking (No Tools)	4.30 · rank 170 / 172 · Normal (No Tools)	+19.70
MMLU Pro	86.20 · rank 17 / 132 · Thinking (No Tools)	76 · rank 81 / 132 · Normal (No Tools)	+10.20

Qwen3.6-27B wins 3/3

Benchmark	Qwen3.6-27B	Haiku 4.5	Diff
LiveCodeBench	83.90 · rank 19 / 123 · Thinking (No Tools)	51 · rank 93 / 123 · Normal (No Tools)	+32.90
SWE-bench Verified	77.20 · rank 28 / 112 · Thinking (With Tools)	60.60 · rank 80 / 112 · Normal (With Tools)	+16.60
SWE-Bench Pro - Public	53.50 · rank 34 / 54 · Thinking (With Tools)	39.45 · rank 51 / 54 · Extended (with tools)	+14.05

Haiku 4.5 wins 1/1

Benchmark	Qwen3.6-27B	Haiku 4.5	Diff
Claw Bench	72.40 · rank 27 / 29 · Thinking (With Tools)	89.40 · rank 11 / 29 · Thinking (With Tools)	-17

Prices use DataLearner records when available; missing fields are not inferred.

One or both models have incomplete public pricing.

On average across the 8 shared benchmarks, Qwen3.6-27B scores 15.50 higher.

Largest single-benchmark gap: LiveCodeBench — Qwen3.6-27B 83.90 vs Haiku 4.5 51 (+32.90).
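
The summary figures can be rechecked from the per-benchmark scores. The sketch below is illustrative only: the names SCORES and summarize are hypothetical, and it assumes the page's average is a plain unweighted mean of the eight per-benchmark differences, which is consistent with the numbers shown but is not stated by the page.

```python
# Scores copied from the benchmark tables: (benchmark, Qwen3.6-27B, Haiku 4.5).
SCORES = [
    ("GPQA Diamond", 87.80, 60.50),
    ("LiveBench", 65.56, 45.33),
    ("HLE", 24.00, 4.30),
    ("MMLU Pro", 86.20, 76.00),
    ("LiveCodeBench", 83.90, 51.00),
    ("SWE-bench Verified", 77.20, 60.60),
    ("SWE-Bench Pro - Public", 53.50, 39.45),
    ("Claw Bench", 72.40, 89.40),
]


def summarize(scores):
    """Return win counts, mean difference, and the largest gap.

    Assumption: differences are first-model minus second-model, and the
    average is an unweighted mean over all shared benchmarks.
    """
    # Round each difference to 2 decimals to avoid float noise such as 32.900000000000006.
    diffs = {name: round(a - b, 2) for name, a, b in scores}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_first": sum(d > 0 for d in diffs.values()),
        "wins_second": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": sum(diffs.values()) / len(diffs),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
print(summary)
```

Under these assumptions the result matches the page: 7 wins to 1 with no ties, a mean difference of 15.4975 (shown as +15.50 after rounding), and LiveCodeBench as the largest gap at +32.90.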

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.