Qwen3.6-27B vs Haiku 4.5
Across 7 shared benchmarks, Qwen3.6-27B leads overall: Qwen3.6-27B wins 6, Haiku 4.5 wins 1, with 0 ties and an average score difference of +14.82.
Qwen3.6-27B
Alibaba · 2026-04-22 · Reasoning model
Haiku 4.5
Anthropic · 2025-10-15 · Multimodal model
Qwen3.6-27B: 6 wins (86%) · Haiku 4.5: 1 win (14%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.
Coding and Software Engineer
Qwen3.6-27B 3/3

| Benchmark | Qwen3.6-27B | Haiku 4.5 | Diff |
|---|---|---|---|
| LiveCodeBench | 83.90 · 19 / 120 · Thinking (No Tools) | 51 · 91 / 120 · Normal (No Tools) | +32.90 |
| SWE-bench Verified | 77.20 · 25 / 108 · Thinking (With Tools) | 60.60 · 76 / 108 · Normal (With Tools) | +16.60 |
| SWE-Bench Pro - Public | 53.50 · 24 / 43 · Thinking (With Tools) | 39.45 · 40 / 43 · Extended (With Tools) | +14.05 |
General Knowledge
Qwen3.6-27B 3/3

| Benchmark | Qwen3.6-27B | Haiku 4.5 | Diff |
|---|---|---|---|
| GPQA Diamond | 87.80 · 33 / 178 · Thinking (No Tools) | 60.50 · 138 / 178 · Normal (No Tools) | +27.30 |
| HLE | 24 · 92 / 157 · Thinking (No Tools) | 4.30 · 155 / 157 · Normal (No Tools) | +19.70 |
| MMLU Pro | 86.20 · 16 / 126 · Thinking (No Tools) | 76 · 78 / 126 · Normal (No Tools) | +10.20 |
Claw-style Agent Evaluation
Haiku 4.5 1/1

| Benchmark | Qwen3.6-27B | Haiku 4.5 | Diff |
|---|---|---|---|
| Claw Bench | 72.40 · 27 / 29 · Thinking (With Tools) | 89.40 · 11 / 29 · Thinking (With Tools) | -17 |
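
The ordering used in the tables above (grouped by capability, largest gap first within each group) can be reproduced with a short sketch. This is an illustration only, not the page's actual generator: the `ROWS` layout and the `grouped_by_gap` helper are hypothetical names, and sorting by the absolute gap is an assumption, since every group here contains gaps of a single sign.

```python
# Scores copied from the benchmark tables, deliberately listed out of order:
# (capability, benchmark, Qwen3.6-27B score, Haiku 4.5 score)
ROWS = [
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 53.50, 39.45),
    ("Coding and Software Engineer", "LiveCodeBench", 83.90, 51.0),
    ("Coding and Software Engineer", "SWE-bench Verified", 77.20, 60.60),
    ("General Knowledge", "MMLU Pro", 86.20, 76.0),
    ("General Knowledge", "GPQA Diamond", 87.80, 60.50),
    ("General Knowledge", "HLE", 24.0, 4.30),
    ("Claw-style Agent Evaluation", "Claw Bench", 72.40, 89.40),
]


def grouped_by_gap(rows):
    """Group rows by capability; within each group, put the largest gap first."""
    groups = {}
    for capability, benchmark, first, second in rows:
        diff = round(first - second, 2)  # positive means the first model leads
        groups.setdefault(capability, []).append((benchmark, diff))
    # Assumption: "largest gap" means largest absolute difference.
    return {
        capability: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for capability, items in groups.items()
    }


groups = grouped_by_gap(ROWS)
for capability, items in groups.items():
    print(capability)
    for benchmark, diff in items:
        print(f"  {benchmark}: {diff:+.2f}")
```

Groups keep the order in which each capability first appears in `ROWS`, because Python dictionaries preserve insertion order.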
Specs
| Field | Qwen3.6-27B | Haiku 4.5 |
|---|---|---|
| Publisher | Alibaba | Anthropic |
| Release date | 2026-04-22 | 2025-10-15 |
| Model type | Reasoning model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | 27B | Not available |
| Context length | 128K | 200K |
| Max output | 16K | 64K |
Summary
- Qwen3.6-27B leads in: Coding and Software Engineer (3/3), General Knowledge (3/3)
- Haiku 4.5 leads in: Claw-style Agent Evaluation (1/1)
On average across the 7 shared benchmarks, Qwen3.6-27B scores 14.82 points higher.
Largest single-benchmark gap: LiveCodeBench — Qwen3.6-27B 83.90 vs Haiku 4.5 51 (+32.90).
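
The headline figures can be checked directly from the per-benchmark scores. The sketch below is a minimal illustration rather than the page's own pipeline: `SCORES` and `summarize` are hypothetical names, the scores are copied from the benchmark tables, and each difference is taken as Qwen3.6-27B minus Haiku 4.5.

```python
# Scores copied from the benchmark tables: (Qwen3.6-27B, Haiku 4.5).
SCORES = {
    "LiveCodeBench": (83.90, 51.0),
    "SWE-bench Verified": (77.20, 60.60),
    "SWE-Bench Pro - Public": (53.50, 39.45),
    "GPQA Diamond": (87.80, 60.50),
    "HLE": (24.0, 4.30),
    "MMLU Pro": (86.20, 76.0),
    "Claw Bench": (72.40, 89.40),
}


def summarize(scores):
    """Tally wins and ties, the mean difference, and the largest absolute gap."""
    diffs = {name: first - second for name, (first, second) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "first_wins": sum(d > 0 for d in diffs.values()),
        "second_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, round(diffs[largest], 2)),
    }


summary = summarize(SCORES)
print(summary)
```

The mean is a plain unweighted average of the seven differences (103.75 / 7), so the single negative result on Claw Bench lowers it rather than being excluded.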
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.