Gemma 4 31B vs Qwen3.5-27B
Across 5 shared benchmarks, Qwen3.5-27B leads overall: Gemma 4 31B wins 0, Qwen3.5-27B wins 5, with 0 ties and an average score difference of -5.38 (Gemma 4 31B minus Qwen3.5-27B).
Gemma 4 31B
DeepMind · 2026-04-02 · AI model
Qwen3.5-27B
Alibaba · 2026-02-25 · Reasoning model
Win share: Gemma 4 31B 0 wins (0%) · Qwen3.5-27B 5 wins (100%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.
General Knowledge
Qwen3.5-27B leads 3/3

| Benchmark | Gemma 4 31B | Qwen3.5-27B | Diff |
|---|---|---|---|
| HLE | 26.50 · 75/149 · Thinking (With Tools + Internet) | 48.50 · 21/149 · Thinking (With Tools) | -22.00 |
| GPQA Diamond | 84.30 · 50/175 · Thinking (No Tools) | 85.50 · 44/175 · Thinking (No Tools) | -1.20 |
| MMLU Pro | 85.20 · 21/124 · Thinking (No Tools) | 86.10 | -0.90 |
Specs
| Field | Gemma 4 31B | Qwen3.5-27B |
|---|---|---|
| Publisher | DeepMind | Alibaba |
| Release date | 2026-04-02 | 2026-02-25 |
| Model type | AI model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | 31B | 27B |
| Context length | 256K | 1010K |
| Max output | 32,768 | 248,320 |
Summary
- Qwen3.5-27B leads in: General Knowledge (3/3), Agent Level Benchmark (1/1), Coding and Software Engineering (1/1)
- On average across the 5 shared benchmarks, Qwen3.5-27B scores 5.38 points higher.
- Largest single-benchmark gap: HLE, where Gemma 4 31B scores 26.50 vs Qwen3.5-27B's 48.50 (-22.00).
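The summary statistics are simple derivations over the per-benchmark records. A minimal sketch, using only the three General Knowledge rows tabulated on this page (the other two shared benchmarks are not listed here, so the average below covers this subset, not the page-wide -5.38):

```python
# Per-benchmark scores as (Gemma 4 31B, Qwen3.5-27B) pairs,
# taken from the General Knowledge table above.
scores = {
    "HLE": (26.50, 48.50),
    "GPQA Diamond": (84.30, 85.50),
    "MMLU Pro": (85.20, 86.10),
}

# Signed difference from Gemma's perspective (negative = Qwen leads).
diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}

avg_diff = round(sum(diffs.values()) / len(diffs), 2)
largest_gap = min(diffs, key=diffs.get)  # most negative difference

print(diffs)        # {'HLE': -22.0, 'GPQA Diamond': -1.2, 'MMLU Pro': -0.9}
print(avg_diff)     # -8.03 (subset average; the full 5-benchmark average is -5.38)
print(largest_gap)  # HLE
```

The same fold over all five shared benchmark records would reproduce the -5.38 average and the HLE largest-gap line quoted above.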
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.