Gemma 4 31B vs Qwen3.5-27B

Across 5 shared benchmarks, Qwen3.5-27B leads overall: Gemma 4 31B wins 0 and Qwen3.5-27B wins 5, with 0 ties. The average score difference (Gemma 4 31B minus Qwen3.5-27B) is -5.38.

DeepMind · 2026-04-02 · AI model

Alibaba · 2026-02-25 · Reasoning model

Gemma 4 31B: 0 wins (0%) · Qwen3.5-27B: 5 wins (100%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.

General Knowledge: Qwen3.5-27B leads 3/3

Benchmark	Gemma 4 31B	Qwen3.5-27B	Diff
HLE	26.50 · 75 / 149 · Thinking (With Tools + Internet)	48.50 · 21 / 149 · Thinking (With Tools)	-22
GPQA Diamond	84.30 · 50 / 175 · Thinking (No Tools)	85.50 · 44 / 175 · Thinking (No Tools)	-1.20
MMLU Pro	85.20 · 21 / 124 · Thinking (No Tools)	86.10

Qwen3.5-27B leads in: General Knowledge (3/3), Agent Level Benchmark (1/1), Coding and Software Engineer (1/1)

On average across the 5 shared benchmarks, Qwen3.5-27B scores 5.38 higher.

Largest single-benchmark gap: HLE, where Gemma 4 31B scores 26.50 and Qwen3.5-27B scores 48.50 (-22).
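
The summary figures above (win counts, average difference, largest gap) can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the difference is Gemma 4 31B minus Qwen3.5-27B, it uses only the three benchmark rows listed on this page, and the names `rows` and `summarize` are hypothetical, not part of any tool behind this page.

```python
# Scores copied from the three rows shown in the table above.
# The other two shared benchmarks are not listed on this page, so they are omitted.
rows = [
    ("HLE", 26.50, 48.50),
    ("GPQA Diamond", 84.30, 85.50),
    ("MMLU Pro", 85.20, 86.10),
]


def summarize(rows):
    """Return win counts, average difference, and largest gap (assumed: first model minus second)."""
    # Round each difference to two decimals to avoid floating-point noise.
    diffs = [(name, round(a - b, 2)) for name, a, b in rows]
    return {
        "wins_a": sum(1 for _, d in diffs if d > 0),
        "wins_b": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "avg_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        # Largest gap by absolute value, keeping the sign.
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(rows)
print(summary)
```

Over these three rows the average difference comes out to -8.03 rather than -5.38, because the page's average covers all 5 shared benchmarks and two of them are not listed here.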

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.