Gemma 4 31B vs Gemma 3 - 27B (IT)

Across 3 shared benchmarks, Gemma 4 31B leads overall: Gemma 4 31B wins 3, Gemma 3 - 27B (IT) wins 0, with 0 ties and an average score difference of +36.63.

Gemma 4 31B

DeepMind · 2026-04-02 · Chat model

Gemma 3 - 27B (IT)

Google DeepMind · 2025-03-12 · Chat model

Wins: Gemma 4 31B 3 (100%), Gemma 3 - 27B (IT) 0 (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 3 shared benchmarks.

General Knowledge

Gemma 4 31B leads 2/2

| Benchmark | Gemma 4 31B | Gemma 3 - 27B (IT) | Diff |
|---|---|---|---|
| GPQA Diamond | 84.30 (rank 53 / 178, Thinking, No Tools) | 42.40 (rank 161 / 178, Normal, No Tools) | +41.90 |
| MMLU Pro | 85.20 (rank 23 / 126, Thinking, No Tools) | 67.50 (rank 96 / 126, Normal, No Tools) | +17.70 |

Coding and Software Engineer

Gemma 4 31B leads 1/1

| Benchmark | Gemma 4 31B | Gemma 3 - 27B (IT) | Diff |
|---|---|---|---|
| LiveCodeBench | 80 (rank 30 / 120, Thinking, No Tools) | 29.70 (rank 116 / 120, Normal, No Tools) | +50.30 |
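
The grouping and ordering described for the benchmark tables (by capability, largest gap first) can be sketched in a few lines of Python. This is only an illustration built from the scores listed above. The names `ROWS` and `group_by_capability` are hypothetical, and sorting by absolute gap is an assumption about what "largest gap" means, not the site's actual implementation.

```python
from collections import defaultdict

# Scores copied from the tables above:
# (capability, benchmark, Gemma 4 31B score, Gemma 3 - 27B (IT) score).
ROWS = [
    ("General Knowledge", "MMLU Pro", 85.20, 67.50),
    ("General Knowledge", "GPQA Diamond", 84.30, 42.40),
    ("Coding and Software Engineer", "LiveCodeBench", 80.00, 29.70),
]


def group_by_capability(rows):
    """Group rows by capability, then sort each group by descending absolute gap."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in rows:
        diff = round(score_a - score_b, 2)
        groups[capability].append((benchmark, score_a, score_b, diff))
    for items in groups.values():
        # Assumption: "largest gap" means largest absolute difference.
        items.sort(key=lambda item: abs(item[3]), reverse=True)
    return dict(groups)


grouped = group_by_capability(ROWS)
for capability, items in grouped.items():
    print(capability)
    for benchmark, score_a, score_b, diff in items:
        print(f"  {benchmark}: {score_a:.2f} vs {score_b:.2f} ({diff:+.2f})")
```

With the three shared benchmarks, GPQA Diamond sorts ahead of MMLU Pro within General Knowledge because its gap is larger.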

Specs

| Field | Gemma 4 31B | Gemma 3 - 27B (IT) |
|---|---|---|
| Publisher | DeepMind | Google DeepMind |
| Release date | 2026-04-02 | 2025-03-12 |
| Model type | Chat model | Chat model |
| Architecture | Dense | Dense |
| Parameters | 31B | 27B |
| Context length | 256K | 128K |
| Max output | 32K | Not available |

Summary

  • Gemma 4 31B leads in: General Knowledge (2/2), Coding and Software Engineer (1/1)

On average across the 3 shared benchmarks, Gemma 4 31B scores 36.63 higher.

Largest single-benchmark gap: LiveCodeBench — Gemma 4 31B 80 vs Gemma 3 - 27B (IT) 29.70 (+50.30).
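
The summary figures above follow from simple arithmetic on the per-benchmark scores. The sketch below recomputes them as a sanity check. `SCORES` and `summarize` are hypothetical names chosen for illustration, and rounding to two decimals is an assumption made to match the figures shown on the page.

```python
# Scores copied from the benchmark tables:
# benchmark -> (Gemma 4 31B score, Gemma 3 - 27B (IT) score).
SCORES = {
    "GPQA Diamond": (84.30, 42.40),
    "MMLU Pro": (85.20, 67.50),
    "LiveCodeBench": (80.00, 29.70),
}


def summarize(scores):
    """Recompute win counts, ties, average difference, and the largest gap."""
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": sum(d > 0 for d in diffs.values()),
        "wins_b": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "average_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
print(summary)
```

The average is (41.90 + 17.70 + 50.30) / 3 = 36.63, matching the figure quoted in the summary.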

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.