Gemma 4 31B's strongest leaderboard placements are on MMLU Pro (ranked 16 of 115 models, score 85.20), LiveCodeBench (21 of 108, score 80.00), and GPQA Diamond (39 of 162, score 84.30). This page also compares it with 3 competitor models and 2 predecessor models from the same series, with performance and pricing views where data is available. One source link is attached for reference.
Side-by-side benchmark comparison of Gemma 4 31B against leading peer models
6 benchmarks with comparable scores
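| Benchmark | Gemma 4 31B | GLM-5 | Kimi K2.5 | Qwen3.5-27B |
|---|---|---|---|---|
| GPQA Diamond (general evaluation) | 84.30 (thinking, no tools) | 86.00 (thinking) | 87.60 (thinking, no tools) | 85.50 (thinking, no tools) |
| HLE (general evaluation) | 26.50 (thinking, tools + web) | 50.40 (thinking, with tools) | 30.10 (thinking, no tools) | 48.50 (thinking, with tools) |
| MMLU Pro (general evaluation) | 85.20 (thinking, no tools) | -- | 78.50 (thinking, no tools) | 86.10 (thinking, no tools) |
| LiveCodeBench (coding and software engineering) | 80.00 (thinking, no tools) | -- | 85.00 (thinking, no tools) | 80.70 (thinking, with tools) |
| τ²-Bench (agent capabilities) | 76.90 (thinking, with tools) | 89.70 (thinking, with tools) | -- | 79.00 (thinking, with tools) |
| AIME 2026 (mathematical reasoning) | 89.20 (thinking, no tools) | 92.70 (thinking) | 92.50 (thinking, no tools) | -- |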
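To read the peer table per benchmark, it helps to compute how far Gemma 4 31B sits from the best-scoring peer on each row. The sketch below is a minimal illustration, assuming the scores are transcribed by hand from the table; `gap_to_best_peer` is a hypothetical helper, not part of any DataLearnerAI API. Because run configurations differ between cells (for example, HLE mixes tool-assisted and no-tool runs), the gaps are indicative rather than like-for-like.

```python
# Scores transcribed from the peer comparison table; None stands for "--" (no comparable score).
# Run modes differ across cells, so the resulting gaps are indicative only.
SCORES = {
    "GPQA Diamond": {"Gemma 4 31B": 84.30, "GLM-5": 86.00, "Kimi K2.5": 87.60, "Qwen3.5-27B": 85.50},
    "HLE": {"Gemma 4 31B": 26.50, "GLM-5": 50.40, "Kimi K2.5": 30.10, "Qwen3.5-27B": 48.50},
    "MMLU Pro": {"Gemma 4 31B": 85.20, "GLM-5": None, "Kimi K2.5": 78.50, "Qwen3.5-27B": 86.10},
    "LiveCodeBench": {"Gemma 4 31B": 80.00, "GLM-5": None, "Kimi K2.5": 85.00, "Qwen3.5-27B": 80.70},
    "τ²-Bench": {"Gemma 4 31B": 76.90, "GLM-5": 89.70, "Kimi K2.5": None, "Qwen3.5-27B": 79.00},
    "AIME 2026": {"Gemma 4 31B": 89.20, "GLM-5": 92.70, "Kimi K2.5": 92.50, "Qwen3.5-27B": None},
}


def gap_to_best_peer(scores, target="Gemma 4 31B"):
    """For each benchmark, return (best_peer, peer_score, target_score - peer_score).

    Peers without a score (None) are skipped.
    """
    result = {}
    for bench, row in scores.items():
        peers = {model: s for model, s in row.items() if model != target and s is not None}
        best = max(peers, key=peers.get)  # peer with the highest available score
        result[bench] = (best, peers[best], round(row[target] - peers[best], 2))
    return result


for bench, (peer, score, gap) in gap_to_best_peer(SCORES).items():
    print(f"{bench:<14} best peer: {peer:<12} {score:6.2f}  Gemma 4 31B gap: {gap:+.2f}")
```

On these six rows the gap is negative everywhere: under 1 point on MMLU Pro (versus Qwen3.5-27B) and about 24 points on HLE (versus GLM-5), where the run configurations are not the same.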
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GLM-5 | Zhipu AI | $1 / 1M tokens | $3.2 / 1M tokens | -- |
| Kimi K2.5 | -- | $0.6 / 1M tokens | $3 / 1M tokens | -- |
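Per-million-token prices translate into a request cost as (input tokens / 1,000,000) × input price + (output tokens / 1,000,000) × output price. The sketch below applies that formula to the two listed models, assuming the standard text rates above and ignoring any extended-context tiers; the 2M-input / 0.5M-output workload and the `request_cost` helper are hypothetical, chosen only for illustration. Gemma 4 31B and Qwen3.5-27B are omitted because no price is listed for them here.

```python
# Standard text prices (USD per 1M tokens) from the pricing table, default supplier.
# Extended-context pricing is ignored in this sketch.
PRICES_PER_M = {
    "GLM-5": {"input": 1.0, "output": 3.2},
    "Kimi K2.5": {"input": 0.6, "output": 3.0},
}


def request_cost(model, input_tokens, output_tokens, prices=PRICES_PER_M):
    """Return the cost in USD: tokens / 1e6 * per-million price, summed over input and output."""
    p = prices[model]
    return input_tokens / 1_000_000 * p["input"] + output_tokens / 1_000_000 * p["output"]


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for name in PRICES_PER_M:
    print(f"{name}: ${request_cost(name, 2_000_000, 500_000):.2f}")
```

For this workload the loop prints $3.60 for GLM-5 and $2.70 for Kimi K2.5; most of the difference comes from the lower input rate ($0.6 versus $1 per 1M tokens).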
Track the evolution of the Gemma series across generations, up to Gemma 4 31B
3 benchmarks with comparable scores
| Benchmark | Gemma 4 31B | Gemma 3 - 27B (IT) | Gemma2-27B |
|---|---|---|---|
| GPQA Diamond (general evaluation) | 84.30 (thinking, no tools) | 42.40 (standard, no tools) | -- |
| MMLU Pro (general evaluation) | 85.20 (thinking, no tools) | 67.50 (standard, no tools) | 56.54 (standard) |
| LiveCodeBench (coding and software engineering) | 80.00 (thinking, no tools) | 29.70 (standard, no tools) | -- |
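The generational table can also be read as point changes between consecutive releases. The sketch below computes those steps, assuming the scores are copied from the table and ordered oldest to newest; `step_gains` and `fmt` are hypothetical helpers. Gemma 4 31B was evaluated in thinking mode and the earlier models in standard mode, so each step mixes the generational change with the change of evaluation mode.

```python
# Scores from the generational table, ordered oldest to newest; None stands for "--".
# Gemma 4 31B ran in thinking mode, older models in standard mode, so gains mix both effects.
GENERATIONS = ["Gemma2-27B", "Gemma 3 - 27B (IT)", "Gemma 4 31B"]
SCORES_BY_GENERATION = {
    "GPQA Diamond": [None, 42.40, 84.30],
    "MMLU Pro": [56.54, 67.50, 85.20],
    "LiveCodeBench": [None, 29.70, 80.00],
}


def step_gains(row):
    """Point change between consecutive generations; None where either score is missing."""
    return [
        None if prev is None or cur is None else round(cur - prev, 2)
        for prev, cur in zip(row, row[1:])
    ]


def fmt(gain):
    """Format a gain with an explicit sign, or 'n/a' when it cannot be computed."""
    return "n/a" if gain is None else f"{gain:+.2f}"


for bench, row in SCORES_BY_GENERATION.items():
    steps = zip(GENERATIONS, GENERATIONS[1:], step_gains(row))
    print(f"{bench}: " + "; ".join(f"{a} -> {b}: {fmt(g)}" for a, b, g in steps))
```

MMLU Pro is the only benchmark with all three generations scored, so it is the only row where both steps can be compared.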
Chart (interactive, not reproduced): Gemma generations compared on GPQA Diamond (general evaluation); release dates Gemma2-27B 5/14/2024, Gemma 3 - 27B (IT) 3/12/2025, Gemma 4 31B 4/2/2026.