

Gemma 4 31B Benchmark Details

Gemma 4 31B's strongest benchmark results currently include MMLU Pro (score 85.20, rank 16 of 115), LiveCodeBench (score 80.00, rank 21 of 108), and GPQA Diamond (score 84.30, rank 39 of 162). This page also compares it with three competitor models and two predecessor or same-series models, including performance and pricing views where available. One source link is attached for reference.

Benchmark Results



General Evaluation (4 evaluations)

Benchmark / mode                     Score   Rank / total
MMLU Pro (Thinking)                  85.20   16 / 115
GPQA Diamond (Thinking)              84.30   39 / 162
HLE (Thinking)                       19.50   71 / 119
HLE (Thinking + Tools + Internet)    26.50   51 / 119

Coding & Software Engineering (1 evaluation)

Benchmark / mode                     Score   Rank / total
LiveCodeBench (Thinking)             80.00   21 / 108

Agent Capability (1 evaluation)

Benchmark / mode                     Score   Rank / total
τ²-Bench (Thinking + Tools)          76.90   19 / 39

Mathematical Reasoning (1 evaluation)

Benchmark / mode                     Score   Rank / total
AIME 2026 (Thinking)                 89.20   9 / 9
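The rank/total pairs in the tables above can be normalized into percentile positions to make standings comparable across benchmarks of different sizes. A minimal sketch, using figures taken from the tables above (the helper function is illustrative, not part of DataLearner's tooling):

```python
# Convert "rank / total" pairs into a top-percentile figure (lower is better).
def top_percentile(rank: int, total: int) -> float:
    """Return the model's rank as a percentage of the evaluated field."""
    return round(100 * rank / total, 1)

# Rank / total values from the benchmark tables above.
results = {
    "MMLU Pro":      (16, 115),
    "GPQA Diamond":  (39, 162),
    "LiveCodeBench": (21, 108),
}

for name, (rank, total) in results.items():
    print(f"{name}: top {top_percentile(rank, total)}% of {total} models")
# e.g. MMLU Pro: top 13.9% of 115 models
```

This makes it easy to see that the MMLU Pro placement (top ~14%) is stronger than the GPQA Diamond placement (top ~24%) even though the raw scores are similar.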

Competitor Comparison

Side-by-side benchmark comparison of Gemma 4 31B against leading peer models

Models compared: Gemma 4 31B (current model), GLM-5, Kimi K2.5, Qwen3.5-27B.

Benchmark Comparison Chart (horizontal view, auto-selected for dense data)

[Chart: evaluation modes per model — Gemma 4 31B: Thinking, Thinking + Tools; GLM-5: Thinking, Thinking + Tools; Kimi K2.5: Thinking; Qwen3.5-27B: Thinking, Thinking + Tools. Icons in chart labels mark thinking mode and tool usage.]

Benchmark Score Comparison (6 benchmarks with comparable scores)

Benchmark (category)      Gemma 4 31B (current)              GLM-5                     Kimi K2.5           Qwen3.5-27B
GPQA Diamond (General)    84.30 (Thinking)                   86.00 (Thinking)          87.60 (Thinking)    85.50 (Thinking)
HLE (General)             26.50 (Thinking + Tools + Internet) 50.40 (Thinking + Tools) 30.10 (Thinking)    48.50 (Thinking + Tools)
MMLU Pro (General)        85.20 (Thinking)                   --                        78.50 (Thinking)    86.10 (Thinking)
LiveCodeBench (Coding)    80.00 (Thinking)                   --                        85.00 (Thinking)    80.70 (Thinking + Tools)
τ²-Bench (Agent)          76.90 (Thinking + Tools)           89.70 (Thinking + Tools)  --                  79.00 (Thinking + Tools)
AIME 2026 (Math)          89.20 (Thinking)                   92.70 (Thinking)          92.50 (Thinking)    --

Standard API Pricing: Gemma 4 31B vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model       Supplier    Standard input       Standard output      Base price applies to
GLM-5       Zhipu AI    $1.00 / 1M tokens    $3.20 / 1M tokens    —
Kimi K2.5   —           $0.60 / 1M tokens    $3.00 / 1M tokens    —
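Given per-million-token rates like those listed above, the cost of a workload is simple to estimate. A hedged sketch: the function name and the example token counts are illustrative, while the rates are the GLM-5 figures from the pricing table ($1.00 input, $3.20 output per 1M tokens):

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """Estimate USD cost from token counts and per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# GLM-5 rates from the table above: $1.00 in, $3.20 out per 1M tokens.
cost = api_cost(input_tokens=2_000_000, output_tokens=500_000,
                input_rate=1.00, output_rate=3.20)
print(f"${cost:.2f}")  # → $3.60
```

Swapping in the Kimi K2.5 rates ($0.60 / $3.00) for the same workload gives $2.70, which is the kind of side-by-side estimate the table is meant to support.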

Generational Comparison

Track the evolution of the Gemma 4 31B series across generations

Models compared: Gemma 4 31B (current model), Gemma 3 - 27B (IT), Gemma2-27B.

Benchmark Comparison Chart (vertical view)

[Chart: evaluation modes per model — Gemma 4 31B: Thinking; Gemma 3 - 27B (IT): Normal; Gemma2-27B: Normal. Icons in chart labels mark thinking mode and tool usage.]

Benchmark Score Comparison (3 benchmarks with comparable scores)

Benchmark (category)      Gemma 4 31B (current)   Gemma 3 - 27B (IT)   Gemma2-27B
GPQA Diamond (General)    84.30 (Thinking)        42.40 (Normal)       --
MMLU Pro (General)        85.20 (Thinking)        67.50 (Normal)       56.54 (Normal)
LiveCodeBench (Coding)    80.00 (Thinking)        29.70 (Normal)       --

Standard API Pricing Across the Gemma 4 31B Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier.

Comparable standard text pricing is not available for these models.

Series Panorama · Beta

Top: multi-benchmark panorama. Bottom: single-benchmark mode relation with dotted links inside each generation.


Default view shows benchmarks with data coverage > 60% (3/3)

[Panorama chart: models by release date — Gemma2-27B (2024-05-14), Gemma 3 - 27B (IT) (2025-03-12), Gemma 4 31B (2026-04-02). Benchmark categories covered: General Evaluation (×2), Coding & Software Engineering.]

Single-Benchmark Mode Relation

Viewing: GPQA Diamond · General Evaluation

Mode legend: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.

X-axis shows model and release date, Y-axis shows score; dotted lines connect modes within the same generation.

References

blog.google