Gemini 3.1 Pro Preview vs Gemini 2.5 Pro Experimental 03-25

Across 6 shared benchmarks, Gemini 3.1 Pro Preview leads overall: Gemini 3.1 Pro Preview wins 6, Gemini 2.5 Pro Experimental 03-25 wins 0, with 0 ties and an average score difference of +18.05.

Gemini 3.1 Pro Preview

Google DeepMind · 2026-02-20 · Multimodal model

Gemini 2.5 Pro Experimental 03-25

Google DeepMind · 2025-03-25 · Reasoning model

Gemini 3.1 Pro Preview: 6 wins (100%) · Gemini 2.5 Pro Experimental 03-25: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

Coding and Software Engineer

Gemini 3.1 Pro Preview 2/2

Benchmark	Gemini 3.1 Pro Preview	Gemini 2.5 Pro Experimental 03-25	Diff
LiveCodeBench	91.70 · 3 / 120 · Thinking High (With Tools)	70.40 · 53 / 120	+21.30
SWE-bench Verified	80.60 · 10 / 108 · Thinking High (With Tools)	63.80 · 72 / 108	+16.80

General Knowledge

Gemini 3.1 Pro Preview 2/2

Benchmark	Gemini 3.1 Pro Preview	Gemini 2.5 Pro Experimental 03-25	Diff
HLE	51.40 · 15 / 157 · Thinking High (With Tools)	18.80 · 108 / 157	+32.60
GPQA Diamond	94.30 · 3 / 178 · Thinking High (No Tools)	84.00 · 54 / 178	+10.30

Claw-style Agent Evaluation

Gemini 3.1 Pro Preview 1/1

Benchmark	Gemini 3.1 Pro Preview	Gemini 2.5 Pro Experimental 03-25	Diff
Pinch Bench	86.70 · 10 / 37 · Thinking (With Tools)	71.90 · 29 / 37 · Thinking (With Tools)	+14.80

Math and Reasoning

Gemini 3.1 Pro Preview 1/1

Benchmark	Gemini 3.1 Pro Preview	Gemini 2.5 Pro Experimental 03-25	Diff
FrontierMath - Tier 4	16.70 · 20 / 80 · Normal (No Tools)	4.20 · 40 / 80 · Normal (No Tools)	+12.50

Specs

Field	Gemini 3.1 Pro Preview	Gemini 2.5 Pro Experimental 03-25
Publisher	Google DeepMind	Google DeepMind
Release date	2026-02-20	2025-03-25
Model type	Multimodal model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1M	2M
Max output	32K	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Gemini 3.1 Pro Preview	Gemini 2.5 Pro Experimental 03-25
Text input	$2 / 1M tokens	Not public
Text output	$12 / 1M tokens	Not public

One or both models have incomplete public pricing.
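As a rough illustration of how the listed rates translate into a per-request cost, the sketch below applies the Gemini 3.1 Pro Preview text prices from the table. It assumes flat per-token billing with no tiering, caching discounts, or other surcharges, which the page does not confirm; the function name `estimate_cost` and the token counts are hypothetical.

```python
# Text rates from the table above, in USD per 1M tokens (Gemini 3.1 Pro Preview).
INPUT_USD_PER_M = 2.0
OUTPUT_USD_PER_M = 12.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming flat per-token billing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 8,000 output tokens.
cost = estimate_cost(200_000, 8_000)
print(f"${cost:.3f}")  # $0.496
```

No equivalent estimate is possible for Gemini 2.5 Pro Experimental 03-25, since its prices are listed as not public.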

Summary

Gemini 3.1 Pro Preview leads in: Coding and Software Engineer (2/2), General Knowledge (2/2), Claw-style Agent Evaluation (1/1), Math and Reasoning (1/1)

On average across the 6 shared benchmarks, Gemini 3.1 Pro Preview scores 18.05 higher.

Largest single-benchmark gap: HLE — Gemini 3.1 Pro Preview 51.40 vs Gemini 2.5 Pro Experimental 03-25 18.80 (+32.60).
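The summary figures can be reproduced from the six benchmark scores listed on this page. The following is a minimal sketch of that arithmetic, not the site's actual generation code; the `SCORES` layout and the `summarize` helper are assumptions made for illustration.

```python
from statistics import mean

# (benchmark, Gemini 3.1 Pro Preview score, Gemini 2.5 Pro Experimental 03-25 score)
SCORES = [
    ("LiveCodeBench", 91.70, 70.40),
    ("SWE-bench Verified", 80.60, 63.80),
    ("HLE", 51.40, 18.80),
    ("GPQA Diamond", 94.30, 84.00),
    ("Pinch Bench", 86.70, 71.90),
    ("FrontierMath - Tier 4", 16.70, 4.20),
]


def summarize(rows):
    """Return win counts, mean score difference, and the largest single gap."""
    # Round each difference to two decimals, matching the page's Diff column.
    diffs = {name: round(a - b, 2) for name, a, b in rows}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": sum(d > 0 for d in diffs.values()),
        "wins_b": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(mean(diffs.values()), 2),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
print(summary["wins_a"], summary["wins_b"], summary["ties"])  # 6 0 0
print(summary["mean_diff"])                                   # 18.05
print(summary["largest_gap"][0])                              # HLE
```

The mean here is an unweighted average of raw score differences, so benchmarks with different scales and run modes (with or without tools) contribute equally.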

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
