Gemini 3.1 Pro Preview vs Gemini 2.5 Pro Experimental 03-25

Across 6 shared benchmarks, Gemini 3.1 Pro Preview leads on every one: it wins 6 to 0 against Gemini 2.5 Pro Experimental 03-25, with no ties and an average score difference of +18.05 points in its favor.

Gemini 3.1 Pro Preview

Google DeepMind · 2026-02-20 · Multimodal model

Gemini 2.5 Pro Experimental 03-25

Google DeepMind · 2025-03-25 · Reasoning model

Head-to-head wins: Gemini 3.1 Pro Preview 6 (100%), Gemini 2.5 Pro Experimental 03-25 0 (0%)

Benchmark scores

Benchmarks are grouped by capability and sorted by largest gap within each group (6 shared benchmarks in total).
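The grouping and ordering rule can be sketched in Python as below. The rows are transcribed from the tables on this page; the function name is illustrative, and sorting by the absolute gap (rather than the signed gap) is an assumption, though it makes no difference here because every gap is positive.

```python
from collections import defaultdict

# (capability, benchmark, Gemini 3.1 Pro Preview, Gemini 2.5 Pro Experimental 03-25),
# transcribed from the benchmark tables on this page.
ROWS = [
    ("Coding and Software Engineering", "SWE-bench Verified", 80.60, 63.80),
    ("Coding and Software Engineering", "LiveCodeBench", 91.70, 70.40),
    ("General Knowledge", "GPQA Diamond", 94.30, 84.00),
    ("General Knowledge", "HLE", 51.40, 18.80),
    ("Claw-style Agent Evaluation", "Pinch Bench", 86.70, 71.90),
    ("Math and Reasoning", "FrontierMath - Tier 4", 16.70, 4.20),
]


def group_by_capability(rows):
    """Group rows by capability; within each group, sort by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, bench, a, b in rows:
        # Round to two decimals to match the page's reported precision.
        groups[capability].append((bench, round(a - b, 2)))
    for entries in groups.values():
        # Assumption: "largest gap" means largest absolute difference.
        entries.sort(key=lambda entry: abs(entry[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, entries in group_by_capability(ROWS).items():
        print(capability)
        for bench, gap in entries:
            print(f"  {bench}: {gap:+.2f}")
```

Group order follows first appearance in the input list, which matches the order of the sections on this page.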

Coding and Software Engineering

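Gemini 3.1 Pro Preview wins 2/2

| Benchmark | Gemini 3.1 Pro Preview (rank) | Setting | Gemini 2.5 Pro Experimental 03-25 (rank) | Setting | Diff |
|---|---|---|---|---|---|
| LiveCodeBench | 91.70 (rank 3 of 120) | Thinking High (With Tools) | 70.40 (rank 53 of 120) | Not listed | +21.30 |
| SWE-bench Verified | 80.60 (rank 10 of 108) | Thinking High (With Tools) | 63.80 (rank 72 of 108) | Not listed | +16.80 |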

General Knowledge

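Gemini 3.1 Pro Preview wins 2/2

| Benchmark | Gemini 3.1 Pro Preview (rank) | Setting | Gemini 2.5 Pro Experimental 03-25 (rank) | Setting | Diff |
|---|---|---|---|---|---|
| HLE | 51.40 (rank 15 of 157) | Thinking High (With Tools) | 18.80 (rank 108 of 157) | Not listed | +32.60 |
| GPQA Diamond | 94.30 (rank 3 of 178) | Thinking High (No Tools) | 84.00 (rank 54 of 178) | Not listed | +10.30 |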

Claw-style Agent Evaluation

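Gemini 3.1 Pro Preview wins 1/1

| Benchmark | Gemini 3.1 Pro Preview (rank) | Setting | Gemini 2.5 Pro Experimental 03-25 (rank) | Setting | Diff |
|---|---|---|---|---|---|
| Pinch Bench | 86.70 (rank 10 of 37) | Thinking (With Tools) | 71.90 (rank 29 of 37) | Thinking (With Tools) | +14.80 |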

Math and Reasoning

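Gemini 3.1 Pro Preview wins 1/1

| Benchmark | Gemini 3.1 Pro Preview (rank) | Setting | Gemini 2.5 Pro Experimental 03-25 (rank) | Setting | Diff |
|---|---|---|---|---|---|
| FrontierMath - Tier 4 | 16.70 (rank 20 of 80) | Normal (No Tools) | 4.20 (rank 40 of 80) | Normal (No Tools) | +12.50 |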

Specs

| Field | Gemini 3.1 Pro Preview | Gemini 2.5 Pro Experimental 03-25 |
|---|---|---|
| Publisher | Google DeepMind | Google DeepMind |
| Release date | 2026-02-20 | 2025-03-25 |
| Model type | Multimodal model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length (tokens) | 1M | 2000K (2M) |
| Max output (tokens) | 32K | 64K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | Gemini 3.1 Pro Preview | Gemini 2.5 Pro Experimental 03-25 |
|---|---|---|
| Text input | $2 / 1M tokens | Not public |
| Text output | $12 / 1M tokens | Not public |

Public pricing is not available for Gemini 2.5 Pro Experimental 03-25, so the two models cannot be compared on cost.
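As a worked example of the listed rates, the cost of one Gemini 3.1 Pro Preview request is input tokens × $2 / 1M plus output tokens × $12 / 1M. The sketch below assumes flat per-token pricing exactly as listed here (no other tiers, discounts, or caching are modeled); `request_cost_usd` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Listed Gemini 3.1 Pro Preview rates, in USD per 1M tokens (from this page).
INPUT_USD_PER_M = Decimal("2")
OUTPUT_USD_PER_M = Decimal("12")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate one request's cost, assuming flat per-token pricing (hypothetical helper)."""
    # Decimal keeps the currency arithmetic exact.
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens and 10K output tokens.
    print(request_cost_usd(100_000, 10_000))
```

For the hypothetical 100K-input, 10K-output request this gives $0.20 + $0.12 = $0.32.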

Summary

  • Gemini 3.1 Pro Preview leads in: Coding and Software Engineering (2/2), General Knowledge (2/2), Claw-style Agent Evaluation (1/1), Math and Reasoning (1/1)

On average across the 6 shared benchmarks, Gemini 3.1 Pro Preview scores 18.05 points higher than Gemini 2.5 Pro Experimental 03-25.

Largest single-benchmark gap: HLE, where Gemini 3.1 Pro Preview scores 51.40 vs 18.80 for Gemini 2.5 Pro Experimental 03-25 (+32.60).
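The summary figures can be re-derived from the six score pairs in the benchmark tables. A minimal sketch, assuming the scores are transcribed exactly as listed on this page; `summarize` and the tie tolerance are illustrative choices, since the page does not say how ties are decided.

```python
from statistics import mean

# (benchmark, Gemini 3.1 Pro Preview, Gemini 2.5 Pro Experimental 03-25),
# transcribed from the benchmark tables on this page.
SCORES = [
    ("LiveCodeBench", 91.70, 70.40),
    ("SWE-bench Verified", 80.60, 63.80),
    ("HLE", 51.40, 18.80),
    ("GPQA Diamond", 94.30, 84.00),
    ("Pinch Bench", 86.70, 71.90),
    ("FrontierMath - Tier 4", 16.70, 4.20),
]

# Assumption: differences smaller than this count as ties.
TIE_TOLERANCE = 1e-9


def summarize(scores):
    """Return wins for each model, ties, the mean difference, and the largest-gap benchmark."""
    diffs = {name: a - b for name, a, b in scores}
    wins_a = sum(d > TIE_TOLERANCE for d in diffs.values())
    wins_b = sum(d < -TIE_TOLERANCE for d in diffs.values())
    ties = len(diffs) - wins_a - wins_b
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return wins_a, wins_b, ties, mean(list(diffs.values())), largest


if __name__ == "__main__":
    wins_a, wins_b, ties, avg, largest = summarize(SCORES)
    print(f"wins {wins_a}-{wins_b}, ties {ties}, mean diff {avg:+.2f}, largest gap: {largest}")
```

Running it prints `wins 6-0, ties 0, mean diff +18.05, largest gap: HLE`, consistent with the summary: the six differences sum to 108.30, and 108.30 / 6 = 18.05.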

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.