Gemini 3.0 Flash vs Gemini 2.5 Flash
Across 8 shared benchmarks, Gemini 3.0 Flash leads overall: it wins 7, Gemini 2.5 Flash wins 0, and 1 is a tie, with an average score difference of +18.93 points in Gemini 3.0 Flash's favor.
Gemini 3.0 Flash
Google DeepMind · 2025-12-17 · Chat model
Gemini 2.5 Flash
Google DeepMind · 2025-04-17 · Reasoning model
Gemini 3.0 Flash: 7 wins (88%) · Ties: 1 · Gemini 2.5 Flash: 0 wins (0%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.
General Knowledge
Gemini 3.0 Flash leads 3/3

| Benchmark | Gemini 3.0 Flash | Gemini 2.5 Flash | Diff |
|---|---|---|---|
| HLE | 43.50 (rank 40/161) | 11.00 (rank 131/161) | +32.50 |
| LiveBench | 56.35 (rank 79/115) · Normal (No Tools) | 47.74 (rank 101/115) · Thinking High (No Tools) | +8.61 |
| GPQA Diamond | 90.40 (rank 18/179) | 82.80 (rank 63/179) | +7.60 |
Math and Reasoning
Gemini 3.0 Flash leads 1/2 (1 tie)

| Benchmark | Gemini 3.0 Flash | Gemini 2.5 Flash | Diff |
|---|---|---|---|
| AIME2025 | 99.70 (rank 8/106) | 72.00 (rank 70/106) | +27.70 |
| FrontierMath - Tier 4 | 4.20 (rank 40/80) · Normal (No Tools) | 4.20 (rank 40/80) · Normal (No Tools) | 0.00 (tie) |
Claw-style Agent Evaluation
Gemini 3.0 Flash leads 1/1

| Benchmark | Gemini 3.0 Flash | Gemini 2.5 Flash | Diff |
|---|---|---|---|
| Pinch Bench | 85.20 (rank 16/37) · Thinking (With Tools) | 70.70 (rank 31/37) · Thinking (With Tools) | +14.50 |
Coding and Software Engineering
Gemini 3.0 Flash leads 1/1

| Benchmark | Gemini 3.0 Flash | Gemini 2.5 Flash | Diff |
|---|---|---|---|
| SWE-bench Verified | 68.70 (rank 62/108) | 50.00 (rank 90/108) | +18.70 |
Common Sense
Gemini 3.0 Flash leads 1/1

| Benchmark | Gemini 3.0 Flash | Gemini 2.5 Flash | Diff |
|---|---|---|---|
| SimpleQA | 68.70 (rank 7/45) | 26.90 (rank 27/45) | +41.80 |
Specs
| Field | Gemini 3.0 Flash | Gemini 2.5 Flash |
|---|---|---|
| Publisher | Google DeepMind | Google DeepMind |
| Release date | 2025-12-17 | 2025-04-17 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 2000K | 1000K |
| Max output | 64K | 64K |
Summary
- Gemini 3.0 Flash leads in: General Knowledge (3/3), Math and Reasoning (1/2), Claw-style Agent Evaluation (1/1), Coding and Software Engineering (1/1), Common Sense (1/1)
On average across the 8 shared benchmarks, Gemini 3.0 Flash scores 18.93 points higher.
Largest single-benchmark gap: SimpleQA — Gemini 3.0 Flash 68.70 vs Gemini 2.5 Flash 26.90 (+41.80).
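The headline tally, average difference and largest gap can be re-derived from the per-benchmark scores. The snippet below is a minimal Python sketch that does this. It assumes two things the page does not state: a tie means identical scores, and the average difference is the plain arithmetic mean of raw per-benchmark point differences, even though the benchmarks use different scales. The names `SCORES` and `summarize` are illustrative only.

```python
from statistics import mean

# Scores transcribed from the tables: (benchmark, Gemini 3.0 Flash, Gemini 2.5 Flash)
SCORES = [
    ("HLE", 43.50, 11.00),
    ("LiveBench", 56.35, 47.74),
    ("GPQA Diamond", 90.40, 82.80),
    ("AIME2025", 99.70, 72.00),
    ("FrontierMath - Tier 4", 4.20, 4.20),
    ("Pinch Bench", 85.20, 70.70),
    ("SWE-bench Verified", 68.70, 50.00),
    ("SimpleQA", 68.70, 26.90),
]


def summarize(rows, tol=1e-9):
    """Return (wins_a, wins_b, ties, mean_diff, (name, largest_diff)).

    Assumption: a tie is any difference within `tol` of zero, and the mean
    is an unweighted average of raw point differences across benchmarks.
    """
    diffs = [(name, a - b) for name, a, b in rows]
    wins_a = sum(1 for _, d in diffs if d > tol)
    wins_b = sum(1 for _, d in diffs if d < -tol)
    ties = len(diffs) - wins_a - wins_b
    avg = mean(d for _, d in diffs)
    # Largest gap by absolute size, regardless of which model is ahead
    largest = max(diffs, key=lambda item: abs(item[1]))
    return wins_a, wins_b, ties, avg, largest


wins_a, wins_b, ties, avg, (gap_name, gap) = summarize(SCORES)
print(f"wins {wins_a}-{wins_b}, ties {ties}, mean diff {avg:+.2f}")
print(f"largest gap: {gap_name} ({gap:+.2f})")
```

Running it prints `wins 7-0, ties 1, mean diff +18.93` and `largest gap: SimpleQA (+41.80)`, which agrees with the summary figures. Because the mean is unweighted, a large gap on one benchmark (such as SimpleQA) moves the average as much as it would on any other benchmark, whatever that benchmark's scale.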
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.