Gemini 3.0 Flash vs Gemini 2.0 Flash Experimental
Across 5 shared benchmarks, Gemini 3.0 Flash leads overall: it wins all 5, Gemini 2.0 Flash Experimental wins 0, there are 0 ties, and the average score difference is +43.94 points in Gemini 3.0 Flash's favor.
Gemini 3.0 Flash
Google DeepMind · 2025-12-17 · Chat model
Gemini 2.0 Flash Experimental
DeepMind · 2024-12-11 · Multimodal model
Wins: Gemini 3.0 Flash 5 (100%), Gemini 2.0 Flash Experimental 0 (0%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.
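The grouping rule above (bucket benchmarks by capability, then order each bucket by the size of the score gap) can be sketched in a few lines of Python. This is a minimal illustration, not the site's code: the tuple record layout and the function name `group_by_capability` are hypothetical, and "largest gap" is assumed to mean the largest absolute difference. The scores are the ones listed on this page.

```python
from collections import defaultdict

# Hypothetical record layout: (capability, benchmark, gemini_3_flash, gemini_2_flash_exp).
# Scores are copied from the tables on this page; GPQA is listed first on purpose
# so the sort step visibly reorders the General Knowledge bucket.
RECORDS = [
    ("General Knowledge", "GPQA Diamond", 90.40, 65.20),
    ("General Knowledge", "HLE", 43.50, 5.10),
    ("Coding and Software Engineering", "SWE-bench Verified", 68.70, 21.40),
    ("Common Sense", "SimpleQA", 68.70, 29.90),
    ("Math and Reasoning", "AIME2025", 99.70, 29.70),
]


def group_by_capability(records):
    """Bucket benchmarks by capability, then sort each bucket by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in records:
        # Round the difference to two decimals, matching the precision shown on the page.
        groups[capability].append((benchmark, round(score_a - score_b, 2)))
    for rows in groups.values():
        rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, rows in group_by_capability(RECORDS).items():
        print(capability)
        for benchmark, diff in rows:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute value keeps the ordering meaningful even in a comparison where the second model wins some benchmarks.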
General Knowledge
Gemini 3.0 Flash wins 2/2

| Benchmark | Gemini 3.0 Flash | Gemini 2.0 Flash Experimental | Diff |
|---|---|---|---|
| HLE | 43.50 (rank 40/161) | 5.10 (rank 156/161) | +38.40 |
| GPQA Diamond | 90.40 (rank 18/179) | 65.20 (rank 130/179) | +25.20 |
Coding and Software Engineering
Gemini 3.0 Flash wins 1/1

| Benchmark | Gemini 3.0 Flash | Gemini 2.0 Flash Experimental | Diff |
|---|---|---|---|
| SWE-bench Verified | 68.70 (rank 62/108) | 21.40 (rank 108/108) | +47.30 |
Common Sense
Gemini 3.0 Flash wins 1/1

| Benchmark | Gemini 3.0 Flash | Gemini 2.0 Flash Experimental | Diff |
|---|---|---|---|
| SimpleQA | 68.70 (rank 7/45) | 29.90 (rank 23/45) | +38.80 |
Math and Reasoning
Gemini 3.0 Flash wins 1/1

| Benchmark | Gemini 3.0 Flash | Gemini 2.0 Flash Experimental | Diff |
|---|---|---|---|
| AIME2025 | 99.70 (rank 8/106) | 29.70 (rank 100/106) | +70.00 |
Specs
| Field | Gemini 3.0 Flash | Gemini 2.0 Flash Experimental |
|---|---|---|
| Publisher | Google DeepMind | DeepMind |
| Release date | 2025-12-17 | 2024-12-11 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length (tokens) | 2000K | 1000K |
| Max output (tokens) | 64K | Not available |
Summary
- Gemini 3.0 Flash leads in: General Knowledge (2/2), Coding and Software Engineering (1/1), Common Sense (1/1), Math and Reasoning (1/1)
On average across the 5 shared benchmarks, Gemini 3.0 Flash scores 43.94 points higher.
Largest single-benchmark gap: AIME2025, where Gemini 3.0 Flash scores 99.70 vs 29.70 for Gemini 2.0 Flash Experimental (+70.00).
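The summary figures follow from simple arithmetic over the five score pairs: the mean difference is (38.40 + 25.20 + 47.30 + 38.80 + 70.00) / 5 = 219.70 / 5 = 43.94. The Python sketch below reproduces the win count, ties, mean, and largest gap. It assumes a "win" means a strictly higher score and that the average is the plain arithmetic mean of per-benchmark differences; the page does not state its exact rules, and the `SCORES` mapping and `summarize` function are illustrative names, not part of any real API.

```python
from statistics import mean

# Scores copied from the tables on this page:
# benchmark -> (Gemini 3.0 Flash, Gemini 2.0 Flash Experimental).
SCORES = {
    "HLE": (43.50, 5.10),
    "GPQA Diamond": (90.40, 65.20),
    "SWE-bench Verified": (68.70, 21.40),
    "SimpleQA": (68.70, 29.90),
    "AIME2025": (99.70, 29.70),
}


def summarize(scores):
    """Return win/tie counts, the mean score difference, and the largest single gap.

    Assumption: a win is a strictly higher score; a tie is an exactly equal score.
    """
    diffs = {name: a - b for name, (a, b) in scores.items()}
    wins_a = sum(1 for d in diffs.values() if d > 0)
    wins_b = sum(1 for d in diffs.values() if d < 0)
    ties = len(diffs) - wins_a - wins_b
    # Largest gap by absolute difference, regardless of which model leads.
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": ties,
        "mean_diff": round(mean(diffs.values()), 2),
        "largest_gap": (largest, round(diffs[largest], 2)),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Rounding to two decimals mirrors the precision used throughout the page and hides floating-point noise from the subtractions.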
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.