Gemini 3.0 FlashvsGemini 2.0 Flash Experimental

Across 5 shared benchmarks, Gemini 3.0 Flash leads overall: Gemini 3.0 Flash wins 5, Gemini 2.0 Flash Experimental wins 0, with 0 ties and an average score difference of +43.94.

Google Deep Mind
Gemini 3.0 Flash

Google Deep Mind · 2025-12-17 · Chat model

DeepMind
Gemini 2.0 Flash Experimental

DeepMind · 2024-12-11 · Multimodal model

Gemini 3.0 Flash5 wins(100%)(0%)0 winsGemini 2.0 Flash Experimental

Benchmark scores

Grouped by capability, sorted by largest gap within each. 5 shared benchmarks.

General Knowledge

Gemini 3.0 Flash 2/2
BenchmarkGemini 3.0 FlashGemini 2.0 Flash ExperimentalDiff
HLE43.5040 / 1615.10156 / 161+38.40
GPQA Diamond90.4018 / 17965.20130 / 179+25.20

Coding and Software Engineer

Gemini 3.0 Flash 1/1
BenchmarkGemini 3.0 FlashGemini 2.0 Flash ExperimentalDiff
SWE-bench Verified68.7062 / 10821.40108 / 108+47.30

Common Sense

Gemini 3.0 Flash 1/1
BenchmarkGemini 3.0 FlashGemini 2.0 Flash ExperimentalDiff
SimpleQA68.707 / 4529.9023 / 45+38.80

Math and Reasoning

Gemini 3.0 Flash 1/1
BenchmarkGemini 3.0 FlashGemini 2.0 Flash ExperimentalDiff
AIME202599.708 / 10629.70100 / 106+70

Specs

FieldGemini 3.0 FlashGemini 2.0 Flash Experimental
PublisherGoogle Deep MindDeepMind
Release date2025-12-172024-12-11
Model typeChat modelMultimodal model
ArchitectureDenseDense
ParametersNot availableNot available
Context length2000K1000K
Max output64KNot available

Summary

  • Gemini 3.0 Flashleads in:General Knowledge (2/2), Coding and Software Engineer (1/1), Common Sense (1/1), Math and Reasoning (1/1)

On average across the 5 shared benchmarks, Gemini 3.0 Flash scores 43.94 higher.

Largest single-benchmark gap: AIME2025 — Gemini 3.0 Flash 99.70 vs Gemini 2.0 Flash Experimental 29.70 (+70).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.