Gemini 3.0 Flash vs Haiku 4.5
Across 10 shared benchmarks, Gemini 3.0 Flash leads overall with 9 wins to Haiku 4.5's 1, no ties, and an average score difference of +23.91 points in its favor.
Gemini 3.0 Flash
Google DeepMind · 2025-12-17 · Chat model
Haiku 4.5
Anthropic · 2025-10-15 · Multimodal model
Gemini 3.0 Flash: 9 wins (90%) · Haiku 4.5: 1 win (10%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.
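The grouping and ordering described above can be sketched as follows. This is only an illustration, not the page's actual generator: the `ROWS` sample and the `group_and_sort` helper are hypothetical names, and ordering by the absolute value of the gap is an inference from Claw Bench (-3.70) being listed ahead of Pinch Bench (+3.20).

```python
from collections import defaultdict

# (capability group, benchmark, Diff) copied from the tables on this page,
# deliberately listed out of order. Diff is the Gemini 3.0 Flash score
# minus the Haiku 4.5 score.
ROWS = [
    ("General Knowledge", "GPQA Diamond", 29.90),
    ("General Knowledge", "HLE", 39.20),
    ("General Knowledge", "ARC-AGI-2", 32.30),
    ("Claw-style Agent Evaluation", "Pinch Bench", 3.20),
    ("Claw-style Agent Evaluation", "Claw Bench", -3.70),
]


def group_and_sort(rows):
    """Group rows by capability, then order each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for group, benchmark, diff in rows:
        groups[group].append((benchmark, diff))
    return {
        group: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for group, items in groups.items()
    }


grouped = group_and_sort(ROWS)
for group, items in grouped.items():
    print(group, [name for name, _ in items])
```

Sorting on the absolute value keeps a benchmark that Haiku 4.5 wins by a wide margin near the top of its group, rather than pushing every negative gap to the bottom.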
General Knowledge
Gemini 3.0 Flash leads 3/3

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff |
|---|---|---|---|
| HLE | 43.50 (rank 38/157) | 4.30 (rank 155/157), Normal (No Tools) | +39.20 |
| ARC-AGI-2 | 33.60 (rank 27/59) | 1.30 (rank 52/59), Normal (No Tools) | +32.30 |
| GPQA Diamond | 90.40 (rank 17/178) | 60.50 (rank 138/178), Normal (No Tools) | +29.90 |
Claw-style Agent Evaluation
Even (1 win each across 2 benchmarks)

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff |
|---|---|---|---|
| Claw Bench | 85.70 (rank 15/29), Thinking (With Tools) | 89.40 (rank 11/29), Thinking (With Tools) | -3.70 |
| Pinch Bench | 85.20 (rank 16/37), Thinking (With Tools) | 82 (rank 21/37), Thinking (With Tools) | +3.20 |
Coding and Software Engineering
Gemini 3.0 Flash leads 2/2

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff |
|---|---|---|---|
| SWE-Bench Pro - Public | 49.60 (rank 32/43), Thinking High (With Tools) | 39.45 (rank 40/43), Extended (With Tools) | +10.15 |
| SWE-bench Verified | 68.70 (rank 62/108) | 60.60 (rank 76/108), Normal (With Tools) | +8.10 |
Math and Reasoning
Gemini 3.0 Flash leads 2/2

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff |
|---|---|---|---|
| AIME 2025 | 99.70 (rank 8/106) | 39 (rank 94/106), Normal (No Tools) | +60.70 |
| FrontierMath - Tier 4 | 4.20 (rank 40/80), Normal (No Tools) | 2.10 (rank 56/80), Thinking (No Tools, 32K Budget) | +2.10 |
Agent Level Benchmark
Gemini 3.0 Flash leads 1/1

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff |
|---|---|---|---|
| τ²-Bench | 90.20 (rank 3/40) | 33 (rank 40/40), Normal (With Tools) | +57.20 |
Specs
| Field | Gemini 3.0 Flash | Haiku 4.5 |
|---|---|---|
| Publisher | Google DeepMind | Anthropic |
| Release date | 2025-12-17 | 2025-10-15 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 2000K | 200K |
| Max output | 64K | 64K |
Summary
- Gemini 3.0 Flash leads in: General Knowledge (3/3), Coding and Software Engineering (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1)
- Tied in: Claw-style Agent Evaluation
On average across the 10 shared benchmarks, Gemini 3.0 Flash scores 23.91 points higher.
Largest single-benchmark gap: AIME 2025, where Gemini 3.0 Flash scores 99.70 vs Haiku 4.5's 39 (+60.70).
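As a cross-check, the summary figures can be recomputed from the Diff column alone. The sketch below is a hypothetical helper, not the page's own code: `DIFFS` and `summarize` are illustrative names, and it assumes a "win" simply means a non-zero Diff in that model's favor.

```python
# Diff column (Gemini 3.0 Flash minus Haiku 4.5), copied from the tables on this page.
DIFFS = {
    "HLE": 39.20,
    "ARC-AGI-2": 32.30,
    "GPQA Diamond": 29.90,
    "Claw Bench": -3.70,
    "Pinch Bench": 3.20,
    "SWE-Bench Pro - Public": 10.15,
    "SWE-bench Verified": 8.10,
    "AIME 2025": 60.70,
    "FrontierMath - Tier 4": 2.10,
    "τ²-Bench": 57.20,
}


def summarize(diffs):
    """Return win counts, mean difference, and the benchmark with the largest absolute gap."""
    values = list(diffs.values())
    return {
        "gemini_wins": sum(1 for d in values if d > 0),
        "haiku_wins": sum(1 for d in values if d < 0),
        "ties": sum(1 for d in values if d == 0),
        "mean_diff": sum(values) / len(values),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(DIFFS)
print(summary)
```

The ten differences sum to 239.15, so the mean is 23.915, which the page reports as 23.91. The small discrepancy in the last digit may come from the page averaging unrounded scores or truncating rather than rounding; the source does not say which.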
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.