Gemini 3.0 Flash vs Haiku 4.5

Across 10 shared benchmarks, Gemini 3.0 Flash leads overall with 9 wins to Haiku 4.5's 1, no ties, and an average score difference of +23.91 in its favor.

Google DeepMind
Gemini 3.0 Flash

Google DeepMind · 2025-12-17 · Chat model

Anthropic
Haiku 4.5

Anthropic · 2025-10-15 · Multimodal model

Gemini 3.0 Flash: 9 wins (90%) · Haiku 4.5: 1 win (10%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.

General Knowledge

Gemini 3.0 Flash 3/3
Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff
HLE | 43.50 · 38 / 157 | 4.30 · 155 / 157 · Normal (No Tools) | +39.20
ARC-AGI-2 | 33.60 · 27 / 59 | 1.30 · 52 / 59 · Normal (No Tools) | +32.30
GPQA Diamond | 90.40 · 17 / 178 | 60.50 · 138 / 178 · Normal (No Tools) | +29.90

Claw-style Agent Evaluation

Even: 1 win each across 2 benchmarks
Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff
Claw Bench | 85.70 · 15 / 29 · Thinking (With Tools) | 89.40 · 11 / 29 · Thinking (With Tools) | -3.70
Pinch Bench | 85.20 · 16 / 37 · Thinking (With Tools) | 82 · 21 / 37 · Thinking (With Tools) | +3.20

Coding and Software Engineer

Gemini 3.0 Flash 2/2
Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff
SWE-Bench Pro - Public | 49.60 · 32 / 43 · Thinking High (With Tools) | 39.45 · 40 / 43 · Extended (with tools) | +10.15
SWE-bench Verified | 68.70 · 62 / 108 | 60.60 · 76 / 108 · Normal (With Tools) | +8.10

Math and Reasoning

Gemini 3.0 Flash 2/2
Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff
AIME2025 | 99.70 · 8 / 106 | 39 · 94 / 106 · Normal (No Tools) | +60.70
FrontierMath - Tier 4 | 4.20 · 40 / 80 · Normal (No Tools) | 2.10 · 56 / 80 · Thinking (No Tools, 32K Budget) | +2.10

Agent Level Benchmark

Gemini 3.0 Flash 1/1
Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Diff
τ²-Bench | 90.20 · 3 / 40 | 33 · 40 / 40 · Normal (With Tools) | +57.20

Specs

Field | Gemini 3.0 Flash | Haiku 4.5
Publisher | Google DeepMind | Anthropic
Release date | 2025-12-17 | 2025-10-15
Model type | Chat model | Multimodal model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 2000K | 200K
Max output | 64K | 64K

Summary

  • Gemini 3.0 Flash leads in: General Knowledge (3/3), Coding and Software Engineer (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1)
  • Tied in: Claw-style Agent Evaluation

On average across the 10 shared benchmarks, Gemini 3.0 Flash scores 23.91 higher.

Largest single-benchmark gap: AIME2025, with Gemini 3.0 Flash at 99.70 vs Haiku 4.5 at 39 (+60.70).
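
The summary figures can be reproduced from the per-benchmark scores. What follows is a minimal Python sketch, assuming the scores are copied by hand from the benchmark tables on this page; the names SCORES and summarize are hypothetical and are not part of any tooling behind the page. The unrounded mean difference works out to 23.915, which the page displays as 23.91.

```python
# Illustrative only: (benchmark, Gemini 3.0 Flash score, Haiku 4.5 score),
# transcribed from the benchmark tables on this page.
SCORES = [
    ("HLE", 43.50, 4.30),
    ("ARC-AGI-2", 33.60, 1.30),
    ("GPQA Diamond", 90.40, 60.50),
    ("Claw Bench", 85.70, 89.40),
    ("Pinch Bench", 85.20, 82.00),
    ("SWE-Bench Pro - Public", 49.60, 39.45),
    ("SWE-bench Verified", 68.70, 60.60),
    ("AIME2025", 99.70, 39.00),
    ("FrontierMath - Tier 4", 4.20, 2.10),
    ("τ²-Bench", 90.20, 33.00),
]


def summarize(rows):
    """Return win counts, mean difference, and the largest gap.

    Differences are Gemini 3.0 Flash minus Haiku 4.5, rounded to two
    decimals to match the precision of the published scores.
    """
    diffs = {name: round(gemini - haiku, 2) for name, gemini, haiku in rows}
    return {
        "gemini_wins": sum(d > 0 for d in diffs.values()),
        "haiku_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": sum(diffs.values()) / len(diffs),
        # Largest gap by absolute value, so a Haiku 4.5 lead would also count.
        "largest_gap": max(diffs.items(), key=lambda kv: abs(kv[1])),
    }


if __name__ == "__main__":
    summary = summarize(SCORES)
    print(summary["gemini_wins"], summary["haiku_wins"], summary["ties"])
    print(f"{summary['mean_diff']:.3f}")
    print(summary["largest_gap"])
```

Taking the largest gap by absolute value is a design choice of this sketch; the page does not state whether its "largest gap" is signed.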

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.