Gemini 3.0 Flash vs Claude Sonnet 4

Across 11 shared benchmarks, Gemini 3.0 Flash leads overall with 10 wins to Claude Sonnet 4's 1, no ties, and an average score difference of +12.61 in Gemini 3.0 Flash's favor.

Gemini 3.0 Flash

Google DeepMind · 2025-12-17 · Chat model

Claude Sonnet 4

Anthropic · 2025-05-23 · Reasoning model

Gemini 3.0 Flash: 10 wins (91%) · Claude Sonnet 4: 1 win (9%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.
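The grouping and ordering described above can be sketched in Python. This is a minimal illustration, not the site's actual code: it assumes "largest gap" means the largest absolute difference (inferred from the coding group, where -11.50 is listed before +6.90), and the `rows` list and `group_and_sort` helper are hypothetical names. Only four of the page's eleven rows are included to keep it short.

```python
from collections import defaultdict

# (capability, benchmark, Gemini 3.0 Flash score, Claude Sonnet 4 score),
# copied from the tables on this page; deliberately listed out of order.
rows = [
    ("General Knowledge", "GPQA Diamond", 90.40, 83.80),
    ("General Knowledge", "HLE", 43.50, 9.60),
    ("Coding and Software Engineering", "SWE-Bench Pro - Public", 49.60, 42.70),
    ("Coding and Software Engineering", "SWE-bench Verified", 68.70, 80.20),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, gemini, claude in rows:
        # Diff is Gemini minus Claude, matching the sign convention of the Diff column.
        groups[capability].append((benchmark, round(gemini - claude, 2)))
    for items in groups.values():
        # Assumption: ordering uses the magnitude of the gap, ignoring its sign.
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


grouped = group_and_sort(rows)
for capability, items in grouped.items():
    print(capability)
    for benchmark, diff in items:
        print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute value keeps a large loss (such as SWE-bench Verified) at the top of its group rather than pushing it to the bottom.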

General Knowledge

Gemini 3.0 Flash 4/4

Benchmark	Gemini 3.0 Flash	Claude Sonnet 4	Diff
HLE	43.50 · rank 40/161	9.60 · rank 138/161	+33.90
ARC-AGI-2	33.60 · rank 27/59	5.90 · rank 43/59	+27.70
GPQA Diamond	90.40 · rank 18/179	83.80 · rank 58/179	+6.60
LiveBench	56.35 · rank 79/115 · Normal (No Tools)	50.98 · rank 89/115 · Normal (No Tools)	+5.37

Claw-style Agent Evaluation

Gemini 3.0 Flash 2/2

Benchmark	Gemini 3.0 Flash	Claude Sonnet 4	Diff
Claw Bench	85.70 · rank 15/29 · Thinking (With Tools)	77.80 · rank 23/29 · Thinking (With Tools)	+7.90
Pinch Bench	85.20 · rank 16/37 · Thinking (With Tools)	80.50 · rank 22/37 · Thinking (With Tools)	+4.70

Coding and Software Engineering

Even (1 win each across 2 benchmarks)

Benchmark	Gemini 3.0 Flash	Claude Sonnet 4	Diff
SWE-bench Verified	68.70 · rank 62/108	80.20 · rank 13/108	-11.50
SWE-Bench Pro - Public	49.60 · rank 33/44 · Thinking High (With Tools)	42.70 · rank 38/44	+6.90

Math and Reasoning

Gemini 3.0 Flash 2/2

Benchmark	Gemini 3.0 Flash	Claude Sonnet 4	Diff
AIME2025	99.70 · rank 8/106	85 · rank 50/106	+14.70
FrontierMath - Tier 4	4.20 · rank 40/80 · Normal (No Tools)	0 · rank 72/80 · Normal (No Tools)	+4.20

Agent Level Benchmark

Gemini 3.0 Flash 1/1

Benchmark	Gemini 3.0 Flash	Claude Sonnet 4	Diff
τ²-Bench	90.20 · rank 3/40	52 · rank 33/40	+38.20

Specs

Field	Gemini 3.0 Flash	Claude Sonnet 4
Publisher	Google DeepMind	Anthropic
Release date	2025-12-17	2025-05-23
Model type	Chat model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	2000K	200K
Max output	64K	64K

Summary

Gemini 3.0 Flash leads in: General Knowledge (4/4), Claw-style Agent Evaluation (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1)
Tied in: Coding and Software Engineering

On average across the 11 shared benchmarks, Gemini 3.0 Flash scores 12.61 higher.

Largest single-benchmark gap: τ²-Bench, where Gemini 3.0 Flash scores 90.20 vs Claude Sonnet 4's 52 (+38.20).
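The summary figures can be reproduced from the table scores with a few lines of Python. This is a sketch under stated assumptions rather than the site's own method: it assumes a higher score wins on every benchmark and that the average is a simple unweighted mean of the per-benchmark differences. The variable names are illustrative.

```python
# Scores copied from the benchmark tables on this page:
# benchmark -> (Gemini 3.0 Flash, Claude Sonnet 4)
scores = {
    "HLE": (43.50, 9.60),
    "ARC-AGI-2": (33.60, 5.90),
    "GPQA Diamond": (90.40, 83.80),
    "LiveBench": (56.35, 50.98),
    "Claw Bench": (85.70, 77.80),
    "Pinch Bench": (85.20, 80.50),
    "SWE-bench Verified": (68.70, 80.20),
    "SWE-Bench Pro - Public": (49.60, 42.70),
    "AIME2025": (99.70, 85.0),
    "FrontierMath - Tier 4": (4.20, 0.0),
    "τ²-Bench": (90.20, 52.0),
}

# Per-benchmark difference, Gemini minus Claude, rounded to two decimals
# to avoid floating-point noise such as 38.200000000000003.
diffs = {name: round(g - c, 2) for name, (g, c) in scores.items()}

# Assumption: a positive difference counts as a Gemini win, negative as a Claude win.
gemini_wins = sum(d > 0 for d in diffs.values())
claude_wins = sum(d < 0 for d in diffs.values())
ties = sum(d == 0 for d in diffs.values())

# Assumption: the reported average is the unweighted mean of the differences.
average_diff = round(sum(diffs.values()) / len(diffs), 2)

# Benchmark with the largest absolute gap.
largest_gap = max(diffs, key=lambda name: abs(diffs[name]))

print(gemini_wins, claude_wins, ties)   # 10 1 0
print(f"{average_diff:+.2f}")           # +12.61
print(largest_gap, f"{diffs[largest_gap]:+.2f}")  # τ²-Bench +38.20
```

Under these assumptions the output matches the page's reported 10 wins to 1, the +12.61 average, and the +38.20 gap on τ²-Bench.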

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
