Gemini 3.0 Flash vs Claude Sonnet 4
Across 11 shared benchmarks, Gemini 3.0 Flash leads overall: Gemini 3.0 Flash wins 10, Claude Sonnet 4 wins 1, with 0 ties and an average score difference of +12.61.
Gemini 3.0 Flash
Google DeepMind · 2025-12-17 · Chat model
Claude Sonnet 4
Anthropic · 2025-05-23 · Reasoning model
Gemini 3.0 Flash: 10 wins (91%) · Claude Sonnet 4: 1 win (9%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.
General Knowledge
Gemini 3.0 Flash 4/4

| Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff |
|---|---|---|---|
| HLE | 43.50 (rank 40 / 161) | 9.60 (rank 138 / 161) | +33.90 |
| ARC-AGI-2 | 33.60 (rank 27 / 59) | 5.90 (rank 43 / 59) | +27.70 |
| GPQA Diamond | 90.40 (rank 18 / 179) | 83.80 (rank 58 / 179) | +6.60 |
| LiveBench | 56.35 (rank 79 / 115) · Normal (No Tools) | 50.98 (rank 89 / 115) · Normal (No Tools) | +5.37 |
Claw-style Agent Evaluation
Gemini 3.0 Flash 2/2

| Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff |
|---|---|---|---|
| Claw Bench | 85.70 (rank 15 / 29) · Thinking (With Tools) | 77.80 (rank 23 / 29) · Thinking (With Tools) | +7.90 |
| Pinch Bench | 85.20 (rank 16 / 37) · Thinking (With Tools) | 80.50 (rank 22 / 37) · Thinking (With Tools) | +4.70 |
Coding and Software Engineer
Even: 1 win each across 2 benchmarks

| Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff |
|---|---|---|---|
| SWE-bench Verified | 68.70 (rank 62 / 108) | 80.20 (rank 13 / 108) | -11.50 |
| SWE-Bench Pro - Public | 49.60 (rank 33 / 44) · Thinking High (With Tools) | 42.70 (rank 38 / 44) | +6.90 |
Math and Reasoning
Gemini 3.0 Flash 2/2

| Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff |
|---|---|---|---|
| AIME2025 | 99.70 (rank 8 / 106) | 85 (rank 50 / 106) | +14.70 |
| FrontierMath - Tier 4 | 4.20 (rank 40 / 80) · Normal (No Tools) | 0 (rank 72 / 80) · Normal (No Tools) | +4.20 |
Agent Level Benchmark
Gemini 3.0 Flash 1/1

| Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff |
|---|---|---|---|
| τ²-Bench | 90.20 (rank 3 / 40) | 52 (rank 33 / 40) | +38.20 |
Specs
| Field | Gemini 3.0 Flash | Claude Sonnet 4 |
|---|---|---|
| Publisher | Google DeepMind | Anthropic |
| Release date | 2025-12-17 | 2025-05-23 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 2000K | 200K |
| Max output | 64K | 64K |
Summary
- Gemini 3.0 Flash leads in: General Knowledge (4/4), Claw-style Agent Evaluation (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1)
- Tied in: Coding and Software Engineer
On average across the 11 shared benchmarks, Gemini 3.0 Flash scores 12.61 points higher.
Largest single-benchmark gap: τ²-Bench, where Gemini 3.0 Flash scores 90.20 vs Claude Sonnet 4's 52 (+38.20).
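The headline figures above can be reproduced from the per-benchmark scores. The sketch below is illustrative only: the page does not state how its aggregates are computed, so it assumes the average is a simple unweighted mean of (Gemini 3.0 Flash score minus Claude Sonnet 4 score), that a win is any non-zero difference, and that each score is the leading two-decimal value in its table cell. The names `SCORES` and `summarize` are hypothetical.

```python
from statistics import mean

# (benchmark, Gemini 3.0 Flash score, Claude Sonnet 4 score), read from the tables on this page.
SCORES = [
    ("HLE", 43.50, 9.60),
    ("ARC-AGI-2", 33.60, 5.90),
    ("GPQA Diamond", 90.40, 83.80),
    ("LiveBench", 56.35, 50.98),
    ("Claw Bench", 85.70, 77.80),
    ("Pinch Bench", 85.20, 80.50),
    ("SWE-bench Verified", 68.70, 80.20),
    ("SWE-Bench Pro - Public", 49.60, 42.70),
    ("AIME2025", 99.70, 85.0),
    ("FrontierMath - Tier 4", 4.20, 0.0),
    ("τ²-Bench", 90.20, 52.0),
]


def summarize(scores):
    """Tally wins, the mean difference, and the largest gap (assumed method)."""
    # Round each difference to two decimals to avoid floating-point noise.
    diffs = {name: round(a - b, 2) for name, a, b in scores}
    wins_a = sum(d > 0 for d in diffs.values())
    wins_b = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": ties,
        "win_share_a_pct": round(100 * wins_a / len(diffs)),
        "mean_diff": round(mean(diffs.values()), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Under these assumptions the sketch gives 10 wins to 1 with no ties, a mean difference of 12.61, and τ²-Bench as the largest gap, matching the figures stated on this page. Because the mean is unweighted, a single large gap such as τ²-Bench moves it noticeably.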
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.