Gemini 3.0 Flash vs Claude Sonnet 4

Across 11 shared benchmarks, Gemini 3.0 Flash leads overall: it wins 10 and Claude Sonnet 4 wins 1, with 0 ties and an average score difference of +12.61 in Gemini 3.0 Flash's favor.

Gemini 3.0 Flash
Google DeepMind · 2025-12-17 · Chat model

Claude Sonnet 4
Anthropic · 2025-05-23 · Reasoning model

Gemini 3.0 Flash: 10 wins (91%) · Claude Sonnet 4: 1 win (9%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.

General Knowledge

Gemini 3.0 Flash 4/4
Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff
HLE | 43.50 · 40 / 161 | 9.60 · 138 / 161 | +33.90
ARC-AGI-2 | 33.60 · 27 / 59 | 5.90 · 43 / 59 | +27.70
GPQA Diamond | 90.40 · 18 / 179 | 83.80 · 58 / 179 | +6.60
LiveBench | 56.35 · 79 / 115 · Normal (No Tools) | 50.98 · 89 / 115 · Normal (No Tools) | +5.37

Claw-style Agent Evaluation

Gemini 3.0 Flash 2/2
Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff
Claw Bench | 85.70 · 15 / 29 · Thinking (With Tools) | 77.80 · 23 / 29 · Thinking (With Tools) | +7.90
Pinch Bench | 85.20 · 16 / 37 · Thinking (With Tools) | 80.50 · 22 / 37 · Thinking (With Tools) | +4.70

Coding and Software Engineering

Even (1 win each out of 2)
Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff
SWE-bench Verified | 68.70 · 62 / 108 | 80.20 · 13 / 108 | -11.50
SWE-Bench Pro - Public | 49.60 · 33 / 44 · Thinking High (With Tools) | 42.70 · 38 / 44 | +6.90

Math and Reasoning

Gemini 3.0 Flash 2/2
Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff
AIME2025 | 99.70 · 8 / 106 | 85 · 50 / 106 | +14.70
FrontierMath - Tier 4 | 4.20 · 40 / 80 · Normal (No Tools) | 0 · 72 / 80 · Normal (No Tools) | +4.20

Agent Level Benchmark

Gemini 3.0 Flash 1/1
Benchmark | Gemini 3.0 Flash | Claude Sonnet 4 | Diff
τ²-Bench | 90.20 · 3 / 40 | 52 · 33 / 40 | +38.20
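
The grouping and ordering used in the tables above (by capability, then by largest gap within each group) can be sketched as follows. This is only an illustration under stated assumptions: the `ROWS` layout and the `group_and_sort` helper are hypothetical names, not the site's actual pipeline, and "largest gap" is assumed to mean the largest absolute difference, which matches the order shown in the coding group.

```python
from collections import defaultdict

# (capability, benchmark, Gemini 3.0 Flash score, Claude Sonnet 4 score),
# copied from the tables above and deliberately listed out of order.
ROWS = [
    ("General Knowledge", "LiveBench", 56.35, 50.98),
    ("General Knowledge", "HLE", 43.50, 9.60),
    ("General Knowledge", "GPQA Diamond", 90.40, 83.80),
    ("General Knowledge", "ARC-AGI-2", 33.60, 5.90),
    ("Claw-style Agent Evaluation", "Pinch Bench", 85.20, 80.50),
    ("Claw-style Agent Evaluation", "Claw Bench", 85.70, 77.80),
    ("Coding and Software Engineering", "SWE-Bench Pro - Public", 49.60, 42.70),
    ("Coding and Software Engineering", "SWE-bench Verified", 68.70, 80.20),
    ("Math and Reasoning", "FrontierMath - Tier 4", 4.20, 0.0),
    ("Math and Reasoning", "AIME2025", 99.70, 85.0),
    ("Agent Level Benchmark", "τ²-Bench", 90.20, 52.0),
]


def group_and_sort(rows):
    """Group rows by capability; sort each group by absolute score gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, gemini, claude in rows:
        diff = round(gemini - claude, 2)  # positive means Gemini 3.0 Flash is ahead
        groups[capability].append((benchmark, diff))
    for entries in groups.values():
        entries.sort(key=lambda entry: abs(entry[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, entries in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in entries:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute value keeps SWE-bench Verified (-11.50) ahead of SWE-Bench Pro - Public (+6.90), as in the coding table.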

Specs

Field | Gemini 3.0 Flash | Claude Sonnet 4
Publisher | Google DeepMind | Anthropic
Release date | 2025-12-17 | 2025-05-23
Model type | Chat model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 2000K | 200K
Max output | 64K | 64K

Summary

  • Gemini 3.0 Flash leads in: General Knowledge (4/4), Claw-style Agent Evaluation (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1)
  • Tied in: Coding and Software Engineering

On average across the 11 shared benchmarks, Gemini 3.0 Flash scores 12.61 higher.

Largest single-benchmark gap: τ²-Bench, where Gemini 3.0 Flash scores 90.20 vs 52 for Claude Sonnet 4 (+38.20).
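
The headline figures (win counts, win share, average difference, largest gap) can be recomputed from the per-benchmark differences listed in the tables. The sketch below is a minimal check, assuming a simple unweighted mean of the differences; `DIFFS` and `summarize` are hypothetical names introduced for illustration, not part of the site's tooling.

```python
# Score differences (Gemini 3.0 Flash minus Claude Sonnet 4), copied from the Diff column.
DIFFS = {
    "HLE": 33.90,
    "ARC-AGI-2": 27.70,
    "GPQA Diamond": 6.60,
    "LiveBench": 5.37,
    "Claw Bench": 7.90,
    "Pinch Bench": 4.70,
    "SWE-bench Verified": -11.50,
    "SWE-Bench Pro - Public": 6.90,
    "AIME2025": 14.70,
    "FrontierMath - Tier 4": 4.20,
    "τ²-Bench": 38.20,
}


def summarize(diffs):
    """Recompute win counts, win share, mean difference, and the largest absolute gap."""
    total = len(diffs)
    gemini_wins = sum(1 for d in diffs.values() if d > 0)
    claude_wins = sum(1 for d in diffs.values() if d < 0)
    return {
        "gemini_wins": gemini_wins,
        "claude_wins": claude_wins,
        "ties": total - gemini_wins - claude_wins,
        "gemini_share_pct": round(100 * gemini_wins / total),
        "claude_share_pct": round(100 * claude_wins / total),
        "mean_diff": round(sum(diffs.values()) / total, 2),  # unweighted mean (assumption)
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


if __name__ == "__main__":
    for key, value in summarize(DIFFS).items():
        print(f"{key}: {value}")
```

Under these assumptions the recomputed values agree with the page: 10 wins to 1, a 91% to 9% split, a mean difference of 12.61, and τ²-Bench as the largest gap.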

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.