Claude Sonnet 4.5 vs Gemini 2.5-Pro

Across 23 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 14 to Gemini 2.5-Pro's 7, with 2 ties, and its average score difference is +6.21.

Anthropic
Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Google DeepMind
Gemini 2.5-Pro

Google DeepMind · 2025-06-05 · Reasoning model

Claude Sonnet 4.5: 14 wins (61%) · Ties: 2 · Gemini 2.5-Pro: 7 wins (30%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 23 shared benchmarks.

General Knowledge

Claude Sonnet 4.5 5/6
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
ARC-AGI | 63.70 | 32 / 65 | 37 | 47 / 65 | +26.70
HLE | 33.60 | 67 / 157 | 21.60 | 97 / 157 | +12
ARC-AGI-2 | 13.60 | 35 / 59 | 4.90 | 44 / 59 | +8.70
LiveBench | 78.26 | 4 / 52 | 71.92 | 13 / 52 | +6.34
GPQA Diamond | 83.40 | 58 / 178 | 86.40 | 41 / 178 | -3
MMLU Pro | 88 | 7 / 126 | 86 | 20 / 126 | +2

Math and Reasoning

Gemini 2.5-Pro 4/6
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
IMO-ProofBench | 27.10 | 8 / 16 | 55.20 | 3 / 16 | -28.10
IMO-ProofBench Advanced | 4.80 | 6 / 8 | 17.60 | 4 / 8 | -12.80
AIME2025 | 100 | 1 / 106 | 88 | 43 / 106 | +12
Simple Bench | 54.30 | 9 / 27 | 62.40 | 2 / 27 | -8.10
FrontierMath | 5.20 | 38 / 60 | 11 | 23 / 60 | -5.80
FrontierMath - Tier 4 | 2.10 (Normal, No Tools) | 56 / 80 | 2.10 (Normal, No Tools) | 56 / 80 | 0

Agent Level Benchmark

Claude Sonnet 4.5 2/2
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
τ²-Bench - Telecom | 98 | 5 / 35 | 54 | 32 / 35 | +44
Terminal Bench Hard | 33 | 8 / 13 | 25 | 12 / 13 | +8

AI Agent - Tool Usage

Claude Sonnet 4.5 2/2
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
Terminal-Bench | 50 | 3 / 35 | 25.30 | 28 / 35 | +24.70
Terminal Bench 2.0 | 42.80 | 41 / 46 | 32.60 | 46 / 46 | +10.20

Coding and Software Engineer

Even (1 win each)
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
SWE-bench Verified | 82 | 6 / 108 | 67.20 | 68 / 108 | +14.80
LiveCodeBench | 71 | 47 / 120 | 77.10 | 34 / 120 | -6.10

AI Agent - Information Search

Claude Sonnet 4.5 1/1
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
BrowseComp | 24.10 | 43 / 45 | 7.80 | 44 / 45 | +16.30

Instruction Following

Claude Sonnet 4.5 1/1
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
IF Bench | 57.30 | 21 / 29 | 49 | 28 / 29 | +8.30

Long Context

Even (1 tie)
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
AA-LCR | 66 | 8 / 13 | 66 | 8 / 13 | 0

Multimodal Understanding

Gemini 2.5-Pro 1/1
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
MMMU | 77.80 | 14 / 28 | 82 | 9 / 28 | -4.20

Productivity Knowledge

Claude Sonnet 4.5 1/1
Benchmark | Claude Sonnet 4.5 | Rank | Gemini 2.5-Pro | Rank | Diff
GDPval-AA | 39 | 16 / 21 | 22 | 21 / 21 | +17

Specs

Field | Claude Sonnet 4.5 | Gemini 2.5-Pro
Publisher | Anthropic | Google DeepMind
Release date | 2025-09-30 | 2025-06-05
Model type | Chat model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1000K | 1000K
Max output | 64K | 64K

Summary

  • Claude Sonnet 4.5 leads in: General Knowledge (5/6), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), AI Agent - Information Search (1/1), Instruction Following (1/1), Productivity Knowledge (1/1)
  • Gemini 2.5-Pro leads in: Math and Reasoning (4/6), Multimodal Understanding (1/1)
  • Tied in: Coding and Software Engineer, Long Context

On average across the 23 shared benchmarks, Claude Sonnet 4.5 scores 6.21 points higher.

Largest single-benchmark gap: τ²-Bench - Telecom, where Claude Sonnet 4.5 scores 98 vs Gemini 2.5-Pro's 54 (+44).
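
The summary figures above can be re-derived from the per-benchmark scores. The sketch below is illustrative only: the page does not say how it computes its aggregates, so it assumes a tie means exactly equal scores and that the average is an unweighted mean of the raw score differences (Claude Sonnet 4.5 minus Gemini 2.5-Pro). The names `SCORES` and `summarize` are hypothetical, and the scores are transcribed by hand from the benchmark tables.

```python
# (benchmark, Claude Sonnet 4.5 score, Gemini 2.5-Pro score),
# transcribed from the benchmark tables on this page.
SCORES = [
    ("ARC-AGI", 63.70, 37.0), ("HLE", 33.60, 21.60),
    ("ARC-AGI-2", 13.60, 4.90), ("LiveBench", 78.26, 71.92),
    ("GPQA Diamond", 83.40, 86.40), ("MMLU Pro", 88.0, 86.0),
    ("IMO-ProofBench", 27.10, 55.20), ("IMO-ProofBench Advanced", 4.80, 17.60),
    ("AIME2025", 100.0, 88.0), ("Simple Bench", 54.30, 62.40),
    ("FrontierMath", 5.20, 11.0), ("FrontierMath - Tier 4", 2.10, 2.10),
    ("τ²-Bench - Telecom", 98.0, 54.0), ("Terminal Bench Hard", 33.0, 25.0),
    ("Terminal-Bench", 50.0, 25.30), ("Terminal Bench 2.0", 42.80, 32.60),
    ("SWE-bench Verified", 82.0, 67.20), ("LiveCodeBench", 71.0, 77.10),
    ("BrowseComp", 24.10, 7.80), ("IF Bench", 57.30, 49.0),
    ("AA-LCR", 66.0, 66.0), ("MMMU", 77.80, 82.0),
    ("GDPval-AA", 39.0, 22.0),
]


def summarize(scores):
    """Tally wins and ties and compute the mean and largest score difference.

    Assumption: a positive difference is a Claude Sonnet 4.5 win, a negative
    one a Gemini 2.5-Pro win, and an exactly zero difference a tie.
    """
    diffs = [(name, claude - gemini) for name, claude, gemini in scores]
    return {
        "claude_wins": sum(1 for _, d in diffs if d > 0),
        "gemini_wins": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        # Unweighted mean of the signed differences.
        "mean_diff": sum(d for _, d in diffs) / len(diffs),
        # Benchmark with the largest absolute difference.
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


summary = summarize(SCORES)
print(summary["claude_wins"], summary["gemini_wins"], summary["ties"])
print(round(summary["mean_diff"], 2), summary["largest_gap"][0])
```

Under these assumptions the tally comes out to 14 wins, 7 wins and 2 ties, the mean difference rounds to 6.21, and the largest gap is τ²-Bench - Telecom, matching the figures reported on the page.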

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.