Claude Sonnet 4.5 vs Gemini 2.5-Pro

Across 24 shared benchmarks, Claude Sonnet 4.5 leads overall: Claude Sonnet 4.5 wins 14, Gemini 2.5-Pro wins 8, with 2 ties and an average score difference of +16.50.

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Gemini 2.5-Pro

Google DeepMind · 2025-06-05 · Reasoning model

Claude Sonnet 4.5: 14 wins (58%) · Ties: 2 · Gemini 2.5-Pro: 8 wins (33%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 24 shared benchmarks.

General Knowledge

Claude Sonnet 4.5 4/6

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
ARC-AGI	63.70 (rank 35 / 68)	37 (rank 50 / 68)	+26.70
HLE	33.60 (rank 80 / 172)	21.60 (rank 112 / 172)	+12
ARC-AGI-2	13.60 (rank 38 / 62)	4.90 (rank 47 / 62)	+8.70
LiveBench	53.69 (rank 83 / 115; Normal, No Tools)	58.33 (rank 76 / 115; Thinking High, No Tools)	-4.64
GPQA Diamond	83.40 (rank 63 / 187)	86.40 (rank 45 / 187)	-3
MMLU Pro	88 (rank 7 / 132)	86 (rank 21 / 132)	+2

Math and Reasoning

Gemini 2.5-Pro 3/5

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
IMO-ProofBench	27.10 (rank 8 / 16)	55.20 (rank 3 / 16)	-28.10
IMO-ProofBench Advanced	4.80 (rank 6 / 8)	17.60 (rank 4 / 8)	-12.80
AIME2025	100 (rank 1 / 107)	88 (rank 44 / 107)	+12
FrontierMath	5.20 (rank 38 / 60)	11 (rank 23 / 60)	-5.80
FrontierMath - Tier 4	2.10 (rank 56 / 80; Normal, No Tools)	2.10 (rank 56 / 80; Normal, No Tools)	0 (tie)

Coding and Software Engineer

Claude Sonnet 4.5 2/3

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
CodeClash	1,389 (rank 1 / 8; Normal, With Tools)	1,125 (rank 6 / 8; Normal, With Tools)	+264
SWE-bench Verified	82 (rank 8 / 112)	67.20 (rank 72 / 112)	+14.80
LiveCodeBench	71 (rank 48 / 123)	77.10 (rank 34 / 123)	-6.10

Agent Level Benchmark

Claude Sonnet 4.5 2/2

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
τ²-Bench - Telecom	98 (rank 5 / 35)	54 (rank 32 / 35)	+44
Terminal Bench Hard	33 (rank 8 / 13)	25 (rank 12 / 13)	+8

AI Agent - Tool Usage

Claude Sonnet 4.5 2/2

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
Terminal-Bench	50 (rank 3 / 35)	25.30 (rank 28 / 35)	+24.70
Terminal Bench 2.0	42.80 (rank 42 / 47)	32.60 (rank 47 / 47)	+10.20

AI Agent - Information Search

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
BrowseComp	24.10 (rank 51 / 53)	7.80 (rank 52 / 53)	+16.30

Commonsense Reasoning

Gemini 2.5-Pro 1/1

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
Simple Bench	54.30 (rank 22 / 63; Normal, No Tools)	62.40 (rank 11 / 63; Thinking, No Tools)	-8.10

Instruction Following

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
IF Bench	57.30 (rank 22 / 30)	49 (rank 29 / 30)	+8.30

Long Context

Even 1/1

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
AA-LCR	66 (rank 10 / 15)	66 (rank 10 / 15)	0 (tie)

Multimodal Understanding

Gemini 2.5-Pro 1/1

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
MMMU	77.80 (rank 15 / 29)	82 (rank 10 / 29)	-4.20

Productivity Knowledge

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Gemini 2.5-Pro	Diff
GDPval-AA	39 (rank 16 / 21)	22 (rank 21 / 21)	+17

Specs

Field	Claude Sonnet 4.5	Gemini 2.5-Pro
Publisher	Anthropic	Google DeepMind
Release date	2025-09-30	2025-06-05
Model type	Chat model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1000K	1000K
Max output	64K	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Claude Sonnet 4.5	Gemini 2.5-Pro
Text input	$3 / 1M tokens	$1.25 / 1M tokens
Text output	$15 / 1M tokens	$10 / 1M tokens
Cache read	$0.3 / 1M tokens	Not public
Cache write	$3.75 / 1M tokens	Not public
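
To make the per-token prices concrete, the sketch below estimates the cost of one workload from the text input and output rates in the table. It is an illustration only: the function name `estimate_cost` and the example token counts are hypothetical, and the estimate ignores cache reads, cache writes and any pricing tiers not listed here.

```python
# Text input/output prices in USD per 1M tokens, copied from the pricing table.
PRICES_PER_MILLION = {
    "Claude Sonnet 4.5": {"input": 3.00, "output": 15.00},
    "Gemini 2.5-Pro": {"input": 1.25, "output": 10.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload (hypothetical helper).

    Only the listed text input and output rates are used; caching is ignored.
    """
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200K input tokens and 50K output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 200_000, 50_000):.2f}")
```

For this hypothetical workload the estimate is $1.35 for Claude Sonnet 4.5 and $0.75 for Gemini 2.5-Pro.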

Summary

Claude Sonnet 4.5 leads in: General Knowledge (4/6), Coding and Software Engineer (2/3), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), AI Agent - Information Search (1/1), Instruction Following (1/1), Productivity Knowledge (1/1)
Gemini 2.5-Pro leads in: Math and Reasoning (3/5), Commonsense Reasoning (1/1), Multimodal Understanding (1/1)
Tied in: Long Context

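On average across the 24 shared benchmarks, Claude Sonnet 4.5 scores 16.50 points higher. This mean includes the CodeClash gap of +264, which is measured on a much larger scale than the other benchmarks' scores.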

Largest single-benchmark gap: CodeClash — Claude Sonnet 4.5 1,389 vs Gemini 2.5-Pro 1,125 (+264).
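
The win counts, the average difference and the largest gap can be rechecked from the benchmark tables with a few lines of Python. This is a minimal sketch, not the page's actual generator: the scores are transcribed by hand from the tables, and the helper names `summarize` and `largest_gap` are hypothetical. It also recomputes the mean without CodeClash, since that benchmark's scores are on a much larger scale than the rest.

```python
# (benchmark, Claude Sonnet 4.5 score, Gemini 2.5-Pro score),
# transcribed by hand from the benchmark tables on this page.
SCORES = [
    ("ARC-AGI", 63.70, 37.00),
    ("HLE", 33.60, 21.60),
    ("ARC-AGI-2", 13.60, 4.90),
    ("LiveBench", 53.69, 58.33),
    ("GPQA Diamond", 83.40, 86.40),
    ("MMLU Pro", 88.00, 86.00),
    ("IMO-ProofBench", 27.10, 55.20),
    ("IMO-ProofBench Advanced", 4.80, 17.60),
    ("AIME2025", 100.00, 88.00),
    ("FrontierMath", 5.20, 11.00),
    ("FrontierMath - Tier 4", 2.10, 2.10),
    ("CodeClash", 1389.00, 1125.00),
    ("SWE-bench Verified", 82.00, 67.20),
    ("LiveCodeBench", 71.00, 77.10),
    ("τ²-Bench - Telecom", 98.00, 54.00),
    ("Terminal Bench Hard", 33.00, 25.00),
    ("Terminal-Bench", 50.00, 25.30),
    ("Terminal Bench 2.0", 42.80, 32.60),
    ("BrowseComp", 24.10, 7.80),
    ("Simple Bench", 54.30, 62.40),
    ("IF Bench", 57.30, 49.00),
    ("AA-LCR", 66.00, 66.00),
    ("MMMU", 77.80, 82.00),
    ("GDPval-AA", 39.00, 22.00),
]


def summarize(rows):
    """Return (claude_wins, gemini_wins, ties, mean_diff) for the given rows."""
    # Round each difference to 2 decimals to avoid floating-point noise.
    diffs = [round(claude - gemini, 2) for _, claude, gemini in rows]
    claude_wins = sum(d > 0 for d in diffs)
    gemini_wins = sum(d < 0 for d in diffs)
    ties = sum(d == 0 for d in diffs)
    return claude_wins, gemini_wins, ties, sum(diffs) / len(diffs)


def largest_gap(rows):
    """Return the row with the largest absolute score difference."""
    return max(rows, key=lambda row: abs(row[1] - row[2]))


if __name__ == "__main__":
    wins_c, wins_g, ties, mean_diff = summarize(SCORES)
    print(f"Claude wins: {wins_c}, Gemini wins: {wins_g}, ties: {ties}")
    print(f"Mean difference (all 24): {mean_diff:+.2f}")

    without_codeclash = [row for row in SCORES if row[0] != "CodeClash"]
    print(f"Mean difference (without CodeClash): {summarize(without_codeclash)[3]:+.2f}")
    print(f"Largest gap: {largest_gap(SCORES)[0]}")
```

With the transcribed scores, this reproduces the 14 / 8 / 2 split and the +16.50 mean; leaving out CodeClash, the mean difference over the remaining 23 benchmarks is about +5.74.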

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
