Claude Sonnet 4.5 vs Claude Sonnet 4

Across 26 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 23, Claude Sonnet 4 wins 1, and 2 are ties, with an average score difference of +14.79 in favor of Claude Sonnet 4.5.

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Claude Sonnet 4

Anthropic · 2025-05-23 · Reasoning model

Claude Sonnet 4.5: 23 wins (88%) · Ties: 2 · Claude Sonnet 4: 1 win (4%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 26 shared benchmarks.

General Knowledge

Claude Sonnet 4.5 5/6

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
HLE	33.60 (rank 80 / 172)	9.60 (rank 149 / 172)	+24
ARC-AGI	63.70 (rank 35 / 68)	40 (rank 49 / 68)	+23.70
ARC-AGI-2	13.60 (rank 38 / 62)	5.90 (rank 46 / 62)	+7.70
MMLU Pro	88 (rank 7 / 132)	84 (rank 38 / 132)	+4
LiveBench	53.69 (rank 83 / 115) Normal (No Tools)	50.98 (rank 89 / 115) Normal (No Tools)	+2.71
GPQA Diamond	83.40 (rank 63 / 187)	83.80 (rank 62 / 187)	-0.40

Math and Reasoning

Claude Sonnet 4.5 3/5

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
AIME2025	100 (rank 1 / 107)	85 (rank 51 / 107)	+15
FrontierMath - Tier 4	2.10 (rank 56 / 80) Normal (No Tools)	0 (rank 72 / 80) Normal (No Tools)	+2.10
FrontierMath	5.20 (rank 38 / 60)	4.10 (rank 41 / 60)	+1.10
IMO-ProofBench	27.10 (rank 8 / 16)	27.10 (rank 8 / 16)	0 (tie)
IMO-ProofBench Advanced	4.80 (rank 6 / 8)	4.80 (rank 6 / 8)	0 (tie)

Coding and Software Engineering

Claude Sonnet 4.5 4/4

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
CodeClash	1,389 (rank 1 / 8) Normal (With Tools)	1,223 (rank 4 / 8) Normal (With Tools)	+166
LiveCodeBench	71 (rank 48 / 123)	66 (rank 59 / 123)	+5
SWE-bench Verified	82 (rank 8 / 112)	80.20 (rank 14 / 112)	+1.80
SWE-Bench Pro - Public	43.60 (rank 47 / 54)	42.70 (rank 48 / 54)	+0.90

Agent Level Benchmark

Claude Sonnet 4.5 2/2

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
τ²-Bench - Telecom	98 (rank 5 / 35)	65 (rank 29 / 35)	+33
τ²-Bench	84.70 (rank 9 / 43)	52 (rank 34 / 43)	+32.70

AI Agent - Tool Usage

Claude Sonnet 4.5 2/2

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
OSWorld-Verified	61.40 (rank 20 / 24)	42.20 (rank 22 / 24)	+19.20
Terminal-Bench	50 (rank 3 / 35)	41.30 (rank 10 / 35)	+8.70

Claw-style Agent Evaluation

Claude Sonnet 4.5 2/2

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
Claw Bench	88.10 (rank 13 / 29) Thinking (With Tools)	77.80 (rank 23 / 29) Thinking (With Tools)	+10.30
Pinch Bench	88.20 (rank 4 / 37) Thinking (With Tools)	80.50 (rank 22 / 37) Thinking (With Tools)	+7.70

Commonsense Reasoning

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
Simple Bench	54.30 (rank 22 / 63) Normal (No Tools)	45.50 (rank 34 / 63) Thinking (No Tools)	+8.80

Instruction Following

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
IF Bench	57.30 (rank 22 / 30)	55 (rank 23 / 30)	+2.30

Long Context

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
AA-LCR	66 (rank 10 / 15)	65 (rank 12 / 15)	+1

Multimodal Understanding

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
MMMU	77.80 (rank 15 / 29)	76.50 (rank 17 / 29)	+1.30

Productivity Knowledge

Claude Sonnet 4.5 1/1

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Diff
GDPval-AA	39 (rank 16 / 21)	33 (rank 19 / 21)	+6

Specs

Field	Claude Sonnet 4.5	Claude Sonnet 4
Publisher	Anthropic	Anthropic
Release date	2025-09-30	2025-05-23
Model type	Chat model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1000K	200K
Max output	64K	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Claude Sonnet 4.5	Claude Sonnet 4
Text input	$3 / 1M tokens	$3 / 1M tokens
Text output	$15 / 1M tokens	$15 / 1M tokens
Cache read	$0.3 / 1M tokens	$0.3 / 1M tokens
Cache write	$3.75 / 1M tokens	$3.75 / 1M tokens
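As a rough illustration of how the per-million-token prices above translate into a bill, the sketch below estimates the cost of a single workload. The function name `estimate_cost_usd` and the token counts are hypothetical, chosen only for the example. The rates are the ones listed in the table, which are identical for both models.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
# Both models are listed at the same rates, so one table covers both.
PRICE_PER_MILLION = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


def estimate_cost_usd(input_tokens=0, output_tokens=0,
                      cache_read_tokens=0, cache_write_tokens=0):
    """Hypothetical helper: sum of (tokens / 1M) * listed price per category."""
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    return sum(tokens / 1_000_000 * PRICE_PER_MILLION[kind]
               for kind, tokens in usage.items())


# Illustrative workload (made-up numbers): 200K input and 50K output tokens.
cost = estimate_cost_usd(input_tokens=200_000, output_tokens=50_000)
print(f"${cost:.2f}")
```

Because the listed rates match, this estimate does not change with the choice of model. Any cost difference between the two would come from how many tokens each one consumes on a given task, which this page does not report.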

Summary

Claude Sonnet 4.5 leads in: General Knowledge (5/6), Math and Reasoning (3/5), Coding and Software Engineering (4/4), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), Claw-style Agent Evaluation (2/2), Commonsense Reasoning (1/1), Instruction Following (1/1), Long Context (1/1), Multimodal Understanding (1/1), Productivity Knowledge (1/1)

On average across the 26 shared benchmarks, Claude Sonnet 4.5 scores 14.79 higher.
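The win counts and the average can be re-derived from the Diff column of the tables above. The snippet below is a minimal sketch of that arithmetic. The variable names are illustrative, and it assumes the average is a plain unweighted mean of the 26 differences, which is consistent with the reported figure.

```python
# Diff column (Claude Sonnet 4.5 minus Claude Sonnet 4), copied from the tables above.
diffs = {
    "HLE": 24.0, "ARC-AGI": 23.70, "ARC-AGI-2": 7.70, "MMLU Pro": 4.0,
    "LiveBench": 2.71, "GPQA Diamond": -0.40,
    "AIME2025": 15.0, "FrontierMath - Tier 4": 2.10, "FrontierMath": 1.10,
    "IMO-ProofBench": 0.0, "IMO-ProofBench Advanced": 0.0,
    "CodeClash": 166.0, "LiveCodeBench": 5.0, "SWE-bench Verified": 1.80,
    "SWE-Bench Pro - Public": 0.90,
    "tau2-Bench - Telecom": 33.0, "tau2-Bench": 32.70,
    "OSWorld-Verified": 19.20, "Terminal-Bench": 8.70,
    "Claw Bench": 10.30, "Pinch Bench": 7.70,
    "Simple Bench": 8.80, "IF Bench": 2.30, "AA-LCR": 1.0,
    "MMMU": 1.30, "GDPval-AA": 6.0,
}

wins_sonnet_45 = sum(1 for d in diffs.values() if d > 0)
wins_sonnet_4 = sum(1 for d in diffs.values() if d < 0)
ties = sum(1 for d in diffs.values() if d == 0)

# Assumption: unweighted mean over all shared benchmarks.
mean_diff = sum(diffs.values()) / len(diffs)

# Same mean with CodeClash left out, since its scores are on a four-digit scale.
without_codeclash = [d for name, d in diffs.items() if name != "CodeClash"]
mean_diff_without_codeclash = sum(without_codeclash) / len(without_codeclash)

print(wins_sonnet_45, wins_sonnet_4, ties)
print(round(mean_diff, 2), round(mean_diff_without_codeclash, 2))
```

The benchmarks are not on a common scale: most scores are at most 100, while CodeClash scores are above 1,000. The CodeClash row alone contributes 166 of the 384.61 total, so the mean is best read alongside the per-benchmark differences.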

Largest single-benchmark gap: CodeClash, where Claude Sonnet 4.5 scores 1,389 vs 1,223 for Claude Sonnet 4 (+166).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
