Claude Sonnet 4.5 vs Claude Sonnet 4

Across 25 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 22, Claude Sonnet 4 wins 1, and 2 are ties. The average score difference is +8.81 points in Claude Sonnet 4.5's favor.

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Claude Sonnet 4

Anthropic · 2025-05-23 · Reasoning model

Claude Sonnet 4.5: 22 wins (88%) · Ties: 2 (8%) · Claude Sonnet 4: 1 win (4%)

Benchmark scores

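Grouped by capability and sorted by largest gap within each group; 25 shared benchmarks in total. Each score is followed by the model's rank among all models scored on that benchmark, written as rank / total.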
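The ordering rule can be sketched in a few lines of Python. This is a minimal illustration, not the page generator's actual code: it assumes "largest gap" means the signed difference (Claude Sonnet 4.5 minus Claude Sonnet 4) sorted in descending order, which gives the same order as the absolute gap for the rows used here. The rows are copied from the General Knowledge and Math and Reasoning tables, and the function name `group_sorted_by_gap` is hypothetical.

```python
from collections import defaultdict

# (category, benchmark, Claude Sonnet 4.5 score, Claude Sonnet 4 score),
# copied from the tables on this page, deliberately listed out of order.
ROWS = [
    ("General Knowledge", "GPQA Diamond", 83.40, 83.80),
    ("General Knowledge", "MMLU Pro", 88.0, 84.0),
    ("General Knowledge", "HLE", 33.60, 9.60),
    ("General Knowledge", "LiveBench", 78.26, 73.82),
    ("General Knowledge", "ARC-AGI-2", 13.60, 5.90),
    ("General Knowledge", "ARC-AGI", 63.70, 40.0),
    ("Math and Reasoning", "IMO-ProofBench", 27.10, 27.10),
    ("Math and Reasoning", "FrontierMath", 5.20, 4.10),
    ("Math and Reasoning", "AIME 2025", 100.0, 85.0),
    ("Math and Reasoning", "IMO-ProofBench Advanced", 4.80, 4.80),
    ("Math and Reasoning", "FrontierMath - Tier 4", 2.10, 0.0),
    ("Math and Reasoning", "Simple Bench", 54.30, 45.50),
]


def group_sorted_by_gap(rows):
    """Hypothetical helper: group rows by capability, then sort each group
    by signed score difference (4.5 minus 4), largest first."""
    groups = defaultdict(list)
    for category, name, a, b in rows:
        # Round to 2 decimals to match the page's displayed precision.
        groups[category].append((name, round(a - b, 2)))
    for entries in groups.values():
        # list.sort is stable, so tied rows keep their input order.
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(groups)


result = group_sorted_by_gap(ROWS)
for category, entries in result.items():
    print(category)
    for name, diff in entries:
        print(f"  {name}: {diff:+.2f}")
```

Python's sort stays stable with `reverse=True`, so the two tied IMO-ProofBench rows keep their input order at the bottom of their group.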

General Knowledge

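Claude Sonnet 4.5 wins 5 of 6

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| HLE | 33.60 (67 / 157) | 9.60 (134 / 157) | +24 |
| ARC-AGI | 63.70 (32 / 65) | 40 (46 / 65) | +23.70 |
| ARC-AGI-2 | 13.60 (35 / 59) | 5.90 (43 / 59) | +7.70 |
| LiveBench | 78.26 (4 / 52) | 73.82 (11 / 52) | +4.44 |
| MMLU Pro | 88 (7 / 126) | 84 (37 / 126) | +4 |
| GPQA Diamond | 83.40 (58 / 178) | 83.80 (57 / 178) | -0.40 |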

Math and Reasoning

Claude Sonnet 4.5 wins 4 of 6 (2 ties)

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| AIME 2025 | 100 (1 / 106) | 85 (50 / 106) | +15 |
| Simple Bench | 54.30 (9 / 27) | 45.50 (15 / 27) | +8.80 |
| FrontierMath - Tier 4, Normal (No Tools) | 2.10 (56 / 80) | 0 (72 / 80) | +2.10 |
| FrontierMath | 5.20 (38 / 60) | 4.10 (41 / 60) | +1.10 |
| IMO-ProofBench | 27.10 (8 / 16) | 27.10 (8 / 16) | Tie |
| IMO-ProofBench Advanced | 4.80 (6 / 8) | 4.80 (6 / 8) | Tie |

Coding and Software Engineering

Claude Sonnet 4.5 wins 3 of 3

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| LiveCodeBench | 71 (47 / 120) | 66 (58 / 120) | +5 |
| SWE-bench Verified | 82 (6 / 108) | 80.20 (13 / 108) | +1.80 |
| SWE-Bench Pro - Public | 43.60 (36 / 43) | 42.70 (37 / 43) | +0.90 |

Agent Level Benchmark

Claude Sonnet 4.5 wins 2 of 2

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| τ²-Bench - Telecom | 98 (5 / 35) | 65 (29 / 35) | +33 |
| τ²-Bench | 84.70 (9 / 40) | 52 (33 / 40) | +32.70 |

AI Agent - Tool Usage

Claude Sonnet 4.5 wins 2 of 2

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| OSWorld-Verified | 61.40 (14 / 18) | 42.20 (16 / 18) | +19.20 |
| Terminal-Bench | 50 (3 / 35) | 41.30 (10 / 35) | +8.70 |

Claw-style Agent Evaluation

Claude Sonnet 4.5 wins 2 of 2

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| Claw Bench, Thinking (With Tools) | 88.10 (13 / 29) | 77.80 (23 / 29) | +10.30 |
| Pinch Bench, Thinking (With Tools) | 88.20 (4 / 37) | 80.50 (22 / 37) | +7.70 |

Instruction Following

Claude Sonnet 4.5 wins 1 of 1

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| IF Bench | 57.30 (21 / 29) | 55 (22 / 29) | +2.30 |

Long Context

Claude Sonnet 4.5 wins 1 of 1

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| AA-LCR | 66 (8 / 13) | 65 (10 / 13) | +1 |

Multimodal Understanding

Claude Sonnet 4.5 wins 1 of 1

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| MMMU | 77.80 (14 / 28) | 76.50 (16 / 28) | +1.30 |

Productivity Knowledge

Claude Sonnet 4.5 wins 1 of 1

| Benchmark | Claude Sonnet 4.5 (rank) | Claude Sonnet 4 (rank) | Diff |
|---|---|---|---|
| GDPval-AA | 39 (16 / 21) | 33 (19 / 21) | +6 |

Specs

| Field | Claude Sonnet 4.5 | Claude Sonnet 4 |
|---|---|---|
| Publisher | Anthropic | Anthropic |
| Release date | 2025-09-30 | 2025-05-23 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1000K | 200K |
| Max output | 64K | 64K |

Summary

  • Claude Sonnet 4.5 leads in all 10 capability groups (benchmarks won / benchmarks in group): General Knowledge (5/6), Math and Reasoning (4/6), Coding and Software Engineering (3/3), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), Claw-style Agent Evaluation (2/2), Instruction Following (1/1), Long Context (1/1), Multimodal Understanding (1/1), Productivity Knowledge (1/1)

On average across the 25 shared benchmarks, Claude Sonnet 4.5 scores 8.81 points higher.

Largest single-benchmark gap: τ²-Bench - Telecom, where Claude Sonnet 4.5 scores 98 and Claude Sonnet 4 scores 65 (+33).
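The headline numbers can be recomputed from the 25 score pairs in the benchmark tables. The Python sketch below is an illustration under stated assumptions, since the page does not describe its method: a win is a strictly higher score, the average is the unweighted mean of the 25 per-benchmark differences, and the largest gap is the largest absolute difference. The helper `summarize` is hypothetical.

```python
# (benchmark, Claude Sonnet 4.5 score, Claude Sonnet 4 score) for all
# 25 shared benchmarks, copied from the tables on this page.
SCORES = [
    ("HLE", 33.60, 9.60), ("ARC-AGI", 63.70, 40.0),
    ("ARC-AGI-2", 13.60, 5.90), ("LiveBench", 78.26, 73.82),
    ("MMLU Pro", 88.0, 84.0), ("GPQA Diamond", 83.40, 83.80),
    ("AIME 2025", 100.0, 85.0), ("Simple Bench", 54.30, 45.50),
    ("FrontierMath - Tier 4", 2.10, 0.0), ("FrontierMath", 5.20, 4.10),
    ("IMO-ProofBench", 27.10, 27.10), ("IMO-ProofBench Advanced", 4.80, 4.80),
    ("LiveCodeBench", 71.0, 66.0), ("SWE-bench Verified", 82.0, 80.20),
    ("SWE-Bench Pro - Public", 43.60, 42.70),
    ("τ²-Bench - Telecom", 98.0, 65.0), ("τ²-Bench", 84.70, 52.0),
    ("OSWorld-Verified", 61.40, 42.20), ("Terminal-Bench", 50.0, 41.30),
    ("Claw Bench", 88.10, 77.80), ("Pinch Bench", 88.20, 80.50),
    ("IF Bench", 57.30, 55.0), ("AA-LCR", 66.0, 65.0),
    ("MMMU", 77.80, 76.50), ("GDPval-AA", 39.0, 33.0),
]


def summarize(scores, eps=1e-9):
    """Hypothetical helper: count wins/losses/ties for the first model,
    the unweighted mean difference, and the largest absolute gap."""
    diffs = [(name, a - b) for name, a, b in scores]
    wins = sum(1 for _, d in diffs if d > eps)
    losses = sum(1 for _, d in diffs if d < -eps)
    ties = len(diffs) - wins - losses
    mean_diff = sum(d for _, d in diffs) / len(diffs)
    largest = max(diffs, key=lambda item: abs(item[1]))
    return wins, losses, ties, mean_diff, largest


wins, losses, ties, mean_diff, largest = summarize(SCORES)
n = len(SCORES)
print(f"wins {wins} ({wins / n:.0%}), losses {losses} ({losses / n:.0%}), ties {ties} ({ties / n:.0%})")
print(f"mean difference {mean_diff:+.2f}")
print(f"largest gap {largest[0]}: {largest[1]:+.2f}")
```

Under these assumptions the sketch reproduces 22 wins (88%), 1 loss (4%), 2 ties (8%), a mean difference of +8.81 and the +33 gap on τ²-Bench - Telecom. An unweighted mean treats one point on every benchmark as equal, regardless of each benchmark's scale or difficulty.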

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.