Claude Sonnet 4.5 vs Claude Sonnet 4
Across 25 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 22, Claude Sonnet 4 wins 1, and 2 are ties, with an average score difference of +8.81 in favor of Claude Sonnet 4.5.
Claude Sonnet 4.5
Anthropic · 2025-09-30 · Chat model
Claude Sonnet 4
Anthropic · 2025-05-23 · Reasoning model
Claude Sonnet 4.5: 22 wins (88%) · Ties: 2 · Claude Sonnet 4: 1 win (4%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 25 shared benchmarks.
General Knowledge
Claude Sonnet 4.5 wins 5/6

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| HLE | 33.60 (rank 67 / 157) | 9.60 (rank 134 / 157) | +24 |
| ARC-AGI | 63.70 (rank 32 / 65) | 40 (rank 46 / 65) | +23.70 |
| ARC-AGI-2 | 13.60 (rank 35 / 59) | 5.90 (rank 43 / 59) | +7.70 |
| LiveBench | 78.26 (rank 4 / 52) | 73.82 (rank 11 / 52) | +4.44 |
| MMLU Pro | 88 (rank 7 / 126) | 84 (rank 37 / 126) | +4 |
| GPQA Diamond | 83.40 (rank 58 / 178) | 83.80 (rank 57 / 178) | -0.40 |
Math and Reasoning
Claude Sonnet 4.5 wins 4/6

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| AIME2025 | 100 (rank 1 / 106) | 85 (rank 50 / 106) | +15 |
| Simple Bench | 54.30 (rank 9 / 27) | 45.50 (rank 15 / 27) | +8.80 |
| FrontierMath - Tier 4 | 2.10 (rank 56 / 80; Normal, No Tools) | 0 (rank 72 / 80; Normal, No Tools) | +2.10 |
| FrontierMath | 5.20 (rank 38 / 60) | 4.10 (rank 41 / 60) | +1.10 |
| IMO-ProofBench | 27.10 (rank 8 / 16) | 27.10 (rank 8 / 16) | 0 (tie) |
| IMO-ProofBench Advanced | 4.80 (rank 6 / 8) | 4.80 (rank 6 / 8) | 0 (tie) |
Coding and Software Engineer
Claude Sonnet 4.5 wins 3/3

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| LiveCodeBench | 71 (rank 47 / 120) | 66 (rank 58 / 120) | +5 |
| SWE-bench Verified | 82 (rank 6 / 108) | 80.20 (rank 13 / 108) | +1.80 |
| SWE-Bench Pro - Public | 43.60 (rank 36 / 43) | 42.70 (rank 37 / 43) | +0.90 |
Agent Level Benchmark
Claude Sonnet 4.5 wins 2/2

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| τ²-Bench - Telecom | 98 (rank 5 / 35) | 65 (rank 29 / 35) | +33 |
| τ²-Bench | 84.70 (rank 9 / 40) | 52 (rank 33 / 40) | +32.70 |
AI Agent - Tool Usage
Claude Sonnet 4.5 wins 2/2

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| OSWorld-Verified | 61.40 (rank 14 / 18) | 42.20 (rank 16 / 18) | +19.20 |
| Terminal-Bench | 50 (rank 3 / 35) | 41.30 (rank 10 / 35) | +8.70 |
Claw-style Agent Evaluation
Claude Sonnet 4.5 wins 2/2

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| Claw Bench | 88.10 (rank 13 / 29; Thinking, With Tools) | 77.80 (rank 23 / 29; Thinking, With Tools) | +10.30 |
| Pinch Bench | 88.20 (rank 4 / 37; Thinking, With Tools) | 80.50 (rank 22 / 37; Thinking, With Tools) | +7.70 |
Instruction Following
Claude Sonnet 4.5 wins 1/1

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| IF Bench | 57.30 (rank 21 / 29) | 55 (rank 22 / 29) | +2.30 |
Long Context
Claude Sonnet 4.5 wins 1/1

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| AA-LCR | 66 (rank 8 / 13) | 65 (rank 10 / 13) | +1 |
Multimodal Understanding
Claude Sonnet 4.5 wins 1/1

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| MMMU | 77.80 (rank 14 / 28) | 76.50 (rank 16 / 28) | +1.30 |
Productivity Knowledge
Claude Sonnet 4.5 wins 1/1

| Benchmark | Claude Sonnet 4.5 | Claude Sonnet 4 | Diff |
|---|---|---|---|
| GDPval-AA | 39 (rank 16 / 21) | 33 (rank 19 / 21) | +6 |
Specs
| Field | Claude Sonnet 4.5 | Claude Sonnet 4 |
|---|---|---|
| Publisher | Anthropic | Anthropic |
| Release date | 2025-09-30 | 2025-05-23 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1000K | 200K |
| Max output | 64K | 64K |
Summary
- Claude Sonnet 4.5 leads in: General Knowledge (5/6), Math and Reasoning (4/6), Coding and Software Engineer (3/3), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), Claw-style Agent Evaluation (2/2), Instruction Following (1/1), Long Context (1/1), Multimodal Understanding (1/1), Productivity Knowledge (1/1)
- On average across the 25 shared benchmarks, Claude Sonnet 4.5 scores 8.81 higher.
- Largest single-benchmark gap: τ²-Bench - Telecom, where Claude Sonnet 4.5 scores 98 vs 65 for Claude Sonnet 4 (+33).
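
The summary figures can be re-derived from the Diff column of the tables above. The following is a minimal sketch, not the page's actual generation logic: the names `DIFFS` and `summarize` are illustrative, and it assumes the average is a plain unweighted mean of the 25 per-benchmark differences.

```python
# Diff values (Claude Sonnet 4.5 minus Claude Sonnet 4), copied from the tables above.
DIFFS = {
    "HLE": 24.0, "ARC-AGI": 23.70, "ARC-AGI-2": 7.70, "LiveBench": 4.44,
    "MMLU Pro": 4.0, "GPQA Diamond": -0.40,
    "AIME2025": 15.0, "Simple Bench": 8.80, "FrontierMath - Tier 4": 2.10,
    "FrontierMath": 1.10, "IMO-ProofBench": 0.0, "IMO-ProofBench Advanced": 0.0,
    "LiveCodeBench": 5.0, "SWE-bench Verified": 1.80, "SWE-Bench Pro - Public": 0.90,
    "τ²-Bench - Telecom": 33.0, "τ²-Bench": 32.70,
    "OSWorld-Verified": 19.20, "Terminal-Bench": 8.70,
    "Claw Bench": 10.30, "Pinch Bench": 7.70,
    "IF Bench": 2.30, "AA-LCR": 1.0, "MMMU": 1.30, "GDPval-AA": 6.0,
}


def summarize(diffs):
    """Return (wins, losses, ties, mean_diff, largest_gap_benchmark)."""
    wins = sum(1 for d in diffs.values() if d > 0)     # Claude Sonnet 4.5 ahead
    losses = sum(1 for d in diffs.values() if d < 0)   # Claude Sonnet 4 ahead
    ties = sum(1 for d in diffs.values() if d == 0)
    mean_diff = round(sum(diffs.values()) / len(diffs), 2)  # unweighted mean (assumption)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return wins, losses, ties, mean_diff, largest


if __name__ == "__main__":
    print(summarize(DIFFS))
```

Under that assumption the sketch reproduces the page's tally of 22 wins, 1 loss and 2 ties and the 8.81 average. Note that the mean mixes benchmarks with different score scales, so it is best read as a rough indicator rather than a calibrated measure.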
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.