Claude Sonnet 4.5 vs Claude Sonnet 3.7

Across 13 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 13, Claude Sonnet 3.7 wins 0, there are no ties, and the average score difference is +17.89 points in Claude Sonnet 4.5's favor.
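The win/tie counts and the average difference can be reproduced from the per-benchmark scores. This is a minimal Python sketch that assumes the scores were transcribed from the tables on this page. It also assumes that a tie means exactly equal scores, because the page does not state its tie rule. `SCORES` and `tally` are hypothetical names used only for illustration.

```python
# Per-benchmark scores transcribed from the tables on this page:
# benchmark -> (Claude Sonnet 4.5, Claude Sonnet 3.7)
SCORES = {
    "τ²-Bench - Telecom": (98, 55),
    "τ²-Bench": (84.70, 61.80),
    "Terminal Bench Hard": (33, 21),
    "HLE": (33.60, 10.30),
    "LiveBench": (78.26, 68.64),
    "GPQA Diamond": (83.40, 77),
    "AIME2025": (100, 54.80),
    "Simple Bench": (54.30, 46.40),
    "FrontierMath": (5.20, 4.10),
    "OSWorld-Verified": (61.40, 28),
    "SWE-bench Verified": (82, 70.30),
    "AA-LCR": (66, 61),
    "GDPval-AA": (39, 28),
}


def tally(scores):
    """Count wins for each model, ties, and the mean difference (first - second).

    Assumption: a tie is exact equality; the page's real tie rule is unknown.
    """
    wins_a = wins_b = ties = 0
    diffs = []
    for a, b in scores.values():
        diff = a - b
        diffs.append(diff)
        if diff > 0:
            wins_a += 1
        elif diff < 0:
            wins_b += 1
        else:
            ties += 1
    mean_diff = sum(diffs) / len(diffs)  # unweighted: every benchmark counts equally
    return wins_a, wins_b, ties, mean_diff


wins_45, wins_37, ties, mean_diff = tally(SCORES)
print(f"Sonnet 4.5 wins {wins_45}, Sonnet 3.7 wins {wins_37}, ties {ties}, mean diff {mean_diff:+.2f}")
```

Because the mean is unweighted, a benchmark with 13 ranked models influences the average as much as one with 178.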

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Claude Sonnet 3.7

Anthropic · 2025-02-25 · Chat model

Claude Sonnet 4.5: 13 wins (100%) · Claude Sonnet 3.7: 0 wins (0%)

Benchmark scores

Grouped by capability and sorted by largest gap within each group; 13 shared benchmarks in total.
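The ordering rule can be sketched in a few lines of Python. The sketch assumes that "largest gap" means the largest absolute score difference. Every gap on this page is positive, so that reading gives the same order either way. Only the three multi-benchmark groups are included, and the rows are deliberately listed out of order to show the sort. `ROWS` and `group_and_sort` are hypothetical names.

```python
from collections import defaultdict

# (capability group, benchmark, Claude Sonnet 4.5, Claude Sonnet 3.7),
# transcribed from the tables on this page and listed out of order on purpose.
ROWS = [
    ("Agent Level Benchmark", "τ²-Bench", 84.70, 61.80),
    ("Agent Level Benchmark", "Terminal Bench Hard", 33, 21),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 98, 55),
    ("General Knowledge", "GPQA Diamond", 83.40, 77),
    ("General Knowledge", "HLE", 33.60, 10.30),
    ("General Knowledge", "LiveBench", 78.26, 68.64),
    ("Math and Reasoning", "FrontierMath", 5.20, 4.10),
    ("Math and Reasoning", "AIME2025", 100, 54.80),
    ("Math and Reasoning", "Simple Bench", 54.30, 46.40),
]


def group_and_sort(rows):
    """Group rows by capability, then order each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for group, bench, a, b in rows:
        # Round to two decimals, matching the precision shown in the Diff column.
        groups[group].append((bench, round(a - b, 2)))
    for entries in groups.values():
        entries.sort(key=lambda e: abs(e[1]), reverse=True)
    return dict(groups)  # groups keep first-seen order


for group, entries in group_and_sort(ROWS).items():
    print(group)
    for bench, gap in entries:
        print(f"  {bench}: {gap:+.2f}")
```

The resulting order within each group matches the tables that follow.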

Agent Level Benchmark

Claude Sonnet 4.5 wins 3/3

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
τ²-Bench - Telecom	98 (5 / 35)	55 (31 / 35)	+43
τ²-Bench	84.70 (9 / 40)	61.80 (29 / 40)	+22.90
Terminal Bench Hard	33 (8 / 13)	21 (13 / 13)	+12

General Knowledge

Claude Sonnet 4.5 wins 3/3

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
HLE	33.60 (67 / 157)	10.30 (131 / 157)	+23.30
LiveBench	78.26 (4 / 52)	68.64 (24 / 52)	+9.62
GPQA Diamond	83.40 (58 / 178)	77 (88 / 178)	+6.40

Math and Reasoning

Claude Sonnet 4.5 wins 3/3

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
AIME2025	100 (1 / 106)	54.80 (84 / 106)	+45.20
Simple Bench	54.30 (9 / 27)	46.40 (14 / 27)	+7.90
FrontierMath	5.20 (38 / 60)	4.10 (41 / 60)	+1.10

AI Agent - Tool Usage

Claude Sonnet 4.5 wins 1/1

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
OSWorld-Verified	61.40 (14 / 18)	28 (18 / 18)	+33.40

Coding and Software Engineer

Claude Sonnet 4.5 wins 1/1

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
SWE-bench Verified	82 (6 / 108)	70.30 (55 / 108)	+11.70

Long Context

Claude Sonnet 4.5 wins 1/1

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
AA-LCR	66 (8 / 13)	61 (13 / 13)	+5

Productivity Knowledge

Claude Sonnet 4.5 wins 1/1

Benchmark	Claude Sonnet 4.5 score (rank / models ranked)	Claude Sonnet 3.7 score (rank / models ranked)	Diff
GDPval-AA	39 (16 / 21)	28 (20 / 21)	+11

Specs

Field	Claude Sonnet 4.5	Claude Sonnet 3.7
Publisher	Anthropic	Anthropic
Release date	2025-09-30	2025-02-25
Model type	Chat model	Chat model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1000K	128K
Max output	64K	Not available

Summary

Claude Sonnet 4.5 leads in: Agent Level Benchmark (3/3), General Knowledge (3/3), Math and Reasoning (3/3), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Long Context (1/1), Productivity Knowledge (1/1)

On average across the 13 shared benchmarks, Claude Sonnet 4.5 scores 17.89 higher.
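This average is the unweighted mean of the Diff column. Writing $s_i^{4.5}$ and $s_i^{3.7}$ for the two models' scores on benchmark $i$, the thirteen differences sum to 232.52:

```latex
\bar{\Delta}
  = \frac{1}{13}\sum_{i=1}^{13}\left(s_i^{4.5} - s_i^{3.7}\right)
  = \frac{43 + 22.90 + 12 + 23.30 + 9.62 + 6.40 + 45.20 + 7.90 + 1.10 + 33.40 + 11.70 + 5 + 11}{13}
  = \frac{232.52}{13} \approx 17.89
```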

Largest single-benchmark gap: AIME2025, where Claude Sonnet 4.5 scores 100 and Claude Sonnet 3.7 scores 54.80 (+45.20).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
