Claude Sonnet 4.5 vs Claude Sonnet 3.7

Across 13 shared benchmarks, Claude Sonnet 4.5 leads overall: Claude Sonnet 4.5 wins 13, Claude Sonnet 3.7 wins 0, with 0 ties and an average score difference of +17.89.

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Claude Sonnet 3.7

Anthropic · 2025-02-25 · Chat model

Claude Sonnet 4.5: 13 wins (100%) · Claude Sonnet 3.7: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 13 shared benchmarks.
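The grouping and ordering described above can be sketched in a few lines. This is a minimal illustration, assuming rows are transcribed by hand from two of the capability groups on this page; the names `ROWS` and `group_and_sort` are hypothetical and are not part of whatever pipeline generated the page.

```python
from collections import defaultdict

# (capability, benchmark, Claude Sonnet 4.5 score, Claude Sonnet 3.7 score)
# Transcribed from this page, deliberately listed out of order.
ROWS = [
    ("Agent Level Benchmark", "Terminal Bench Hard", 33.0, 21.0),
    ("Math and Reasoning", "FrontierMath", 5.20, 4.10),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 98.0, 55.0),
    ("Math and Reasoning", "AIME2025", 100.0, 54.80),
    ("Agent Level Benchmark", "τ²-Bench", 84.70, 61.80),
    ("Math and Reasoning", "Simple Bench", 54.30, 46.40),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by largest gap first."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in rows:
        # Round to two decimals to avoid floating-point noise in the diff.
        groups[capability].append((benchmark, round(score_a - score_b, 2)))
    return {
        capability: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for capability, items in groups.items()
    }


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute difference keeps the ordering meaningful even if a later comparison includes benchmarks where the second model leads.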

Agent Level Benchmark

Claude Sonnet 4.5 leads 3/3
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
τ²-Bench - Telecom | 98 (5 / 35) | 55 (31 / 35) | +43
τ²-Bench | 84.70 (9 / 40) | 61.80 (29 / 40) | +22.90
Terminal Bench Hard | 33 (8 / 13) | 21 (13 / 13) | +12

General Knowledge

Claude Sonnet 4.5 leads 3/3
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
HLE | 33.60 (67 / 157) | 10.30 (131 / 157) | +23.30
LiveBench | 78.26 (4 / 52) | 68.64 (24 / 52) | +9.62
GPQA Diamond | 83.40 (58 / 178) | 77 (88 / 178) | +6.40

Math and Reasoning

Claude Sonnet 4.5 leads 3/3
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
AIME2025 | 100 (1 / 106) | 54.80 (84 / 106) | +45.20
Simple Bench | 54.30 (9 / 27) | 46.40 (14 / 27) | +7.90
FrontierMath | 5.20 (38 / 60) | 4.10 (41 / 60) | +1.10

AI Agent - Tool Usage

Claude Sonnet 4.5 leads 1/1
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
OSWorld-Verified | 61.40 (14 / 18) | 28 (18 / 18) | +33.40

Coding and Software Engineer

Claude Sonnet 4.5 leads 1/1
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
SWE-bench Verified | 82 (6 / 108) | 70.30 (55 / 108) | +11.70

Long Context

Claude Sonnet 4.5 leads 1/1
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
AA-LCR | 66 (8 / 13) | 61 (13 / 13) | +5

Productivity Knowledge

Claude Sonnet 4.5 leads 1/1
Benchmark | Claude Sonnet 4.5 | Claude Sonnet 3.7 | Diff
GDPval-AA | 39 (16 / 21) | 28 (20 / 21) | +11

Specs

Field | Claude Sonnet 4.5 | Claude Sonnet 3.7
Publisher | Anthropic | Anthropic
Release date | 2025-09-30 | 2025-02-25
Model type | Chat model | Chat model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1000K | 128K
Max output | 64K | Not available

Summary

  • Claude Sonnet 4.5 leads in: Agent Level Benchmark (3/3), General Knowledge (3/3), Math and Reasoning (3/3), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Long Context (1/1), Productivity Knowledge (1/1)

On average across the 13 shared benchmarks, Claude Sonnet 4.5 scores 17.89 higher.

Largest single-benchmark gap: AIME2025 — Claude Sonnet 4.5 100 vs Claude Sonnet 3.7 54.80 (+45.20).
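The win count, average difference, and largest gap quoted in this summary can be re-derived from the per-benchmark scores. The following is a minimal sketch, assuming the scores are copied by hand from the tables on this page; `SCORES` and `summarize` are illustrative names, not the page's actual data pipeline.

```python
# (benchmark, Claude Sonnet 4.5 score, Claude Sonnet 3.7 score), copied from this page.
SCORES = [
    ("τ²-Bench - Telecom", 98.0, 55.0),
    ("τ²-Bench", 84.70, 61.80),
    ("Terminal Bench Hard", 33.0, 21.0),
    ("HLE", 33.60, 10.30),
    ("LiveBench", 78.26, 68.64),
    ("GPQA Diamond", 83.40, 77.0),
    ("AIME2025", 100.0, 54.80),
    ("Simple Bench", 54.30, 46.40),
    ("FrontierMath", 5.20, 4.10),
    ("OSWorld-Verified", 61.40, 28.0),
    ("SWE-bench Verified", 82.0, 70.30),
    ("AA-LCR", 66.0, 61.0),
    ("GDPval-AA", 39.0, 28.0),
]


def summarize(scores):
    """Return win/tie counts, mean difference, and the largest single gap."""
    # Round each difference to two decimals to suppress floating-point noise.
    diffs = [(name, round(a - b, 2)) for name, a, b in scores]
    return {
        "wins_a": sum(1 for _, d in diffs if d > 0),
        "wins_b": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "mean_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

With the 13 rows above, this reproduces the page's figures: 13 wins to 0, no ties, a mean difference of 17.89, and AIME2025 as the largest gap at 45.2.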

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.