Claude Sonnet 4.5 vs Claude 3.5 Sonnet

Across 4 shared benchmarks, Claude Sonnet 4.5 leads overall: Claude Sonnet 4.5 wins 4, Claude 3.5 Sonnet wins 0, with 0 ties and an average score difference of +10.17.

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Claude 3.5 Sonnet

Anthropic · 2024-06-21 · Multimodal model

Claude Sonnet 4.5: 4 wins (100%) · Claude 3.5 Sonnet: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 4 shared benchmarks.

General Knowledge

Claude Sonnet 4.5 leads 2/2
| Benchmark | Claude Sonnet 4.5 | Claude 3.5 Sonnet | Diff |
| --- | --- | --- | --- |
| GPQA Diamond | 83.40 · 58 / 178 | 59.40 · 141 / 178 | +24 |
| MMLU Pro | 88 · 7 / 126 | 77.64 · 74 / 126 | +10.36 |

Math and Reasoning

Claude Sonnet 4.5 leads 2/2
| Benchmark | Claude Sonnet 4.5 | Claude 3.5 Sonnet | Diff |
| --- | --- | --- | --- |
| FrontierMath | 5.20 · 38 / 60 | 1 · 52 / 60 | +4.20 |
| FrontierMath - Tier 4 | 2.10 · 56 / 80 · Normal (No Tools) | 0 · 72 / 80 · Normal (No Tools) | +2.10 |

Specs

| Field | Claude Sonnet 4.5 | Claude 3.5 Sonnet |
| --- | --- | --- |
| Publisher | Anthropic | Anthropic |
| Release date | 2025-09-30 | 2024-06-21 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1000K | 200K |
| Max output | 64K | Not available |

Summary

  • Claude Sonnet 4.5 leads in: General Knowledge (2/2), Math and Reasoning (2/2)

On average across the 4 shared benchmarks, Claude Sonnet 4.5 scores 10.17 higher.

Largest single-benchmark gap: GPQA Diamond, where Claude Sonnet 4.5 scores 83.40 vs 59.40 for Claude 3.5 Sonnet (+24).
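
The summary figures above can be re-derived from the four score pairs listed on this page. The following is a minimal sketch, not the site's actual pipeline: the names `SCORES` and `summarize` are hypothetical, and half-up rounding to two decimals is an assumption chosen because it reproduces the reported average.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the benchmark tables on this page:
# (benchmark, Claude Sonnet 4.5, Claude 3.5 Sonnet)
SCORES = [
    ("GPQA Diamond", Decimal("83.40"), Decimal("59.40")),
    ("MMLU Pro", Decimal("88"), Decimal("77.64")),
    ("FrontierMath", Decimal("5.20"), Decimal("1")),
    ("FrontierMath - Tier 4", Decimal("2.10"), Decimal("0")),
]


def summarize(scores):
    """Return win counts, mean difference, and largest gap (hypothetical helper)."""
    # Per-benchmark difference: first model minus second model.
    diffs = {name: a - b for name, a, b in scores}

    wins_a = sum(1 for d in diffs.values() if d > 0)
    wins_b = sum(1 for d in diffs.values() if d < 0)
    ties = len(diffs) - wins_a - wins_b

    # Mean of the differences, rounded half-up to two decimals (assumed rounding rule).
    mean_diff = (sum(diffs.values()) / len(diffs)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )

    # Benchmark with the largest absolute difference.
    largest = max(diffs, key=lambda name: abs(diffs[name]))

    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": ties,
        "mean_diff": mean_diff,
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
print(summary)
```

The exact mean of the four differences is 10.165, so the reported 10.17 depends on the rounding rule. `Decimal` is used here instead of floats so that the half-way case rounds predictably.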

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.