Claude Sonnet 4.5 vs Claude 3.5 Sonnet New

Across 6 shared benchmarks, Claude Sonnet 4.5 leads overall: it wins 6 and Claude 3.5 Sonnet New wins 0, with 0 ties and an average score difference of +16.48 in favor of Claude Sonnet 4.5.

Claude Sonnet 4.5

Anthropic · 2025-09-30 · Chat model

Claude 3.5 Sonnet New

Anthropic · 2024-10-22 · Chat model

Claude Sonnet 4.5: 6 wins (100%) · Claude 3.5 Sonnet New: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

Coding and Software Engineer

Claude Sonnet 4.5 leads 2/2

| Benchmark | Claude Sonnet 4.5 (rank) | Claude 3.5 Sonnet New (rank) | Diff |
|---|---|---|---|
| SWE-bench Verified | 82 (6 / 108) | 49 (93 / 108) | +33 |
| LiveCodeBench | 71 (47 / 120) | 38.70 (102 / 120) | +32.30 |

General Knowledge

Claude Sonnet 4.5 leads 2/2

| Benchmark | Claude Sonnet 4.5 (rank) | Claude 3.5 Sonnet New (rank) | Diff |
|---|---|---|---|
| GPQA Diamond | 83.40 (58 / 178) | 65 (131 / 178) | +18.40 |
| MMLU Pro | 88 (7 / 126) | 78 (69 / 126) | +10 |

Math and Reasoning

Claude Sonnet 4.5 leads 2/2

| Benchmark | Claude Sonnet 4.5 (rank) | Claude 3.5 Sonnet New (rank) | Diff |
|---|---|---|---|
| FrontierMath | 5.20 (38 / 60) | 2.10 (47 / 60) | +3.10 |
| FrontierMath - Tier 4 | 2.10 (56 / 80), Normal (No Tools) | 0 (72 / 80), Normal (No Tools) | +2.10 |
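The grouping and ordering used in these tables ("grouped by capability, sorted by largest gap within each") can be reproduced with a short sketch. This is an illustration only, not the page's actual generator: the `ROWS` layout and the `group_and_sort` helper are hypothetical names, and the scores are copied from the tables above.

```python
from collections import defaultdict

# (capability, benchmark, Claude Sonnet 4.5 score, Claude 3.5 Sonnet New score)
# Values copied from the tables; the tuple layout is an assumption for this sketch.
ROWS = [
    ("Coding and Software Engineer", "LiveCodeBench", 71.0, 38.7),
    ("Coding and Software Engineer", "SWE-bench Verified", 82.0, 49.0),
    ("General Knowledge", "MMLU Pro", 88.0, 78.0),
    ("General Knowledge", "GPQA Diamond", 83.4, 65.0),
    ("Math and Reasoning", "FrontierMath - Tier 4", 2.1, 0.0),
    ("Math and Reasoning", "FrontierMath", 5.2, 2.1),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in rows:
        # Diff is the first model's score minus the second model's score.
        groups[capability].append((benchmark, round(score_a - score_b, 2)))
    return {
        capability: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for capability, items in groups.items()
    }


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute gap keeps the ordering meaningful even if the second model were ahead on some benchmark, which does not happen in this comparison.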

Specs

| Field | Claude Sonnet 4.5 | Claude 3.5 Sonnet New |
|---|---|---|
| Publisher | Anthropic | Anthropic |
| Release date | 2025-09-30 | 2024-10-22 |
| Model type | Chat model | Chat model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1000K | 200K |
| Max output | 64K | Not available |

Summary

  • Claude Sonnet 4.5 leads in: Coding and Software Engineer (2/2), General Knowledge (2/2), Math and Reasoning (2/2)

On average across the 6 shared benchmarks, Claude Sonnet 4.5 scores 16.48 higher.

Largest single-benchmark gap: SWE-bench Verified, where Claude Sonnet 4.5 scores 82 and Claude 3.5 Sonnet New scores 49 (+33).
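The summary figures can be checked from the six score pairs. The sketch below assumes the average is a plain arithmetic mean of the per-benchmark differences, which matches the reported 16.48; the `SCORES` mapping and the `summarize` helper are hypothetical names introduced for illustration.

```python
from statistics import mean

# benchmark -> (Claude Sonnet 4.5 score, Claude 3.5 Sonnet New score), copied from the page
SCORES = {
    "SWE-bench Verified": (82.0, 49.0),
    "LiveCodeBench": (71.0, 38.7),
    "GPQA Diamond": (83.4, 65.0),
    "MMLU Pro": (88.0, 78.0),
    "FrontierMath": (5.2, 2.1),
    "FrontierMath - Tier 4": (2.1, 0.0),
}


def summarize(scores):
    """Return win counts, the mean difference, and the largest single-benchmark gap."""
    diffs = {name: a - b for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": sum(d > 0 for d in diffs.values()),
        "wins_b": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Assumption: unweighted mean over all shared benchmarks.
        "avg_diff": round(mean(list(diffs.values())), 2),
        "largest_gap": (largest, round(diffs[largest], 2)),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

The differences sum to 98.90, and 98.90 / 6 rounds to 16.48, so an unweighted mean is consistent with the figure quoted above.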

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.