- Claude Sonnet 4.5 leads in: General Knowledge (6/6), Math and Reasoning (3/6), Coding and Software Engineer (3/3), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), Claw-style Agent Evaluation (2/2), Instruction Following (1/1), Long Context (1/1), Multimodal Understanding (1/1), Productivity Knowledge (1/1)
On average across the 25 shared benchmarks, Claude Sonnet 4.5 scores 5.83 points higher than Claude Sonnet 4.
Largest single-benchmark gap: τ²-Bench - Telecom — Claude Sonnet 4.5 98 vs Claude Sonnet 4 65 (+33).
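The average and largest-gap figures above are simple aggregates over the benchmarks both models report. A minimal sketch of that arithmetic, assuming scores are held in plain dictionaries keyed by benchmark name: the function name `compare` and the two `placeholder-*` entries are hypothetical and exist only to make the example run; only the τ²-Bench - Telecom values (98 vs 65) come from this page.

```python
from statistics import mean


def compare(scores_a, scores_b):
    """Summarize per-benchmark gaps (a minus b) over benchmarks both models report."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    if not shared:
        raise ValueError("the two models share no benchmarks")
    gaps = {name: scores_a[name] - scores_b[name] for name in shared}
    # The largest gap is chosen by absolute size, so a deficit can also win.
    largest = max(gaps, key=lambda name: abs(gaps[name]))
    return {
        "shared": len(shared),
        "mean_gap": round(mean(gaps.values()), 2),
        "largest": (largest, gaps[largest]),
    }


# Only the Telecom row is taken from the page; the placeholder rows are invented.
sonnet_45 = {"τ²-Bench - Telecom": 98, "placeholder-A": 70, "placeholder-B": 50, "only-in-4.5": 80}
sonnet_4 = {"τ²-Bench - Telecom": 65, "placeholder-A": 68, "placeholder-B": 55}

result = compare(sonnet_45, sonnet_4)
print(result)
```

Benchmarks reported by only one model (here `only-in-4.5`) are excluded before averaging, which is what "shared benchmarks" implies. Whether the page averages raw score differences exactly this way is an assumption.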
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.