- Claude Sonnet 4.6 leads in: General Knowledge (3/3), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Long Context (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)
- Claude Sonnet 4 leads in: Coding and Software Engineer (1/1)
Averaged across the 10 shared benchmarks, Claude Sonnet 4.6 scores 23.08 points higher than Claude Sonnet 4.
Largest single-benchmark gap: ARC-AGI-2, where Claude Sonnet 4.6 scores 58.30 vs. 1.30 for Claude Sonnet 4 (+57.00).
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.