- Claude Sonnet 4.6 leads in: Claw-style Agent Evaluation (1/1)
- Claude Opus 4.6 leads in: General Knowledge (3/3), AI Agent - Tool Usage (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), Coding and Software Engineer (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)
On average across the 11 shared benchmarks, Claude Opus 4.6 scores 144.98 points higher than Claude Sonnet 4.6.
Largest single-benchmark gap: GDPval-AA — Claude Sonnet 4.6 57 vs Claude Opus 4.6 1,606 (-1,549).
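The average gap and the largest single gap can be related with a little arithmetic. The sketch below assumes the 144.98 figure is an unweighted mean of per-benchmark differences (Opus minus Sonnet) and that GDPval-AA is one of the 11 shared benchmarks; neither is stated explicitly on this page. The function name `gap_breakdown` is illustrative only.

```python
# Figures quoted in the summary above.
N_BENCHMARKS = 11      # shared benchmarks
MEAN_GAP = 144.98      # average of (Opus - Sonnet), as reported
SONNET_GDPVAL = 57     # GDPval-AA, Claude Sonnet 4.6
OPUS_GDPVAL = 1606     # GDPval-AA, Claude Opus 4.6


def gap_breakdown(n, mean_gap, outlier_gap):
    """Split a mean gap into one benchmark's share and the mean of the rest.

    Assumption: mean_gap is a simple unweighted mean over n benchmarks,
    and outlier_gap is the difference on one of those n benchmarks.
    """
    total = n * mean_gap                        # sum of all per-benchmark gaps
    outlier_share = outlier_gap / n             # what the outlier adds to the mean
    rest_mean = (total - outlier_gap) / (n - 1) # mean gap without the outlier
    return outlier_share, rest_mean


gdpval_gap = OPUS_GDPVAL - SONNET_GDPVAL
outlier_share, rest_mean = gap_breakdown(N_BENCHMARKS, MEAN_GAP, gdpval_gap)

print(f"GDPval-AA gap: {gdpval_gap}")
print(f"GDPval-AA contribution to the mean: {outlier_share:.2f}")
print(f"Mean gap over the other {N_BENCHMARKS - 1} benchmarks: {rest_mean:.2f}")
```

Under those assumptions, GDPval-AA alone accounts for roughly 140.82 of the 144.98-point average, and the mean gap over the remaining ten benchmarks is about 4.58 points. The headline average is therefore driven mostly by this one benchmark.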
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.