Claude Sonnet 4.6 vs Claude Sonnet 3.7
Across 7 shared benchmarks, Claude Sonnet 4.6 leads overall with 7 wins to 0 for Claude Sonnet 3.7, no ties, and an average score difference of +29.19.
Claude Sonnet 4.6
Anthropic · 2026-02-17 · AI model
Claude Sonnet 3.7
Anthropic · 2025-02-25 · AI model
Claude Sonnet 4.6: 7 wins (100%) · Claude Sonnet 3.7: 0 wins (0%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.
General Knowledge
Claude Sonnet 4.6 leads 2/2
| Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff |
|---|---|---|---|
| HLE | 49 (rank 20 / 149, thinking + tool use) | 10.30 (rank 123 / 149, thinking) | +38.70 |
| GPQA Diamond | 89.90 (rank 18 / 175, thinking) | 68 (rank 119 / 175) | +21.90 |
Agent Level Benchmark
Claude Sonnet 4.6 leads 1/1
Specs
| Field | Claude Sonnet 4.6 | Claude Sonnet 3.7 |
|---|---|---|
| Publisher | Anthropic | Anthropic |
| Release date | 2026-02-17 | 2025-02-25 |
| Model type | AI model | AI model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1M tokens | 128K tokens |
| Max output | 8192 tokens | Not available |
API pricing
Prices use DataLearner records when available; missing fields are not inferred.
| Item | Claude Sonnet 4.6 | Claude Sonnet 3.7 |
|---|---|---|
| Text input | $3 / 1M tokens | Not public |
| Text output | $15 / 1M tokens | Not public |
| Cache read | $0.3 / 1M tokens | Not public |
| Cache write | $3.75 / 1M tokens | Not public |
One or both models have incomplete public pricing.
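To make the per-token prices concrete, here is a minimal sketch of a cost estimate for Claude Sonnet 4.6 using only the figures in the table above. The function name `estimate_cost` and the token counts in `example` are hypothetical, chosen for illustration. Actual billing may differ, for instance through tiering or discounts that this page does not record.

```python
from typing import Mapping

# USD per 1M tokens for Claude Sonnet 4.6, copied from the pricing table above.
PRICES_PER_MILLION = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


def estimate_cost(tokens: Mapping[str, int]) -> float:
    """Estimate USD cost from token counts keyed like PRICES_PER_MILLION."""
    return sum(
        PRICES_PER_MILLION[kind] * count / 1_000_000
        for kind, count in tokens.items()
    )


# Hypothetical workload: these token counts are assumptions, not measured usage.
example = {"input": 200_000, "output": 50_000, "cache_read": 1_000_000}
print(f"${estimate_cost(example):.2f}")  # prints $1.65
```

No equivalent estimate is possible for Claude Sonnet 3.7 from this page, since its prices are listed as not public.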
Summary
- Claude Sonnet 4.6 leads in: General Knowledge (2/2), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Long Context (1/1), Productivity Knowledge (1/1)
On average across the 7 shared benchmarks, Claude Sonnet 4.6 scores 29.19 higher.
Largest single-benchmark gap: OSWorld-Verified — Claude Sonnet 4.6 72.50 vs Claude Sonnet 3.7 28 (+44.50).
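The tally above is simple arithmetic over score pairs. The sketch below reproduces it for the three pairs whose scores actually appear on this page (HLE, GPQA Diamond, OSWorld-Verified). The other four benchmarks are not listed here, so the average it computes covers three benchmarks and is not expected to match the 7-benchmark figure of 29.19. The function name `compare` is an assumption for illustration, not the page's own method.

```python
from statistics import mean

# (benchmark, Claude Sonnet 4.6 score, Claude Sonnet 3.7 score).
# Only the three pairs shown on this page; the remaining four are not listed.
SCORES = [
    ("HLE", 49.0, 10.30),
    ("GPQA Diamond", 89.90, 68.0),
    ("OSWorld-Verified", 72.50, 28.0),
]


def compare(scores):
    """Return per-benchmark diffs, win/tie counts, average diff, and largest gap."""
    diffs = {name: round(a - b, 2) for name, a, b in scores}
    wins_a = sum(d > 0 for d in diffs.values())
    wins_b = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())
    average = round(mean(list(diffs.values())), 2)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return diffs, wins_a, wins_b, ties, average, largest


diffs, wins_a, wins_b, ties, average, largest = compare(SCORES)
print(diffs)                  # per-benchmark differences, matching the Diff column
print(wins_a, wins_b, ties)   # wins for 4.6, wins for 3.7, ties
print(average, largest)       # average over these three pairs only, and largest gap
```

Rounding each difference to two decimals mirrors the precision of the Diff column and avoids floating-point noise such as 21.900000000000006.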
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.