Claude Sonnet 4.6 vs Claude Sonnet 3.7

Across 7 shared benchmarks, Claude Sonnet 4.6 leads overall with 7 wins to Claude Sonnet 3.7's 0 and no ties. Its average score advantage is 26.76 points.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

Claude Sonnet 3.7

Anthropic · 2025-02-25 · Chat model

Claude Sonnet 4.6: 7 wins (100%) | Claude Sonnet 3.7: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.

General Knowledge

Claude Sonnet 4.6 2/2

Benchmark	Claude Sonnet 4.6	Claude Sonnet 3.7	Diff
HLE	49 (rank 25 / 157)	10.30 (rank 131 / 157)	+38.70
GPQA Diamond	89.90 (rank 21 / 178)	77 (rank 88 / 178)	+12.90

Agent Level Benchmark

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 3.7	Diff
τ²-Bench - Telecom	97.90 (rank 9 / 35)	55 (rank 31 / 35)	+42.90

AI Agent - Tool Usage

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 3.7	Diff
OSWorld-Verified	72.50 (rank 10 / 18)	28 (rank 18 / 18)	+44.50

Coding and Software Engineer

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 3.7	Diff
SWE-bench Verified	79.60 (rank 17 / 108)	70.30 (rank 55 / 108)	+9.30

Long Context

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 3.7	Diff
AA-LCR	71 (rank 1 / 13)	61 (rank 13 / 13)	+10

Productivity Knowledge

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 3.7	Diff
GDPval-AA	57 (rank 11 / 21)	28 (rank 20 / 21)	+29

Specs

Field	Claude Sonnet 4.6	Claude Sonnet 3.7
Publisher	Anthropic	Anthropic
Release date	2026-02-17	2025-02-25
Model type	Chat model	Chat model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1M	128K
Max output	8K	Not available

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Claude Sonnet 4.6	Claude Sonnet 3.7
Text input	$3 / 1M tokens	Not public
Text output	$15 / 1M tokens	Not public
Cache read	$0.3 / 1M tokens	Not public
Cache write	$3.75 / 1M tokens	Not public

One or both models have incomplete public pricing.
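As a rough illustration of how the listed per-token prices translate into a bill, the sketch below estimates the cost of a Claude Sonnet 4.6 request from token counts. The helper name `estimate_cost` and the example token counts are hypothetical, and the calculation assumes purely linear billing at the rates in the table above. Actual billing may include tiers or discounts not recorded here.

```python
# Per-1M-token prices for Claude Sonnet 4.6 in USD, copied from the pricing table above.
PRICES_PER_MILLION = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


def estimate_cost(input_tokens=0, output_tokens=0,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Hypothetical helper: linear cost estimate in USD.

    Assumes every token is billed at the flat listed rate; it ignores
    any pricing tiers, batch discounts, or other adjustments.
    """
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    total = sum(usage[kind] * PRICES_PER_MILLION[kind] for kind in usage)
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative token counts only, not measured usage.
    cost = estimate_cost(input_tokens=200_000, output_tokens=50_000)
    print(f"Estimated cost: ${cost:.2f}")
```

With these illustrative counts the estimate comes to about $1.35. No equivalent estimate is possible for Claude Sonnet 3.7 from this page, since its prices are listed as not public.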

Summary

Claude Sonnet 4.6 leads in: General Knowledge (2/2), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Long Context (1/1), Productivity Knowledge (1/1)

On average across the 7 shared benchmarks, Claude Sonnet 4.6 scores 26.76 higher.

Largest single-benchmark gap: OSWorld-Verified — Claude Sonnet 4.6 72.50 vs Claude Sonnet 3.7 28 (+44.50).
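The summary figures can be checked against the benchmark tables with a few lines of arithmetic. The sketch below assumes the average is an unweighted mean of the per-benchmark differences, which is consistent with the numbers on this page. The function name `summarize` is illustrative, not part of any published tool.

```python
# Scores copied from the benchmark tables on this page:
# benchmark -> (Claude Sonnet 4.6, Claude Sonnet 3.7)
SCORES = {
    "HLE": (49.0, 10.30),
    "GPQA Diamond": (89.90, 77.0),
    "τ²-Bench - Telecom": (97.90, 55.0),
    "OSWorld-Verified": (72.50, 28.0),
    "SWE-bench Verified": (79.60, 70.30),
    "AA-LCR": (71.0, 61.0),
    "GDPval-AA": (57.0, 28.0),
}


def summarize(scores):
    """Count wins and ties, and compute the mean and largest score gap."""
    diffs = {name: a - b for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": sum(d > 0 for d in diffs.values()),
        "wins_b": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Unweighted mean of the differences (an assumption about the page's method).
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, round(diffs[largest], 2)),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Under that assumption the result reproduces the page's figures: 7 wins to 0, a mean difference of 26.76, and OSWorld-Verified as the largest gap at 44.5.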

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
