Claude Sonnet 4.6 vs Claude Sonnet 4

Across 10 shared benchmarks, Claude Sonnet 4.6 leads overall: it wins 9, Claude Sonnet 4 wins 1, and there are no ties. The average score difference is +20.63 in favor of Claude Sonnet 4.6.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

Claude Sonnet 4

Anthropic · 2025-05-23 · Reasoning model

Claude Sonnet 4.6: 9 wins (90%) · Claude Sonnet 4: 1 win (10%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.

General Knowledge

Claude Sonnet 4.6 3/3

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
ARC-AGI-2	58.30 (rank 18 / 59)	5.90 (rank 43 / 59)	+52.40
HLE	49 (rank 25 / 157)	9.60 (rank 134 / 157)	+39.40
GPQA Diamond	89.90 (rank 21 / 178)	83.80 (rank 57 / 178)	+6.10

Agent Level Benchmark

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
τ²-Bench - Telecom	97.90 (rank 9 / 35)	65 (rank 29 / 35)	+32.90

AI Agent - Tool Usage

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
OSWorld-Verified	72.50 (rank 10 / 18)	42.20 (rank 16 / 18)	+30.30

Claw-style Agent Evaluation

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
Pinch Bench	88 (rank 5 / 37; Thinking (With Tools))	80.50 (rank 22 / 37; Thinking (With Tools))	+7.50

Coding and Software Engineer

Claude Sonnet 4 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
SWE-bench Verified	79.60 (rank 17 / 108)	80.20 (rank 13 / 108)	-0.60

Long Context

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
AA-LCR	71 (rank 1 / 13)	65 (rank 10 / 13)	+6

Math and Reasoning

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
FrontierMath - Tier 4	8.30 (rank 34 / 80; Thinking (No Tools, 16K Budget))	0 (rank 72 / 80; Normal (No Tools))	+8.30

Productivity Knowledge

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Sonnet 4	Diff
GDPval-AA	57 (rank 11 / 21)	33 (rank 19 / 21)	+24

Specs

Field	Claude Sonnet 4.6	Claude Sonnet 4
Publisher	Anthropic	Anthropic
Release date	2026-02-17	2025-05-23
Model type	Chat model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1M	200K
Max output	8K	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Claude Sonnet 4.6	Claude Sonnet 4
Text input	$3 / 1M tokens	Not public
Text output	$15 / 1M tokens	Not public
Cache read	$0.3 / 1M tokens	Not public
Cache write	$3.75 / 1M tokens	Not public

One or both models have incomplete public pricing.
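
As a rough illustration of how the per-million-token rates above turn into a request cost, the sketch below multiplies token counts by the listed Claude Sonnet 4.6 prices. The names `PRICES_PER_MILLION` and `estimate_cost` and the token counts are hypothetical, and the sketch assumes each token is billed under exactly one of the four listed items. Claude Sonnet 4 is omitted because its prices are listed as not public.

```python
# Claude Sonnet 4.6 prices in USD per 1M tokens, copied from the pricing table.
PRICES_PER_MILLION = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


def estimate_cost(tokens):
    """Return the estimated USD cost for a dict of {item: token_count}.

    Assumption: every token is billed under exactly one item in the table.
    """
    return sum(PRICES_PER_MILLION[item] * count / 1_000_000
               for item, count in tokens.items())


# Hypothetical workload: 1M input tokens and 100K output tokens.
cost = estimate_cost({"input": 1_000_000, "output": 100_000})
print(f"${cost:.2f}")  # $4.50
```

Actual billing may differ from this estimate; treat the result as a back-of-the-envelope figure only.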

Summary

Claude Sonnet 4.6 leads in: General Knowledge (3/3), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Long Context (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)
Claude Sonnet 4 leads in: Coding and Software Engineer (1/1)

On average across the 10 shared benchmarks, Claude Sonnet 4.6 scores 20.63 higher.

Largest single-benchmark gap: ARC-AGI-2 — Claude Sonnet 4.6 58.30 vs Claude Sonnet 4 5.90 (+52.40).
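
The summary figures can be reproduced from the per-benchmark scores. The sketch below is a minimal check, not the page's own generation logic: the `scores` dictionary and the `summarize` helper are hypothetical names, and the values are copied from the benchmark tables on this page.

```python
# (Claude Sonnet 4.6, Claude Sonnet 4) scores copied from the benchmark tables.
scores = {
    "ARC-AGI-2": (58.30, 5.90),
    "HLE": (49.0, 9.60),
    "GPQA Diamond": (89.90, 83.80),
    "tau2-Bench - Telecom": (97.90, 65.0),
    "OSWorld-Verified": (72.50, 42.20),
    "Pinch Bench": (88.0, 80.50),
    "SWE-bench Verified": (79.60, 80.20),
    "AA-LCR": (71.0, 65.0),
    "FrontierMath - Tier 4": (8.30, 0.0),
    "GDPval-AA": (57.0, 33.0),
}


def summarize(scores):
    """Return (wins_a, wins_b, ties, mean_diff, largest_gap_benchmark)."""
    # Positive diff means the first model (Claude Sonnet 4.6) scored higher.
    diffs = {name: a - b for name, (a, b) in scores.items()}
    wins_a = sum(d > 0 for d in diffs.values())
    wins_b = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())
    mean_diff = round(sum(diffs.values()) / len(diffs), 2)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return wins_a, wins_b, ties, mean_diff, largest


print(summarize(scores))  # (9, 1, 0, 20.63, 'ARC-AGI-2')
```

Note that the mean is an unweighted average over benchmarks with different scales and test modes, so it is best read as a coarse indicator rather than a calibrated measure.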

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
