Claude Sonnet 4.6 vs Claude Sonnet 4

Across 10 shared benchmarks, Claude Sonnet 4.6 leads overall: it wins 9 to Claude Sonnet 4's 1, with 0 ties and an average score difference of +20.63 in its favor.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

Claude Sonnet 4

Anthropic · 2025-05-23 · Reasoning model

Claude Sonnet 4.6: 9 wins (90%) · Claude Sonnet 4: 1 win (10%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.

General Knowledge

Claude Sonnet 4.6 3/3
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
ARC-AGI-2 | 58.30 (18 / 59) | 5.90 (43 / 59) | +52.40
HLE | 49 (25 / 157) | 9.60 (134 / 157) | +39.40
GPQA Diamond | 89.90 (21 / 178) | 83.80 (57 / 178) | +6.10

Agent Level Benchmark

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
τ²-Bench - Telecom | 97.90 (9 / 35) | 65 (29 / 35) | +32.90

AI Agent - Tool Usage

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
OSWorld-Verified | 72.50 (10 / 18) | 42.20 (16 / 18) | +30.30

Claw-style Agent Evaluation

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
Pinch Bench | 88 (5 / 37), Thinking (With Tools) | 80.50 (22 / 37), Thinking (With Tools) | +7.50

Coding and Software Engineer

Claude Sonnet 4 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
SWE-bench Verified | 79.60 (17 / 108) | 80.20 (13 / 108) | -0.60

Long Context

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
AA-LCR | 71 (1 / 13) | 65 (10 / 13) | +6

Math and Reasoning

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
FrontierMath - Tier 4 | 8.30 (34 / 80), Thinking (No Tools, 16K Budget) | 0 (72 / 80), Normal (No Tools) | +8.30

Productivity Knowledge

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 4 | Diff
GDPval-AA | 57 (11 / 21) | 33 (19 / 21) | +24

Specs

Field | Claude Sonnet 4.6 | Claude Sonnet 4
Publisher | Anthropic | Anthropic
Release date | 2026-02-17 | 2025-05-23
Model type | Chat model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1M | 200K
Max output | 8K | 64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Claude Sonnet 4.6 | Claude Sonnet 4
Text input | $3 / 1M tokens | Not public
Text output | $15 / 1M tokens | Not public
Cache read | $0.3 / 1M tokens | Not public
Cache write | $3.75 / 1M tokens | Not public

One or both models have incomplete public pricing.
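
As a rough illustration of how the per-million-token prices listed for Claude Sonnet 4.6 translate into a request cost, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` and the token counts in the example are hypothetical, and the calculation assumes simple linear billing with no tiers or discounts, which may not match an actual invoice. No estimate is possible for Claude Sonnet 4 from this page, since its prices are listed as not public.

```python
# USD per 1M tokens for Claude Sonnet 4.6, copied from the pricing table on this page.
PRICE_PER_MILLION = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


def estimate_cost(input_tokens=0, output_tokens=0,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Return an estimated cost in USD, assuming purely linear per-token billing.

    This helper is illustrative only; it is not part of any official SDK.
    """
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    return sum(PRICE_PER_MILLION[kind] * count / 1_000_000
               for kind, count in usage.items())


if __name__ == "__main__":
    # Hypothetical workload: 200K input tokens and 50K output tokens.
    print(f"${estimate_cost(input_tokens=200_000, output_tokens=50_000):.2f}")
```

With the hypothetical workload above, the input side contributes 0.2 × $3 and the output side 0.05 × $15, so output tokens account for more of the total despite being a quarter of the volume.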

Summary

  • Claude Sonnet 4.6 leads in: General Knowledge (3/3), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Long Context (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)
  • Claude Sonnet 4 leads in: Coding and Software Engineer (1/1)

On average across the 10 shared benchmarks, Claude Sonnet 4.6 scores 20.63 higher.

Largest single-benchmark gap: ARC-AGI-2 — Claude Sonnet 4.6 58.30 vs Claude Sonnet 4 5.90 (+52.40).
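
The summary figures can be cross-checked from the per-benchmark scores. The sketch below is one way to do that, assuming the average is a plain unweighted mean of the score differences; the page does not state how it aggregates. The names `SCORES` and `summarize` are hypothetical, and the scores are copied from the benchmark tables on this page.

```python
# (benchmark, Claude Sonnet 4.6 score, Claude Sonnet 4 score),
# copied from the benchmark tables on this page.
SCORES = [
    ("ARC-AGI-2", 58.30, 5.90),
    ("HLE", 49.0, 9.60),
    ("GPQA Diamond", 89.90, 83.80),
    ("τ²-Bench - Telecom", 97.90, 65.0),
    ("OSWorld-Verified", 72.50, 42.20),
    ("Pinch Bench", 88.0, 80.50),
    ("SWE-bench Verified", 79.60, 80.20),
    ("AA-LCR", 71.0, 65.0),
    ("FrontierMath - Tier 4", 8.30, 0.0),
    ("GDPval-AA", 57.0, 33.0),
]


def summarize(scores):
    """Compute win counts, ties, mean difference, and the largest gap.

    Assumes an unweighted mean over benchmarks (an assumption, not
    something the page documents).
    """
    # Round each difference to 2 decimals to avoid float noise such as 6.100000000000009.
    diffs = [(name, round(a - b, 2)) for name, a, b in scores]
    return {
        "wins_a": sum(1 for _, d in diffs if d > 0),
        "wins_b": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "mean_diff": round(sum(d for _, d in diffs) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Under that assumption, the ten differences sum to 206.30, which gives the +20.63 average reported above, along with the 9 to 1 win split and ARC-AGI-2 as the largest gap.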

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.