Claude Sonnet 4.6 vs Claude Sonnet 3.7

Across 7 shared benchmarks, Claude Sonnet 4.6 leads overall: it wins 7 and Claude Sonnet 3.7 wins 0, with 0 ties and an average score difference of +26.76 in favor of Claude Sonnet 4.6.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

Claude Sonnet 3.7

Anthropic · 2025-02-25 · Chat model

Claude Sonnet 4.6: 7 wins (100%) · Claude Sonnet 3.7: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.

General Knowledge

Claude Sonnet 4.6 2/2
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff
HLE | 49 (25 / 157) | 10.30 (131 / 157) | +38.70
GPQA Diamond | 89.90 (21 / 178) | 77 (88 / 178) | +12.90

Agent Level Benchmark

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff
τ²-Bench - Telecom | 97.90 (9 / 35) | 55 (31 / 35) | +42.90

AI Agent - Tool Usage

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff
OSWorld-Verified | 72.50 (10 / 18) | 28 (18 / 18) | +44.50

Coding and Software Engineer

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff
SWE-bench Verified | 79.60 (17 / 108) | 70.30 (55 / 108) | +9.30

Long Context

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff
AA-LCR | 71 (1 / 13) | 61 (13 / 13) | +10

Productivity Knowledge

Claude Sonnet 4.6 1/1
Benchmark | Claude Sonnet 4.6 | Claude Sonnet 3.7 | Diff
GDPval-AA | 57 (11 / 21) | 28 (20 / 21) | +29

Specs

Field | Claude Sonnet 4.6 | Claude Sonnet 3.7
Publisher | Anthropic | Anthropic
Release date | 2026-02-17 | 2025-02-25
Model type | Chat model | Chat model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1M | 128K
Max output | 8K | Not available

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Claude Sonnet 4.6 | Claude Sonnet 3.7
Text input | $3 / 1M tokens | Not public
Text output | $15 / 1M tokens | Not public
Cache read | $0.3 / 1M tokens | Not public
Cache write | $3.75 / 1M tokens | Not public

One or both models have incomplete public pricing.
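
To make the per-million-token prices concrete, here is a minimal Python sketch that estimates the cost of a single request from the Claude Sonnet 4.6 rates listed above. It assumes each token category is billed linearly and independently, which is a simplification. The function name `estimate_cost`, the `PRICES_PER_1M` table, and the token counts in the usage note are hypothetical and chosen only for illustration. Prices recorded as "Not public" are stored as `None` and are not inferred.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table on this page.
# None marks a field the page lists as "Not public"; it is not inferred.
PRICES_PER_1M = {
    "Claude Sonnet 4.6": {
        "input": Decimal("3"),
        "output": Decimal("15"),
        "cache_read": Decimal("0.3"),
        "cache_write": Decimal("3.75"),
    },
    "Claude Sonnet 3.7": {
        "input": None,
        "output": None,
        "cache_read": None,
        "cache_write": None,
    },
}

ONE_MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Return an estimated request cost in USD (assumes linear billing)."""
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    prices = PRICES_PER_1M[model]
    total = Decimal(0)
    for item, tokens in usage.items():
        if tokens == 0:
            continue  # unused categories do not need a known price
        if prices[item] is None:
            raise ValueError(f"No public {item} price recorded for {model}")
        total += prices[item] * tokens / ONE_MILLION
    return total
```

For example, `estimate_cost("Claude Sonnet 4.6", 200_000, 10_000)` gives 0.75 USD (0.60 for input plus 0.15 for output), while the same call for Claude Sonnet 3.7 raises `ValueError` because the page records no public price for it.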

Summary

  • Claude Sonnet 4.6 leads in: General Knowledge (2/2), Agent Level Benchmark (1/1), AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1), Long Context (1/1), Productivity Knowledge (1/1)

On average across the 7 shared benchmarks, Claude Sonnet 4.6 scores 26.76 higher.

Largest single-benchmark gap: OSWorld-Verified, where Claude Sonnet 4.6 scores 72.50 vs Claude Sonnet 3.7 at 28 (+44.50).
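
The summary figures above can be reproduced from the per-benchmark scores. The Python sketch below is an illustration, not the page's actual generator: the `SCORES` table and the `summarize` function are hypothetical names, and it assumes the average is a plain unweighted mean of the seven per-benchmark differences.

```python
# Scores copied from the benchmark tables on this page:
# (Claude Sonnet 4.6, Claude Sonnet 3.7).
SCORES = {
    "HLE": (49.0, 10.30),
    "GPQA Diamond": (89.90, 77.0),
    "τ²-Bench - Telecom": (97.90, 55.0),
    "OSWorld-Verified": (72.50, 28.0),
    "SWE-bench Verified": (79.60, 70.30),
    "AA-LCR": (71.0, 61.0),
    "GDPval-AA": (57.0, 28.0),
}


def summarize(scores):
    """Count wins and ties, and compute the mean and largest score gap."""
    # Round each difference to 2 decimals to drop floating-point noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_first": sum(d > 0 for d in diffs.values()),
        "wins_second": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Assumption: unweighted mean over all shared benchmarks.
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


summary = summarize(SCORES)
```

Under these assumptions, `summary` reports 7 wins for Claude Sonnet 4.6, 0 wins for Claude Sonnet 3.7, 0 ties, a mean difference of 26.76, and OSWorld-Verified as the largest gap at 44.5, matching the figures stated on the page.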

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.