Claude Sonnet 4.6 vs Claude Opus 4.6

Across 13 shared benchmarks, Claude Opus 4.6 leads overall: Claude Sonnet 4.6 wins 1, Claude Opus 4.6 wins 12, with 0 ties. The average score difference (Sonnet minus Opus) is -123.30.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Claude Sonnet 4.6: 1 win (8%) · Claude Opus 4.6: 12 wins (92%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 13 shared benchmarks.

General Knowledge

Claude Opus 4.6 4/4

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
ARC-AGI-2	58.30 · rank 21 / 62	66.30 · rank 17 / 62 · Extended (no tools)	-8
HLE	49 · rank 32 / 172	53 · rank 18 / 172 · Extended (with tools, internet)	-4
GPQA Diamond	89.90 · rank 24 / 187	91.31 · rank 15 / 187 · Extended (no tools)	-1.41
LiveBench	75.47 · rank 12 / 115 · Thinking Medium (No Tools)	76.33 · rank 8 / 115 · Thinking High (No Tools)	-0.86

AI Agent - Tool Usage

Claude Opus 4.6 3/3

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
MCP-Atlas	69.50 · rank 17 / 27 · Normal (With Tools)	76.80 · rank 10 / 27 · Deep Thinking (With Tools)	-7.30
Terminal Bench 2.0	59.10 · rank 22 / 47	65.40 · rank 11 / 47 · Extended (with tools)	-6.30
OSWorld-Verified	72.50 · rank 16 / 24	72.70 · rank 15 / 24 · Extended (with tools)	-0.20

Agent Level Benchmark

Claude Opus 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
τ²-Bench - Telecom	97.90 · rank 9 / 35	99.25 · rank 2 / 35 · Extended (with tools)	-1.35

AI Agent - Information Search

Claude Opus 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
BrowseComp	74.70 · rank 27 / 53	84 · rank 11 / 53 · Thinking (With Tools + Internet)	-9.30

Claw-style Agent Evaluation

Claude Sonnet 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
Pinch Bench	88 · rank 5 / 37 · Thinking (With Tools)	87.40 · rank 7 / 37 · Thinking (With Tools)	+0.60

Coding and Software Engineer

Claude Opus 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
SWE-bench Verified	79.60 · rank 18 / 112	80.84 · rank 10 / 112 · Extended (with tools)	-1.24

Math and Reasoning

Claude Opus 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
FrontierMath - Tier 4	8.30 · rank 34 / 80 · Thinking (No Tools, 16K Budget)	22.90 · rank 12 / 80 · Max (no tools)	-14.60

Productivity Knowledge

Claude Opus 4.6 1/1

Benchmark	Claude Sonnet 4.6	Claude Opus 4.6	Diff
GDPval-AA	57 · rank 11 / 21	1,606 · rank 3 / 21 · Extended (with tools, internet)	-1,549

Specs

Field	Claude Sonnet 4.6	Claude Opus 4.6
Publisher	Anthropic	Anthropic
Release date	2026-02-17	2026-02-05
Model type	Chat model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1M	1M
Max output	8K	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Claude Sonnet 4.6	Claude Opus 4.6
Text input	$3 / 1M tokens	$5 / 1M tokens
Text output	$15 / 1M tokens	$25 / 1M tokens
Cache read	$0.3 / 1M tokens	$0.5 / 1M tokens
Cache write	$3.75 / 1M tokens	$10 / 1M tokens
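The per-million-token rates above translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: `request_cost` is a hypothetical helper, the token counts are made up, and cache reads and writes are ignored. It uses the Claude Sonnet 4.6 text rates from the table.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    Hypothetical helper: ignores cache read/write charges.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Claude Sonnet 4.6 text rates from the pricing table (USD per 1M tokens).
SONNET_INPUT_PRICE = 3.00
SONNET_OUTPUT_PRICE = 15.00

# Assumed workload: 10,000 input tokens and 2,000 output tokens.
cost = request_cost(10_000, 2_000, SONNET_INPUT_PRICE, SONNET_OUTPUT_PRICE)
print(f"${cost:.4f}")  # 10,000 * $3/1M + 2,000 * $15/1M = $0.0600
```

Passing the Claude Opus 4.6 rates to the same function gives the matching estimate for that model.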

Summary

Claude Sonnet 4.6 leads in: Claw-style Agent Evaluation (1/1)
Claude Opus 4.6 leads in: General Knowledge (4/4), AI Agent - Tool Usage (3/3), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), Coding and Software Engineer (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)

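On average across the 13 shared benchmarks, Claude Opus 4.6 scores 123.30 higher. Most of that average comes from GDPval-AA (-1,549); excluding it, the average gap across the other 12 benchmarks is about 4.50 points.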

Largest single-benchmark gap: GDPval-AA, where Claude Sonnet 4.6 scores 57 and Claude Opus 4.6 scores 1,606 (-1,549).
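The summary figures can be re-derived from the benchmark tables. The sketch below assumes the Diff column is the Sonnet score minus the Opus score, which matches every row on this page. `summarize` is a hypothetical helper, not part of any tool used to build the page. It also recomputes the mean without GDPval-AA to show how much that one row moves the average.

```python
from statistics import mean

# (benchmark, Claude Sonnet 4.6 score, Claude Opus 4.6 score), copied from the tables above.
SCORES = [
    ("ARC-AGI-2", 58.30, 66.30),
    ("HLE", 49.0, 53.0),
    ("GPQA Diamond", 89.90, 91.31),
    ("LiveBench", 75.47, 76.33),
    ("MCP-Atlas", 69.50, 76.80),
    ("Terminal Bench 2.0", 59.10, 65.40),
    ("OSWorld-Verified", 72.50, 72.70),
    ("τ²-Bench - Telecom", 97.90, 99.25),
    ("BrowseComp", 74.70, 84.0),
    ("Pinch Bench", 88.0, 87.40),
    ("SWE-bench Verified", 79.60, 80.84),
    ("FrontierMath - Tier 4", 8.30, 22.90),
    ("GDPval-AA", 57.0, 1606.0),
]


def summarize(scores):
    """Return win counts, the mean difference, and the largest-gap benchmark."""
    # Assumed convention: diff = Sonnet score minus Opus score.
    diffs = {name: sonnet - opus for name, sonnet, opus in scores}
    return {
        "sonnet_wins": sum(d > 0 for d in diffs.values()),
        "opus_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(mean(diffs.values()), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(SCORES)
without_gdpval = summarize([row for row in SCORES if row[0] != "GDPval-AA"])

print(summary)
print("Mean diff without GDPval-AA:", without_gdpval["mean_diff"])
```

The full set reproduces the 1 / 12 / 0 win split and the -123.30 average; dropping GDPval-AA leaves a mean difference of -4.5.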

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
