Claude Sonnet 4.6 vs Claude Opus 4.6

Across 11 shared benchmarks, Claude Opus 4.6 leads overall: Claude Sonnet 4.6 wins 1, Claude Opus 4.6 wins 10, and there are no ties. The average score difference (Claude Sonnet 4.6 minus Claude Opus 4.6) is -144.98.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat model

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Wins: Claude Sonnet 4.6, 1 (9%); Claude Opus 4.6, 10 (91%)

Benchmark scores

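Grouped by capability and sorted by largest gap within each group; 11 shared benchmarks. Each score is followed by the model's rank on that benchmark (position / number of ranked models) and, where recorded, the evaluation setting. Diff is the Claude Sonnet 4.6 score minus the Claude Opus 4.6 score.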
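The grouping and ordering rule can be illustrated with a short Python sketch. It is not the page generator's actual code: the `Row` type and `group_and_sort` function are hypothetical, "largest gap" is assumed to mean the largest absolute difference, and the sample rows are copied from the General Knowledge and AI Agent - Tool Usage tables below.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical record: one shared benchmark with both models' scores."""
    category: str
    benchmark: str
    sonnet: float
    opus: float

    @property
    def diff(self) -> float:
        # Page convention: Claude Sonnet 4.6 score minus Claude Opus 4.6 score.
        return round(self.sonnet - self.opus, 2)


def group_and_sort(rows: list[Row]) -> dict[str, tuple[list[str], str]]:
    """Group rows by capability, order each group by absolute gap (largest first),
    and label the group with the model that wins more of its benchmarks."""
    groups: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        groups[row.category].append(row)

    summary: dict[str, tuple[list[str], str]] = {}
    for category, items in groups.items():
        ordered = sorted(items, key=lambda r: abs(r.diff), reverse=True)
        total = len(ordered)
        sonnet_wins = sum(1 for r in ordered if r.diff > 0)
        opus_wins = sum(1 for r in ordered if r.diff < 0)
        if sonnet_wins > opus_wins:
            leader = f"Claude Sonnet 4.6 {sonnet_wins}/{total}"
        elif opus_wins > sonnet_wins:
            leader = f"Claude Opus 4.6 {opus_wins}/{total}"
        else:
            leader = f"Tie ({total} benchmarks)"
        summary[category] = ([r.benchmark for r in ordered], leader)
    return summary


# Sample rows copied from the tables below, deliberately out of order.
rows = [
    Row("General Knowledge", "GPQA Diamond", 89.90, 91.31),
    Row("General Knowledge", "ARC-AGI-2", 58.30, 66.30),
    Row("General Knowledge", "HLE", 49, 53),
    Row("AI Agent - Tool Usage", "OSWorld-Verified", 72.50, 72.70),
    Row("AI Agent - Tool Usage", "Terminal Bench 2.0", 59.10, 65.40),
]

for category, (order, leader) in group_and_sort(rows).items():
    print(f"{category}: {leader}; {', '.join(order)}")
```

With these rows the sketch reproduces the page's ordering and leader labels: ARC-AGI-2, HLE, GPQA Diamond for General Knowledge (Claude Opus 4.6 3/3), and Terminal Bench 2.0, OSWorld-Verified for AI Agent - Tool Usage (Claude Opus 4.6 2/2).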

General Knowledge

Leader: Claude Opus 4.6 (3/3)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
ARC-AGI-2 | 58.30 (18/59) | 66.30 (15/59), Extended (no tools) | -8.00
HLE | 49 (25/157) | 53 (11/157), Extended (with tools, internet) | -4.00
GPQA Diamond | 89.90 (21/178) | 91.31 (14/178), Extended (no tools) | -1.41

AI Agent - Tool Usage

Leader: Claude Opus 4.6 (2/2)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
Terminal Bench 2.0 | 59.10 (22/46) | 65.40 (11/46), Extended (with tools) | -6.30
OSWorld-Verified | 72.50 (10/18) | 72.70 (9/18), Extended (with tools) | -0.20

Agent Level Benchmark

Leader: Claude Opus 4.6 (1/1)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
τ²-Bench - Telecom | 97.90 (9/35) | 99.25 (2/35), Extended (with tools) | -1.35

AI Agent - Information Search

Leader: Claude Opus 4.6 (1/1)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
BrowseComp | 74.70 (20/45) | 84 (7/45), Thinking (With Tools + Internet) | -9.30

Claw-style Agent Evaluation

Leader: Claude Sonnet 4.6 (1/1)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
Pinch Bench | 88 (5/37), Thinking (With Tools) | 87.40 (7/37), Thinking (With Tools) | +0.60

Coding and Software Engineering

Leader: Claude Opus 4.6 (1/1)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
SWE-bench Verified | 79.60 (17/108) | 80.84 (9/108), Extended (with tools) | -1.24

Math and Reasoning

Leader: Claude Opus 4.6 (1/1)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
FrontierMath - Tier 4 | 8.30 (34/80), Thinking (No Tools, 16K Budget) | 22.90 (12/80), Max (no tools) | -14.60

Productivity Knowledge

Leader: Claude Opus 4.6 (1/1)
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Diff
GDPval-AA | 57 (11/21) | 1,606 (3/21), Extended (with tools, internet) | -1,549

Specs

Field | Claude Sonnet 4.6 | Claude Opus 4.6
Publisher | Anthropic | Anthropic
Release date | 2026-02-17 | 2026-02-05
Model type | Chat model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1M tokens | 1M tokens
Max output | 8K tokens | 64K tokens

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Claude Sonnet 4.6 | Claude Opus 4.6
Text input | $3 / 1M tokens | $5 / 1M tokens
Text output | $15 / 1M tokens | $25 / 1M tokens
Cache read | $0.30 / 1M tokens | $0.50 / 1M tokens
Cache write | $3.75 / 1M tokens | $10 / 1M tokens
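A per-request cost estimate follows directly from these rates: multiply each token count by its per-million price and divide by 1,000,000. The Python sketch below is a minimal illustration using the Claude Sonnet 4.6 column; `request_cost` and the token-kind keys are hypothetical names, and the sketch assumes a flat rate per token kind with no tiering or other billing rules beyond what the table lists.

```python
# Per-million-token USD prices for Claude Sonnet 4.6, copied from the table above.
SONNET_4_6_PRICES: dict[str, float] = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}

TOKENS_PER_PRICE_UNIT = 1_000_000


def request_cost(prices: dict[str, float], tokens: dict[str, int]) -> float:
    """Estimate the USD cost of one request as sum(tokens * price / 1M),
    assuming a flat rate for each token kind (hypothetical helper)."""
    missing = set(tokens) - set(prices)
    if missing:
        raise KeyError(f"no price recorded for: {sorted(missing)}")
    return sum(
        prices[kind] * count / TOKENS_PER_PRICE_UNIT for kind, count in tokens.items()
    )


# Example: 200K fresh input tokens and 10K output tokens.
cost = request_cost(SONNET_4_6_PRICES, {"input": 200_000, "output": 10_000})
print(f"${cost:.2f}")  # $0.75
```

Keeping prices in a dictionary means another model's column can be swapped in without changing the function, and an unpriced token kind raises an error instead of being silently billed at zero.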

Summary

  • Claude Sonnet 4.6 leads in: Claw-style Agent Evaluation (1/1)
  • Claude Opus 4.6 leads in: General Knowledge (3/3), AI Agent - Tool Usage (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), Coding and Software Engineering (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)

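On average across the 11 shared benchmarks, Claude Opus 4.6 scores 144.98 points higher. This raw mean is dominated by the GDPval-AA gap (-1,549); excluding GDPval-AA, Claude Opus 4.6 leads by 4.58 points on average over the remaining 10 benchmarks.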

Largest single-benchmark gap: GDPval-AA, where Claude Sonnet 4.6 scores 57 and Claude Opus 4.6 scores 1,606 (diff -1,549).
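The headline figures are simple arithmetic over the Diff column: wins are counted by sign, and the average is the plain mean of the 11 raw differences, so one benchmark with a much larger numeric range can dominate it. The Python sketch below reproduces the tally under that assumption; `DIFFS` and `tally` are hypothetical names, and the values are copied from the tables above.

```python
# Per-benchmark differences (Claude Sonnet 4.6 minus Claude Opus 4.6), copied from the tables above.
DIFFS: dict[str, float] = {
    "ARC-AGI-2": -8.00,
    "HLE": -4.00,
    "GPQA Diamond": -1.41,
    "Terminal Bench 2.0": -6.30,
    "OSWorld-Verified": -0.20,
    "τ²-Bench - Telecom": -1.35,
    "BrowseComp": -9.30,
    "Pinch Bench": 0.60,
    "SWE-bench Verified": -1.24,
    "FrontierMath - Tier 4": -14.60,
    "GDPval-AA": -1549.00,
}


def tally(diffs: dict[str, float], exclude: frozenset[str] = frozenset()) -> dict[str, float]:
    """Count wins by the sign of each difference and take the plain mean,
    optionally leaving some benchmarks out."""
    kept = [d for name, d in diffs.items() if name not in exclude]
    return {
        "sonnet_wins": sum(1 for d in kept if d > 0),
        "opus_wins": sum(1 for d in kept if d < 0),
        "ties": sum(1 for d in kept if d == 0),
        "mean_diff": round(sum(kept) / len(kept), 2),
    }


overall = tally(DIFFS)
without_gdpval = tally(DIFFS, exclude=frozenset({"GDPval-AA"}))
largest_gap = max(DIFFS, key=lambda name: abs(DIFFS[name]))

print(overall)  # {'sonnet_wins': 1, 'opus_wins': 10, 'ties': 0, 'mean_diff': -144.98}
print(without_gdpval["mean_diff"])  # -4.58
print(largest_gap)  # GDPval-AA
```

Reporting the mean with and without the outlier makes it clear that the -144.98 figure reflects GDPval-AA's scale rather than a uniform gap across benchmarks.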

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.