Claude Sonnet 4.6 vs Claude Opus 4.6

Across 11 shared benchmarks, Claude Opus 4.6 leads overall: Claude Sonnet 4.6 leads on 1, Claude Opus 4.6 leads on 10, with 0 ties. The mean score difference (Sonnet minus Opus) is -144.98.

Claude Sonnet 4.6

Anthropic · 2026-02-17 · Chat LLM

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning LLM

Benchmarks led: Claude Sonnet 4.6 1 (9%) · Claude Opus 4.6 10 (91%)

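Benchmark scores

Grouped by capability category and sorted within each group by the size of the score difference; 11 benchmarks in total.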
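The grouping and ordering described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual generator: the `ROWS` list and the `group_and_sort` helper are hypothetical names, the rows are a subset copied from the tables on this page, and the difference is assumed to be the Sonnet score minus the Opus score, ordered by absolute value.

```python
from collections import defaultdict

# (category, benchmark, Claude Sonnet 4.6 score, Claude Opus 4.6 score)
# A subset of the rows on this page, deliberately listed out of order.
ROWS = [
    ("General Knowledge", "GPQA Diamond", 89.90, 91.31),
    ("General Knowledge", "ARC-AGI-2", 58.30, 66.30),
    ("General Knowledge", "HLE", 49.0, 53.0),
    ("AI Agent - Tool Usage", "OSWorld-Verified", 72.50, 72.70),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 59.10, 65.40),
]


def group_and_sort(rows):
    """Group rows by category, then sort each group by absolute score difference."""
    groups = defaultdict(list)
    for category, name, sonnet, opus in rows:
        # Assumption: difference = Sonnet score minus Opus score.
        diff = round(sonnet - opus, 2)
        groups[category].append((name, diff))
    for items in groups.values():
        # Largest gap first, regardless of which model leads.
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


grouped = group_and_sort(ROWS)
for category, items in grouped.items():
    print(category)
    for name, diff in items:
        print(f"  {name}: {diff:+.2f}")
```

Sorting on the absolute value keeps the widest gap at the top of each group whether the sign is positive or negative.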

General Knowledge

Claude Opus 4.6 leads 3/3
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
ARC-AGI-2 | 58.30 (18 / 59) | 66.30 (15 / 59), Extended (no tools) | -8
HLE | 49 (25 / 157) | 53 (11 / 157), Extended (with tools, internet) | -4
GPQA Diamond | 89.90 (21 / 178) | 91.31 (14 / 178), Extended (no tools) | -1.41

AI Agent - Tool Usage

Claude Opus 4.6 leads 2/2
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
Terminal Bench 2.0 | 59.10 (22 / 46) | 65.40 (11 / 46), Extended (with tools) | -6.30
OSWorld-Verified | 72.50 (10 / 18) | 72.70 (9 / 18), Extended (with tools) | -0.20

Agent Level Benchmark

Claude Opus 4.6 leads 1/1
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
τ²-Bench - Telecom | 97.90 (9 / 35) | 99.25 (2 / 35), Extended (with tools) | -1.35

AI Agent - Information Search

Claude Opus 4.6 leads 1/1
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
BrowseComp | 74.70 (20 / 45) | 84 (7 / 45), Thinking (With Tools + Internet) | -9.30

Claw-style Agent Evaluation

Claude Sonnet 4.6 leads 1/1
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
Pinch Bench | 88 (5 / 37), Thinking (With Tools) | 87.40 (7 / 37), Thinking (With Tools) | +0.60

Coding and Software Engineer

Claude Opus 4.6 leads 1/1
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
SWE-bench Verified | 79.60 (17 / 108) | 80.84 (9 / 108), Extended (with tools) | -1.24

Math and Reasoning

Claude Opus 4.6 leads 1/1
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
FrontierMath - Tier 4 | 8.30 (34 / 80), Thinking (No Tools, 16K Budget) | 22.90 (12 / 80), Max (no tools) | -14.60

Productivity Knowledge

Claude Opus 4.6 leads 1/1
Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | Difference
GDPval-AA | 57 (11 / 21) | 1,606 (3 / 21), Extended (with tools, internet) | -1,549

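Specification comparison

Field | Claude Sonnet 4.6 | Claude Opus 4.6
Publisher | Anthropic | Anthropic
Release date | 2026-02-17 | 2026-02-05
Model type | Chat LLM | Reasoning LLM
Architecture | Dense model | Dense model
Parameter count | No data | No data
Context length | 1M | 1000K
Max output | 8K | 64K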

API pricing

Prices are taken from the API records configured by DataLearner where available; missing items are not estimated.

Price item | Claude Sonnet 4.6 | Claude Opus 4.6
Text input | $3 / 1M tokens | $0.5 / 1M tokens
Text output | $15 / 1M tokens | $25 / 1M tokens
Cache read | $0.3 / 1M tokens | $0.5 / 1M tokens
Cache write | $3.75 / 1M tokens | $10 / 1M tokens
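To make the per-million-token prices concrete, the sketch below estimates the cost of a single request. The `PRICES` dictionary and the `request_cost` helper are hypothetical names introduced for illustration. The figures are copied as listed in the table above and may not match the provider's current price list, so treat the output as arithmetic on this page's numbers rather than a billing estimate.

```python
# USD per 1M tokens, copied as listed in the price table on this page.
PRICES = {
    "Claude Sonnet 4.6": {
        "input": 3.0, "output": 15.0, "cache_read": 0.3, "cache_write": 3.75,
    },
    "Claude Opus 4.6": {
        "input": 0.5, "output": 25.0, "cache_read": 0.5, "cache_write": 10.0,
    },
}


def request_cost(model, input_tokens=0, output_tokens=0,
                 cache_read_tokens=0, cache_write_tokens=0):
    """Return the cost in USD of one request at the listed per-1M-token prices."""
    price = PRICES[model]
    total = (
        input_tokens * price["input"]
        + output_tokens * price["output"]
        + cache_read_tokens * price["cache_read"]
        + cache_write_tokens * price["cache_write"]
    )
    return total / 1_000_000


# Hypothetical request: 100,000 input tokens and 10,000 output tokens.
for model in PRICES:
    cost = request_cost(model, input_tokens=100_000, output_tokens=10_000)
    print(f"{model}: ${cost:.2f}")
```

At the listed prices this hypothetical request comes to $0.45 on Claude Sonnet 4.6 and $0.30 on Claude Opus 4.6; the ordering flips for output-heavy requests because of the higher Opus output price.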

Summary

  • Claude Sonnet 4.6 leads in: Claw-style Agent Evaluation (1/1)
  • Claude Opus 4.6 leads in: General Knowledge (3/3), AI Agent - Tool Usage (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1), Coding and Software Engineer (1/1), Math and Reasoning (1/1), Productivity Knowledge (1/1)

Across the 11 shared benchmarks, Claude Opus 4.6 scores 144.98 points higher on average, a figure driven almost entirely by the GDPval-AA gap.

Benchmark with the largest single gap: GDPval-AA, where Claude Sonnet 4.6 scores 57 and Claude Opus 4.6 scores 1,606 (difference -1,549).
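The summary figures can be reproduced from the eleven score pairs on this page. The sketch below assumes the difference is the Sonnet score minus the Opus score and that the mean is a plain unweighted average; `SCORES` and the other variable names are hypothetical. It also recomputes the mean without the largest-gap benchmark, since the scores are on different scales and one row can dominate the average.

```python
# (benchmark, Claude Sonnet 4.6 score, Claude Opus 4.6 score), from this page.
SCORES = [
    ("ARC-AGI-2", 58.30, 66.30),
    ("HLE", 49.0, 53.0),
    ("GPQA Diamond", 89.90, 91.31),
    ("Terminal Bench 2.0", 59.10, 65.40),
    ("OSWorld-Verified", 72.50, 72.70),
    ("τ²-Bench - Telecom", 97.90, 99.25),
    ("BrowseComp", 74.70, 84.0),
    ("Pinch Bench", 88.0, 87.40),
    ("SWE-bench Verified", 79.60, 80.84),
    ("FrontierMath - Tier 4", 8.30, 22.90),
    ("GDPval-AA", 57.0, 1606.0),
]

# Assumption: difference = Sonnet score minus Opus score.
diffs = {name: sonnet - opus for name, sonnet, opus in SCORES}

sonnet_leads = sum(d > 0 for d in diffs.values())
opus_leads = sum(d < 0 for d in diffs.values())
mean_all = sum(diffs.values()) / len(diffs)

# Benchmark with the largest absolute gap, and the mean without it.
largest = max(diffs, key=lambda name: abs(diffs[name]))
rest = [d for name, d in diffs.items() if name != largest]
mean_rest = sum(rest) / len(rest)

print(f"Sonnet leads {sonnet_leads}, Opus leads {opus_leads}")  # Sonnet leads 1, Opus leads 10
print(f"Mean difference, all 11: {mean_all:.2f}")               # -144.98
print(f"Largest gap: {largest}")                                # GDPval-AA
print(f"Mean difference without it: {mean_rest:.2f}")           # -4.58
```

With GDPval-AA set aside, the mean difference over the remaining ten benchmarks is -4.58, which is closer to the typical per-benchmark gap shown in the tables.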

The body of this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.