Claude Opus 4.6 vs Opus 4.5

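Across 14 shared benchmarks, Claude Opus 4.6 leads overall: it scores higher on 11, Opus 4.5 scores higher on 3, and none are tied. The mean score difference is +9.99 points in favor of Claude Opus 4.6.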

Claude Opus 4.6
Anthropic · 2026-02-05 · Reasoning LLM

Opus 4.5
Anthropic · 2025-11-25 · Reasoning LLM

Benchmarks led: Claude Opus 4.6 11 (79%) · Opus 4.5 3 (21%)

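Benchmark scores

Benchmarks are grouped by capability category and, within each group, sorted by the size of the score difference; 14 benchmarks in total.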

General Knowledge

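Claude Opus 4.6 leads 4/4

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| ARC-AGI-2 | 66.30 · rank 15 / 59 · Extended (no tools) | 37.60 · rank 26 / 59 · Extended (no tools) | +28.70 |
| ARC-AGI | 92 · rank 11 / 65 · Extended (no tools) | 80 · rank 21 / 65 · Extended (no tools) | +12 |
| HLE | 53 · rank 11 / 157 · Extended (with tools, internet) | 43.20 · rank 39 / 157 · Extended (with tools) | +9.80 |
| GPQA Diamond | 91.31 · rank 14 / 178 · Extended (no tools) | 87 · rank 38 / 178 · Extended (no tools) | +4.31 |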

Agent Level Benchmark

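Claude Opus 4.6 leads 2/2

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| τ²-Bench | 91.89 · rank 1 / 40 · Extended (with tools) | 81.99 · rank 13 / 40 · Extended (with tools) | +9.90 |
| τ²-Bench - Telecom | 99.25 · rank 2 / 35 · Extended (with tools) | 90.70 · rank 21 / 35 · Extended (with tools) | +8.55 |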

Coding and Software Engineer

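Opus 4.5 leads 2/2

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| LiveCodeBench | 76 · rank 37 / 120 · Extended (no tools) | 87 · rank 12 / 120 · Extended (with tools) | -11 |
| SWE-bench Verified | 80.84 · rank 9 / 108 · Extended (with tools) | 80.90 · rank 8 / 108 · Extended (with tools) | -0.06 |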

Math and Reasoning

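Claude Opus 4.6 leads 2/2

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| FrontierMath | 40.70 · rank 7 / 60 · Max (no tools) | 20.70 · rank 17 / 60 · Extended (no tools) | +20 |
| FrontierMath - Tier 4 | 22.90 · rank 12 / 80 · Max (no tools) | 4.20 · rank 40 / 80 · Normal (no tools) | +18.70 |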

AI Agent - Tool Usage

Claude Opus 4.6 leads 1/1

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | 65.40 · rank 11 / 46 · Extended (with tools) | 59.30 · rank 20 / 46 · Extended (with tools) | +6.10 |

Claw-style Agent Evaluation

Claude Opus 4.6 leads 1/1

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| Pinch Bench | 87.40 · rank 7 / 37 · Thinking (with tools) | 87.20 · rank 8 / 37 · Extended (with tools) | +0.20 |

Instruction Following

Claude Opus 4.6 leads 1/1

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| IF Bench | 94 · rank 1 / 29 · Extended (no tools) | 58 · rank 20 / 29 · Extended (with tools) | +36 |

Multimodal Understanding

Opus 4.5 leads 1/1

| Benchmark | Claude Opus 4.6 | Opus 4.5 | Difference |
| --- | --- | --- | --- |
| MMMU | 77.30 · rank 15 / 28 · Extended (with tools) | 80.70 · rank 10 / 28 · Extended (no tools) | -3.40 |

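Specification comparison

| Field | Claude Opus 4.6 | Opus 4.5 |
| --- | --- | --- |
| Publisher | Anthropic | Anthropic |
| Release date | 2026-02-05 | 2025-11-25 |
| Model type | Reasoning LLM | Reasoning LLM |
| Architecture | Dense | Dense |
| Parameter count | No data | No data |
| Context length (tokens) | 1000K | 200K |
| Max output (tokens) | 64K | 64K |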

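API pricing

Prices are taken from the API records configured by DataLearner where available; missing items are not estimated.

| Price item | Claude Opus 4.6 | Opus 4.5 |
| --- | --- | --- |
| Text input | $0.5 / 1M tokens | $5 / 1M tokens |
| Text output | $25 / 1M tokens | $25 / 1M tokens |
| Cache read | $0.5 / 1M tokens | $0.5 / 1M tokens |
| Cache write | $10 / 1M tokens | $6.25 / 1M tokens |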
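
To make the per-million-token prices concrete, the sketch below estimates the cost of one request from the listed text input and output prices. The helper name `request_cost` and the token counts are illustrative assumptions, not part of the source data, and cache pricing is left out for simplicity.

```python
# Listed prices in USD per 1M tokens, copied as-is from the pricing table.
PRICES = {
    "Claude Opus 4.6": {"input": 0.5, "output": 25.0},
    "Opus 4.5": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper, no caching)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 10,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 10_000):.2f}")
```

With equal output prices, any cost difference between the two models in this sketch comes entirely from the listed input price.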

Summary

  • Claude Opus 4.6 leads in: General Knowledge (4/4), Agent Level Benchmark (2/2), Math and Reasoning (2/2), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Instruction Following (1/1)
  • Opus 4.5 leads in: Coding and Software Engineer (2/2), Multimodal Understanding (1/1)

Across the 14 shared benchmarks, Claude Opus 4.6 scores 9.99 points higher on average.

Largest single-benchmark gap: IF Bench, where Claude Opus 4.6 scores 94 and Opus 4.5 scores 58 (difference +36).
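
The headline figures can be recomputed from the per-benchmark scores. The following is a minimal sketch, assuming the difference is simply the Claude Opus 4.6 score minus the Opus 4.5 score and the average is an unweighted mean; the variable names are illustrative and this is not the page's actual generation code.

```python
# (benchmark, Claude Opus 4.6 score, Opus 4.5 score), copied from the score tables.
SCORES = [
    ("ARC-AGI-2", 66.30, 37.60),
    ("ARC-AGI", 92.0, 80.0),
    ("HLE", 53.0, 43.20),
    ("GPQA Diamond", 91.31, 87.0),
    ("τ²-Bench", 91.89, 81.99),
    ("τ²-Bench - Telecom", 99.25, 90.70),
    ("LiveCodeBench", 76.0, 87.0),
    ("SWE-bench Verified", 80.84, 80.90),
    ("FrontierMath", 40.70, 20.70),
    ("FrontierMath - Tier 4", 22.90, 4.20),
    ("Terminal Bench 2.0", 65.40, 59.30),
    ("Pinch Bench", 87.40, 87.20),
    ("IF Bench", 94.0, 58.0),
    ("MMMU", 77.30, 80.70),
]

# Positive difference means Claude Opus 4.6 scored higher.
diffs = {name: new - old for name, new, old in SCORES}

wins_46 = sum(d > 0 for d in diffs.values())
wins_45 = sum(d < 0 for d in diffs.values())
ties = sum(d == 0 for d in diffs.values())

# Unweighted mean of the per-benchmark differences.
mean_diff = sum(diffs.values()) / len(diffs)

# Benchmark with the largest absolute gap, regardless of which model leads.
largest_gap = max(diffs, key=lambda name: abs(diffs[name]))

print(wins_46, wins_45, ties)
print(round(mean_diff, 2))
print(largest_gap, round(diffs[largest_gap], 2))
```

Under these assumptions the sketch reproduces the 11 / 3 / 0 split, the 9.99-point mean difference, and IF Bench as the largest gap. Several rows compare runs made in different modes (for example, with tools versus no tools), which the plain subtraction does not adjust for.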

The body of this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.