Claude Opus 4.6 vs Opus 4.5

Across 17 shared benchmarks, Claude Opus 4.6 leads overall: it is ahead on 14, Opus 4.5 is ahead on 3, none are tied, and the average score difference is +8.99.

Claude Opus 4.6: Anthropic · 2026-02-05 · reasoning LLM

Opus 4.5: Anthropic · 2025-11-25 · reasoning LLM

Claude Opus 4.6: 14 benchmarks (82%) · Opus 4.5: 3 benchmarks (18%)
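The percentages in this summary follow directly from the lead counts. A minimal sketch, assuming each share is the side's count divided by all compared benchmarks and rounded to the nearest whole percent; the function and variable names are illustrative and not taken from the page:

```python
def win_shares(wins_a: int, wins_b: int, ties: int = 0) -> tuple[int, int]:
    """Return each side's share of all compared benchmarks as whole percentages."""
    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError("no benchmarks to compare")
    # Assumption: shares are rounded to the nearest integer percent.
    return round(100 * wins_a / total), round(100 * wins_b / total)


# Counts from the summary: 14 leads, 3 leads, 0 ties.
shares = win_shares(14, 3, 0)
print(shares)  # (82, 18)
```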

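Benchmark scores

Grouped by capability category and sorted by the size of the score difference within each group; 17 benchmarks in total.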

General Knowledge: Claude Opus 4.6 leads 5/5

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
ARC-AGI-2	66.30 · 17 / 62 · Extended (no tools)	37.60 · 29 / 62 · Extended (no tools)	+28.70
ARC-AGI	92 · 13 / 68 · Extended (no tools)	80 · 24 / 68 · Extended (no tools)	+12
HLE	53 · 18 / 172 · Extended (with tools, internet)	43.20 · 49 / 172 · Extended (with tools)	+9.80
GPQA Diamond	91.31 · 15 / 187 · Extended (no tools)	87 · 42 / 187 · Extended (no tools)	+4.31
LiveBench	76.33 · 8 / 115 · Thinking High (No Tools)	75.96 · 11 / 115 · Thinking (No Tools, 64K Budget)	+0.37

Agent Level Benchmark: Claude Opus 4.6 leads 2/2

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
τ²-Bench	91.89 · 1 / 43 · Extended (with tools)	81.99 · 13 / 43 · Extended (with tools)	+9.90
τ²-Bench - Telecom	99.25 · 2 / 35 · Extended (with tools)	90.70 · 21 / 35 · Extended (with tools)	+8.55

AI Agent - Tool Usage: Claude Opus 4.6 leads 2/2

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
MCP-Atlas	76.80 · 10 / 27 · Deep Thinking (With Tools)	69.80 · 16 / 27 · Thinking High (With Tools)	+7
Terminal Bench 2.0	65.40 · 11 / 47 · Extended (with tools)	59.30 · 20 / 47 · Extended (with tools)	+6.10

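Coding and Software Engineer: Opus 4.5 leads 2/2

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
LiveCodeBench	76 · 38 / 123 · Extended (no tools)	87 · 12 / 123 · Extended (with tools)	-11
SWE-bench Verified	80.84 · 10 / 112 · Extended (with tools)	80.90 · 9 / 112 · Extended (with tools)	-0.06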

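Math and Reasoning: Claude Opus 4.6 leads 2/2

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
FrontierMath	40.70 · 7 / 60 · Max (no tools)	20.70 · 17 / 60 · Extended (no tools)	+20
FrontierMath - Tier 4	22.90 · 12 / 80 · Max (no tools)	4.20 · 40 / 80 · Normal (No Tools)	+18.70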

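Claw-style Agent Evaluation: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
Pinch Bench	87.40 · 7 / 37 · Thinking (With Tools)	87.20 · 8 / 37 · Extended (with tools)	+0.20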

Instruction Following: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
IF Bench	94 · 1 / 30 · Extended (no tools)	58 · 21 / 30 · Extended (with tools)	+36

Multimodal Understanding: Opus 4.5 leads 1/1

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
MMMU	77.30 · 16 / 29 · Extended (with tools)	80.70 · 11 / 29 · Extended (no tools)	-3.40

Commonsense Reasoning: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6 (score · rank · mode)	Opus 4.5 (score · rank · mode)	Difference
Simple Bench	67.60 · 8 / 63 · Normal (No Tools)	62 · 12 / 63 · Extended (no tools)	+5.60

Pricing uses the API records configured in DataLearner where available; missing items are not estimated.

Claude Opus 4.6 leads in the following categories: General Knowledge (5/5), Agent Level Benchmark (2/2), AI Agent - Tool Usage (2/2), Math and Reasoning (2/2), Claw-style Agent Evaluation (1/1), Instruction Following (1/1), Commonsense Reasoning (1/1)
Opus 4.5 leads in the following categories: Coding and Software Engineer (2/2), Multimodal Understanding (1/1)

Across the 17 shared benchmarks, Claude Opus 4.6 scores 8.99 points higher on average.
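The average and the lead counts can be reproduced from the score differences in the tables above. A minimal sketch, assuming the page takes a plain arithmetic mean of the 17 differences and rounds to two decimals; the `diffs` dictionary is transcribed from this page and the variable names are illustrative:

```python
from statistics import mean

# Score differences (Claude Opus 4.6 minus Opus 4.5), transcribed from the tables.
diffs = {
    "ARC-AGI-2": 28.70,
    "ARC-AGI": 12.0,
    "HLE": 9.80,
    "GPQA Diamond": 4.31,
    "LiveBench": 0.37,
    "τ²-Bench": 9.90,
    "τ²-Bench - Telecom": 8.55,
    "MCP-Atlas": 7.0,
    "Terminal Bench 2.0": 6.10,
    "LiveCodeBench": -11.0,
    "SWE-bench Verified": -0.06,
    "FrontierMath": 20.0,
    "FrontierMath - Tier 4": 18.70,
    "Pinch Bench": 0.20,
    "IF Bench": 36.0,
    "MMMU": -3.40,
    "Simple Bench": 5.60,
}

values = list(diffs.values())
leads_46 = sum(d > 0 for d in values)   # benchmarks where Claude Opus 4.6 is ahead
leads_45 = sum(d < 0 for d in values)   # benchmarks where Opus 4.5 is ahead
ties = sum(d == 0 for d in values)
# Assumption: unweighted mean, rounded to two decimals.
avg = round(mean(values), 2)

print(leads_46, leads_45, ties, avg)  # 14 3 0 8.99
```

An unweighted mean treats every benchmark equally, so a single large gap such as IF Bench moves the average as much as several small ones combined.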

Largest single-benchmark gap: IF Bench, where Claude Opus 4.6 scores 94 and Opus 4.5 scores 58 (difference +36).
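The largest gap can be located by comparing the absolute score differences across all 17 benchmarks. A minimal sketch, assuming "largest" means largest in absolute value regardless of which model leads; the `scores` dictionary is transcribed from this page and `largest_gap` is a hypothetical helper name:

```python
# Scores transcribed from the tables on this page: (Claude Opus 4.6, Opus 4.5).
scores = {
    "ARC-AGI-2": (66.30, 37.60),
    "ARC-AGI": (92.0, 80.0),
    "HLE": (53.0, 43.20),
    "GPQA Diamond": (91.31, 87.0),
    "LiveBench": (76.33, 75.96),
    "τ²-Bench": (91.89, 81.99),
    "τ²-Bench - Telecom": (99.25, 90.70),
    "MCP-Atlas": (76.80, 69.80),
    "Terminal Bench 2.0": (65.40, 59.30),
    "LiveCodeBench": (76.0, 87.0),
    "SWE-bench Verified": (80.84, 80.90),
    "FrontierMath": (40.70, 20.70),
    "FrontierMath - Tier 4": (22.90, 4.20),
    "Pinch Bench": (87.40, 87.20),
    "IF Bench": (94.0, 58.0),
    "MMMU": (77.30, 80.70),
    "Simple Bench": (67.60, 62.0),
}


def largest_gap(table):
    """Return (benchmark, signed difference) for the largest absolute gap."""
    # Assumption: gaps are compared by absolute value, so a large deficit
    # would rank the same as a large lead.
    name = max(table, key=lambda k: abs(table[k][0] - table[k][1]))
    a, b = table[name]
    return name, round(a - b, 2)


name, gap = largest_gap(scores)
print(f"{name}: {gap:+.2f}")  # IF Bench: +36.00
```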

The body of this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.