Opus 4.7 vs Claude Opus 4.6

Across the 15 benchmarks both models share, Opus 4.7 leads overall: it is ahead on 11, Claude Opus 4.6 is ahead on 2, and 2 are ties, with an average score gap of +1.83.

Opus 4.7: Anthropic · 2026-04-16 · reasoning model

Claude Opus 4.6: Anthropic · 2026-02-05 · reasoning model

Opus 4.7: 11 benchmarks (73%) · Tie: 2 (13%) · Claude Opus 4.6: 2 benchmarks (13%)

Benchmark scores

Benchmarks are grouped by capability category and sorted by score gap within each group; 15 in total.

General Knowledge: Opus 4.7 leads 6/7

Benchmark	Opus 4.7 (score; rank; mode)	Claude Opus 4.6 (score; rank; mode)	Gap
ARC-AGI-2	75.80; 11/62; Max (no tools)	66.30; 17/62; Extended (no tools)	+9.50
GPQA Diamond	94.20; 4/187; Extended (no tools)	91.31; 15/187; Extended (no tools)	+2.89
HLE	54.70; 13/172; Extended (with tools)	53; 18/172; Extended (with tools, internet)	+1.70
ARC-AGI	93.50; 11/68; Thinking High (No Tools)	92; 13/68; Extended (no tools)	+1.50
LiveBench	76.91; 7/115; Deep Thinking (No Tools)	76.33; 8/115; Thinking High (No Tools)	+0.58
MMLU	91.50; 6/66; Normal (No Tools)	91.05; 7/66; Extended (no tools)	+0.45
ARC-AGI-3	0; 8/9; Thinking High (No Tools)	0; 4/9; Max (no tools)	Tie

AI Agent - Tool Usage: Opus 4.7 leads 3/3

Benchmark	Opus 4.7 (score; rank; mode)	Claude Opus 4.6 (score; rank; mode)	Gap
OSWorld-Verified	78; 10/24; Extended (with tools)	72.70; 15/24; Extended (with tools)	+5.30
Terminal Bench 2.0	69.40; 6/47; Extended (with tools)	65.40; 11/47; Extended (with tools)	+4
MCP-Atlas	79.10; 7/27; Deep Thinking (With Tools)	76.80; 10/27; Deep Thinking (With Tools)	+2.30

Math and Reasoning: Opus 4.7 leads 1/2

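Benchmark	Opus 4.7 (score; rank; mode)	Claude Opus 4.6 (score; rank; mode)	Gap
FrontierMath	43.80; 6/60; Extra-high-effort thinking (no tools)	40.70; 7/60; Max (no tools)	+3.10
FrontierMath - Tier 4	22.90; 12/80; Extra-high-effort thinking (no tools)	22.90; 12/80; Max (no tools)	Tie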

AI Agent - Information Search: Claude Opus 4.6 leads 1/1

Benchmark	Opus 4.7 (score; rank; mode)	Claude Opus 4.6 (score; rank; mode)	Gap
BrowseComp	79.30; 17/53; Extended (with tools)	84; 11/53; Thinking (With Tools + Internet)	-4.70

Coding and Software Engineer: Opus 4.7 leads 1/1

Benchmark	Opus 4.7 (score; rank; mode)	Claude Opus 4.6 (score; rank; mode)	Gap
SWE-bench Verified	87.60; 6/112; Extended (with tools)	80.84; 10/112; Extended (with tools)	+6.76

Commonsense Reasoning: Claude Opus 4.6 leads 1/1

Benchmark	Opus 4.7 (score; rank; mode)	Claude Opus 4.6 (score; rank; mode)	Gap
Simple Bench	61.70; 13/63; Normal (No Tools)	67.60; 8/63; Normal (No Tools)	-5.90

Pricing uses the API records configured by DataLearner where available; missing items are not estimated.

Opus 4.7 leads in these categories: General Knowledge (6/7), AI Agent - Tool Usage (3/3), Math and Reasoning (1/2), Coding and Software Engineer (1/1)
Claude Opus 4.6 leads in these categories: AI Agent - Information Search (1/1), Commonsense Reasoning (1/1)
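
The per-category tallies above can be reproduced with a short sketch. It assumes each benchmark belongs to the category whose lead count matches its table (the page does not label the tables directly, so this mapping is an inference), and the names `ROWS` and `category_leads` are hypothetical. A positive gap counts for Opus 4.7, a negative gap for Claude Opus 4.6, and zero as a tie.

```python
from collections import defaultdict

# (category, score gap = Opus 4.7 minus Claude Opus 4.6), as read from the tables.
# The category assignment is an assumption inferred from the lead counts.
ROWS = [
    ("General Knowledge", 9.50), ("General Knowledge", 2.89),
    ("General Knowledge", 1.70), ("General Knowledge", 1.50),
    ("General Knowledge", 0.58), ("General Knowledge", 0.45),
    ("General Knowledge", 0.00),
    ("AI Agent - Tool Usage", 5.30), ("AI Agent - Tool Usage", 4.00),
    ("AI Agent - Tool Usage", 2.30),
    ("Math and Reasoning", 3.10), ("Math and Reasoning", 0.00),
    ("AI Agent - Information Search", -4.70),
    ("Coding and Software Engineer", 6.76),
    ("Commonsense Reasoning", -5.90),
]


def category_leads(rows):
    """Count, per category, how many benchmarks each model leads or ties."""
    tally = defaultdict(lambda: {"opus_4_7": 0, "opus_4_6": 0, "tie": 0})
    for category, gap in rows:
        if gap > 0:
            tally[category]["opus_4_7"] += 1
        elif gap < 0:
            tally[category]["opus_4_6"] += 1
        else:
            tally[category]["tie"] += 1
    return dict(tally)


if __name__ == "__main__":
    for category, counts in category_leads(ROWS).items():
        total = sum(counts.values())
        print(f"{category}: Opus 4.7 {counts['opus_4_7']}/{total}, "
              f"Claude Opus 4.6 {counts['opus_4_6']}/{total}, ties {counts['tie']}")
```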

Across the 15 shared benchmarks, Opus 4.7 scores 1.83 points higher on average.

Largest single-benchmark gap: ARC-AGI-2, where Opus 4.7 scores 75.80 and Claude Opus 4.6 scores 66.30 (gap +9.50).
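
As a sanity check on the summary figures, the sketch below recomputes the win/loss/tie counts, the average gap, and the largest gap from the per-benchmark scores. It assumes the scores as read from the tables on this page and that the average is a plain mean of the 15 gaps; `SCORES` and `summarize` are hypothetical names used only for illustration.

```python
# (benchmark, Opus 4.7 score, Claude Opus 4.6 score), as read from the tables.
SCORES = [
    ("ARC-AGI-2", 75.80, 66.30),
    ("GPQA Diamond", 94.20, 91.31),
    ("HLE", 54.70, 53.00),
    ("ARC-AGI", 93.50, 92.00),
    ("LiveBench", 76.91, 76.33),
    ("MMLU", 91.50, 91.05),
    ("ARC-AGI-3", 0.00, 0.00),
    ("OSWorld-Verified", 78.00, 72.70),
    ("Terminal Bench 2.0", 69.40, 65.40),
    ("MCP-Atlas", 79.10, 76.80),
    ("FrontierMath", 43.80, 40.70),
    ("FrontierMath - Tier 4", 22.90, 22.90),
    ("BrowseComp", 79.30, 84.00),
    ("SWE-bench Verified", 87.60, 80.84),
    ("Simple Bench", 61.70, 67.60),
]


def summarize(scores):
    """Return win/loss/tie counts, mean gap, and the benchmark with the largest gap."""
    # Round each gap to two decimals to avoid floating-point noise such as 2.8900000001.
    gaps = {name: round(a - b, 2) for name, a, b in scores}
    largest = max(gaps, key=lambda name: abs(gaps[name]))
    return {
        "opus_4_7_leads": sum(g > 0 for g in gaps.values()),
        "opus_4_6_leads": sum(g < 0 for g in gaps.values()),
        "ties": sum(g == 0 for g in gaps.values()),
        "mean_gap": round(sum(gaps.values()) / len(gaps), 2),
        "largest_gap": (largest, gaps[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Under these assumptions the recomputed values agree with the page: 11 leads, 2 losses, 2 ties, and a mean gap of 1.83.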

The text on this page is generated from structured model, pricing, and benchmark data; it is not written by an LLM in real time.