Claude Opus 4.6 vs Claude Opus 4

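Across the 11 benchmarks both models report, Claude Opus 4.6 leads overall. It is ahead on 10, Claude Opus 4 is ahead on 1, and none are tied. The average score difference is +26.70 points in Claude Opus 4.6's favor.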

Claude Opus 4.6 · Anthropic · 2026-02-05 · reasoning LLM

Claude Opus 4 · Anthropic · 2025-05-23 · reasoning LLM

Win share: Claude Opus 4.6 leads on 10 benchmarks (91%); Claude Opus 4 leads on 1 benchmark (9%).
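The win counts and percentages above follow from the signs of the 11 per-benchmark differences. A minimal Python sketch, assuming a benchmark counts as a tie only when the difference is exactly zero (the page does not state its tie rule) and that percentages are rounded to whole numbers; `DIFFS` and `tally` are hypothetical names:

```python
# Per-benchmark differences (Claude Opus 4.6 minus Claude Opus 4), copied from the score tables.
DIFFS = {
    "ARC-AGI-2": 57.70,
    "ARC-AGI": 56.30,
    "HLE": 42.30,
    "GPQA Diamond": 11.71,
    "FrontierMath": 36.20,
    "AIME2025": 24.29,
    "FrontierMath - Tier 4": 18.70,
    "MATH-500": -0.60,
    "LiveCodeBench": 19.40,
    "SWE-bench Verified": 8.34,
    "τ²-Bench": 19.39,
}


def tally(diffs):
    """Return (4.6 wins, 4 wins, ties, 4.6 win %, 4 win %) from signed differences."""
    wins_46 = sum(1 for d in diffs.values() if d > 0)
    wins_4 = sum(1 for d in diffs.values() if d < 0)
    # Assumption: only an exact zero difference counts as a tie.
    ties = len(diffs) - wins_46 - wins_4
    n = len(diffs)
    # Percent shares rounded to whole numbers, as on the win-share bar.
    return wins_46, wins_4, ties, round(100 * wins_46 / n), round(100 * wins_4 / n)


print(tally(DIFFS))
```

The shares are simply 10/11 and 1/11 of the benchmarks, which round to 91% and 9%.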

Benchmark scores

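Benchmarks are grouped by capability category and sorted within each group by score difference, largest first; 11 benchmarks in total. Ranks are shown as position / number of models ranked on that benchmark.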
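The grouping and ordering rule can be sketched in a few lines of Python. This is an illustration, not the page's generator: the category labels are taken from the per-category summary further down, the sort is assumed to use the signed difference (Claude Opus 4.6 minus Claude Opus 4) in descending order, and `ROWS` and `group_and_sort` are hypothetical names.

```python
from collections import defaultdict

# (category, benchmark, Claude Opus 4.6 score, Claude Opus 4 score), copied from the tables.
# Input order is deliberately unsorted to show the sort doing its job.
ROWS = [
    ("General Knowledge", "GPQA Diamond", 91.31, 79.60),
    ("General Knowledge", "HLE", 53.00, 10.70),
    ("General Knowledge", "ARC-AGI-2", 66.30, 8.60),
    ("General Knowledge", "ARC-AGI", 92.00, 35.70),
    ("Math and Reasoning", "MATH-500", 97.60, 98.20),
    ("Math and Reasoning", "FrontierMath", 40.70, 4.50),
    ("Math and Reasoning", "FrontierMath - Tier 4", 22.90, 4.20),
    ("Math and Reasoning", "AIME2025", 99.79, 75.50),
    ("Coding and Software Engineering", "SWE-bench Verified", 80.84, 72.50),
    ("Coding and Software Engineering", "LiveCodeBench", 76.00, 56.60),
    ("Agent Level Benchmark", "τ²-Bench", 91.89, 72.50),
]


def group_and_sort(rows):
    """Group rows by category, then sort each group by signed difference, largest first."""
    groups = defaultdict(list)
    for category, benchmark, score_46, score_4 in rows:
        # Difference is Claude Opus 4.6 minus Claude Opus 4, rounded to the table's 2 decimals.
        groups[category].append((benchmark, round(score_46 - score_4, 2)))
    for items in groups.values():
        items.sort(key=lambda item: item[1], reverse=True)
    # dict keeps categories in first-seen order.
    return dict(groups)


for category, items in group_and_sort(ROWS).items():
    print(category)
    for benchmark, diff in items:
        print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the signed difference puts MATH-500, the one benchmark where Claude Opus 4 scores higher, at the end of its group, which matches the table order.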

General Knowledge: Claude Opus 4.6 leads 4/4

Benchmark	Claude Opus 4.6	Opus 4.6 rank	Opus 4.6 setting	Claude Opus 4	Opus 4 rank	Difference
ARC-AGI-2	66.30	15 / 59	Extended (no tools)	8.60	39 / 59	+57.70
ARC-AGI	92	11 / 65	Extended (no tools)	35.70	48 / 65	+56.30
HLE	53	11 / 157	Extended (with tools, internet)	10.70	129 / 157	+42.30
GPQA Diamond	91.31	14 / 178	Extended (no tools)	79.60	79 / 178	+11.71

Math and Reasoning: Claude Opus 4.6 leads 3/4

Benchmark	Claude Opus 4.6	Opus 4.6 rank	Opus 4.6 setting	Claude Opus 4	Opus 4 rank	Difference
FrontierMath	40.70	7 / 60	Max (no tools)	4.50	39 / 60	+36.20
AIME2025	99.79	7 / 106	Extended (no tools)	75.50	65 / 106	+24.29
FrontierMath - Tier 4	22.90	12 / 80	Max (no tools)	4.20	40 / 80	+18.70
MATH-500	97.60	10 / 44	Extended (no tools)	98.20	3 / 44	-0.60

Coding and Software Engineering: Claude Opus 4.6 leads 2/2

Benchmark	Claude Opus 4.6	Opus 4.6 rank	Opus 4.6 setting	Claude Opus 4	Opus 4 rank	Difference
LiveCodeBench	76	37 / 120	Extended (no tools)	56.60	76 / 120	+19.40
SWE-bench Verified	80.84	9 / 108	Extended (with tools)	72.50	48 / 108	+8.34

Agent Level Benchmark: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6	Opus 4.6 rank	Opus 4.6 setting	Claude Opus 4	Opus 4 rank	Difference
τ²-Bench	91.89	1 / 40	Extended (with tools)	72.50	22 / 40	+19.39

Pricing uses the API price records configured in DataLearner where available; missing entries are not estimated.

Some models have incomplete public pricing; missing fields are shown as "No public pricing yet".
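The pricing rule (show a placeholder, never an estimate, when a field is missing) can be sketched as follows. `display_price`, the `MISSING_PRICE` constant, and the USD-per-1M-tokens unit are hypothetical; the page shows neither its rendering code nor the units it uses.

```python
from typing import Optional

# Hypothetical placeholder text for a missing price field.
MISSING_PRICE = "No public pricing yet"


def display_price(usd_per_million_tokens: Optional[float]) -> str:
    """Render one price field; a missing value gets the placeholder, never an estimate."""
    if usd_per_million_tokens is None:
        return MISSING_PRICE
    # Assumed unit: USD per 1M tokens (not stated on this page).
    return f"${usd_per_million_tokens:.2f} / 1M tokens"


print(display_price(None))
print(display_price(2.5))  # hypothetical value, not a real model price
```

Keeping `None` distinct from any numeric value is what prevents a missing price from being silently shown as zero or filled in.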

Claude Opus 4.6 leads in these categories: General Knowledge (4/4), Math and Reasoning (3/4), Coding and Software Engineering (2/2), Agent Level Benchmark (1/1).

Across the 11 shared benchmarks, Claude Opus 4.6 scores 26.70 points higher on average.
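The 26.70 figure is consistent with an unweighted arithmetic mean of the 11 per-benchmark differences, where $s_i^{4.6}$ and $s_i^{4}$ are the two models' scores on benchmark $i$. The equal weighting is inferred from the arithmetic, not stated on the page:

```latex
\bar{\Delta} = \frac{1}{11}\sum_{i=1}^{11}\left(s_i^{4.6} - s_i^{4}\right)
= \frac{57.70 + 56.30 + 42.30 + 11.71 + 36.20 + 24.29 + 18.70 - 0.60 + 19.40 + 8.34 + 19.39}{11}
= \frac{293.73}{11} \approx 26.70
```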

Largest single-benchmark gap: ARC-AGI-2, where Claude Opus 4.6 scores 66.30 and Claude Opus 4 scores 8.60 (difference +57.70).

The text of this page is generated from structured model, pricing, and benchmark data; it is not written by a real-time LLM.