Claude Opus 4.6 vs Opus 4.1

Across the 6 benchmarks both models share, Claude Opus 4.6 leads overall: Claude Opus 4.6 leads on 6, Opus 4.1 leads on 0, and 0 are tied, with an average score gap of +21.82.

Anthropic · 2026-02-05 · Reasoning LLM

Anthropic · 2025-08-06 · Reasoning LLM

Claude Opus 4.6: 6 benchmarks (100%) · Opus 4.1: 0 benchmarks (0%)

Benchmark scores

Grouped by capability category and sorted by score gap within each group; 6 benchmarks in total.

Math and Reasoning: Claude Opus 4.6 leads 3/3

Benchmark	Claude Opus 4.6	Opus 4.1	Gap
FrontierMath	40.70 · rank 7/60 · Max (no tools)	5.90 · rank 35/60 · Normal (No Tools)	+34.80
AIME2025	99.79 · rank 7/106 · Extended (no tools)	78 · rank 60/106 · Extended (no tools)	+21.79
FrontierMath - Tier 4	22.90 · rank 12/80 · Max (no tools)	4.20 · rank 40/80 · Thinking (No Tools, 32K Budget)	+18.70

Coding and Software Engineer: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6	Opus 4.1	Gap
SWE-bench Verified	80.84 · rank 9/108 · Extended (with tools)	74.50 · rank 36/108 · Extended (with tools)	+6.34

General Knowledge: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6	Opus 4.1	Gap
GPQA Diamond	91.31 · rank 14/178 · Extended (no tools)	81 · rank 69/178 · Extended (no tools)	+10.31

Instruction Following: Claude Opus 4.6 leads 1/1

Benchmark	Claude Opus 4.6	Opus 4.1	Gap
IF Bench	94 · rank 1/29 · Extended (no tools)	55 · rank 22/29 · Extended (with tools)	+39
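The grouping and ordering described for these tables (by capability category, then by gap size within each category) can be sketched in a few lines of Python. This is only an illustration: the `ROWS` layout and the `group_by_category` helper are hypothetical names, not DataLearner's implementation, and the rows are deliberately listed out of order to show the sort. Scores and category labels are copied from this page.

```python
from collections import defaultdict

# (category, benchmark, Claude Opus 4.6 score, Opus 4.1 score), copied from this page.
# The tuple layout is an assumption made for this sketch.
ROWS = [
    ("Math and Reasoning", "AIME2025", 99.79, 78.0),
    ("Math and Reasoning", "FrontierMath - Tier 4", 22.90, 4.20),
    ("Math and Reasoning", "FrontierMath", 40.70, 5.90),
    ("Coding and Software Engineer", "SWE-bench Verified", 80.84, 74.50),
    ("General Knowledge", "GPQA Diamond", 91.31, 81.0),
    ("Instruction Following", "IF Bench", 94.0, 55.0),
]


def group_by_category(rows):
    """Group benchmarks by category, then sort each group by gap size, largest first."""
    groups = defaultdict(list)
    for category, name, score_a, score_b in rows:
        # Round to two decimals to hide floating-point noise in the subtraction.
        groups[category].append((name, round(score_a - score_b, 2)))
    for entries in groups.values():
        entries.sort(key=lambda entry: abs(entry[1]), reverse=True)
    return dict(groups)


grouped = group_by_category(ROWS)
for category, entries in grouped.items():
    print(category)
    for name, gap in entries:
        print(f"  {name}: {gap:+.2f}")
```

Sorting on the absolute gap keeps the ordering meaningful even if a future comparison contains benchmarks where the second model leads.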

Pricing uses the API records configured in DataLearner where available; missing items are not estimated.

Claude Opus 4.6 leads in the following categories: Math and Reasoning (3/3), Coding and Software Engineer (1/1), General Knowledge (1/1), Instruction Following (1/1).

Across the 6 shared benchmarks, Claude Opus 4.6 scores 21.82 points higher on average.

Benchmark with the largest single gap: IF Bench, where Claude Opus 4.6 scores 94 and Opus 4.1 scores 55 (gap +39).
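The summary figures on this page (win counts, average gap, largest gap) follow from the six score pairs by simple arithmetic. The sketch below recomputes them as a check; the `SCORES` dictionary and the `summarize` function are hypothetical names introduced for illustration, and the unweighted mean is an assumption that matches the published +21.82.

```python
# Scores copied from this page: benchmark -> (Claude Opus 4.6, Opus 4.1).
SCORES = {
    "FrontierMath": (40.70, 5.90),
    "AIME2025": (99.79, 78.0),
    "FrontierMath - Tier 4": (22.90, 4.20),
    "SWE-bench Verified": (80.84, 74.50),
    "GPQA Diamond": (91.31, 81.0),
    "IF Bench": (94.0, 55.0),
}


def summarize(scores):
    """Return win counts, the unweighted mean gap, and the benchmark with the largest gap."""
    gaps = {name: a - b for name, (a, b) in scores.items()}
    return {
        "wins_first": sum(gap > 0 for gap in gaps.values()),
        "wins_second": sum(gap < 0 for gap in gaps.values()),
        "ties": sum(gap == 0 for gap in gaps.values()),
        "mean_gap": sum(gaps.values()) / len(gaps),
        "largest": max(gaps, key=lambda name: abs(gaps[name])),
    }


summary = summarize(SCORES)
print(f"wins: {summary['wins_first']}-{summary['wins_second']}, ties: {summary['ties']}")
print(f"mean gap: {summary['mean_gap']:+.2f}")
print(f"largest gap: {summary['largest']}")
```

The six gaps sum to 130.94, so the mean is 130.94 / 6 ≈ 21.82, which agrees with the average stated above.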

The text on this page is generated from structured model, pricing, and benchmark data; it is not written by an LLM in real time.