Claude Opus 4.6 vs Opus 4.1

Across 6 shared benchmarks, Claude Opus 4.6 leads overall: it wins 6 to Opus 4.1's 0, with 0 ties and an average score difference of +21.82 points.

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Opus 4.1

Anthropic · 2025-08-06 · Reasoning model

Claude Opus 4.6: 6 wins (100%) · Opus 4.1: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

Math and Reasoning

Claude Opus 4.6 3/3

Benchmark	Claude Opus 4.6	Opus 4.1	Diff
FrontierMath	40.70 (rank 7 / 60; Max, no tools)	5.90 (rank 35 / 60; Normal, no tools)	+34.80
AIME2025	99.79 (rank 7 / 106; Extended, no tools)	78 (rank 60 / 106; Extended, no tools)	+21.79
FrontierMath - Tier 4	22.90 (rank 12 / 80; Max, no tools)	4.20 (rank 40 / 80; Thinking, no tools, 32K budget)	+18.70

Coding and Software Engineering

Claude Opus 4.6 1/1

Benchmark	Claude Opus 4.6	Opus 4.1	Diff
SWE-bench Verified	80.84 (rank 9 / 108; Extended, with tools)	74.50 (rank 36 / 108; Extended, with tools)	+6.34

General Knowledge

Claude Opus 4.6 1/1

Benchmark	Claude Opus 4.6	Opus 4.1	Diff
GPQA Diamond	91.31 (rank 14 / 178; Extended, no tools)	81 (rank 69 / 178; Extended, no tools)	+10.31

Instruction Following

Claude Opus 4.6 1/1

Benchmark	Claude Opus 4.6	Opus 4.1	Diff
IF Bench	94 (rank 1 / 29; Extended, no tools)	55 (rank 22 / 29; Extended, with tools)	+39

Specs

Field	Claude Opus 4.6	Opus 4.1
Publisher	Anthropic	Anthropic
Release date	2026-02-05	2025-08-06
Model type	Reasoning model	Reasoning model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1000K	200K
Max output	64K	32K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	Claude Opus 4.6	Opus 4.1
Text input	$5 / 1M tokens	$15 / 1M tokens
Text output	$25 / 1M tokens	$75 / 1M tokens
Cache read	$0.5 / 1M tokens	$1.5 / 1M tokens
Cache write	$10 / 1M tokens	$18.75 / 1M tokens
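
Per-million-token prices translate into a per-request cost by scaling each token count by its price. The following is a minimal sketch of that arithmetic using the Opus 4.1 input and output prices from the table above; the function name `request_cost_usd` and the example token counts are hypothetical, and cache reads and writes are left out for simplicity.

```python
# Opus 4.1 list prices from the table above, in USD per 1M tokens.
OPUS_4_1_INPUT_PER_M = 15.0
OPUS_4_1_OUTPUT_PER_M = 75.0

TOKENS_PER_MILLION = 1_000_000


def request_cost_usd(input_tokens, output_tokens, input_per_m, output_per_m):
    """Estimate the cost of one request, ignoring cache reads and writes."""
    input_cost = input_tokens * input_per_m / TOKENS_PER_MILLION
    output_cost = output_tokens * output_per_m / TOKENS_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 200K input tokens and 10K output tokens.
example_cost = request_cost_usd(
    200_000, 10_000, OPUS_4_1_INPUT_PER_M, OPUS_4_1_OUTPUT_PER_M
)
print(f"${example_cost:.2f}")
```

The same function can be called with the other column's prices to compare the two models on an identical workload.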

Summary

Claude Opus 4.6 leads in: Math and Reasoning (3/3), Coding and Software Engineering (1/1), General Knowledge (1/1), Instruction Following (1/1)

On average across the 6 shared benchmarks, Claude Opus 4.6 scores 21.82 higher.
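
The win counts and the average difference can be recomputed from the benchmark tables. The sketch below assumes each table cell is read as a score followed by a rank (for example, 40.70 for Claude Opus 4.6 on FrontierMath); the names `SCORES` and `summarize` are illustrative and not part of the page's own tooling.

```python
# Scores transcribed from the benchmark tables as (Claude Opus 4.6, Opus 4.1).
# Assumption: the leading number in each cell is the score, the rest is rank.
SCORES = {
    "FrontierMath": (40.70, 5.90),
    "AIME2025": (99.79, 78.0),
    "FrontierMath - Tier 4": (22.90, 4.20),
    "SWE-bench Verified": (80.84, 74.50),
    "GPQA Diamond": (91.31, 81.0),
    "IF Bench": (94.0, 55.0),
}


def summarize(scores):
    """Return per-benchmark diffs, win counts, mean diff, and the largest gap."""
    # Round to two decimals to match the precision shown in the Diff column.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "diffs": diffs,
        "wins_a": sum(d > 0 for d in diffs.values()),
        "wins_b": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(SCORES)
print(summary["wins_a"], summary["wins_b"], summary["ties"])
print(summary["mean_diff"], summary["largest_gap"])
```

Note that the mean is taken over raw score differences, so benchmarks with different scales and different evaluation modes contribute equally to it.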

Largest single-benchmark gap: IF Bench — Claude Opus 4.6 94 vs Opus 4.1 55 (+39).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
