Claude Opus 4.6 vs Opus 4.1

Across 6 shared benchmarks, Claude Opus 4.6 leads on all 6: Opus 4.1 wins none, there are no ties, and the average score difference is +21.82 in Claude Opus 4.6's favor.

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Opus 4.1

Anthropic · 2025-08-06 · Reasoning model

Claude Opus 4.6: 6 wins (100%) · Opus 4.1: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

Math and Reasoning

Claude Opus 4.6 leads 3/3
Benchmark | Claude Opus 4.6 | Opus 4.1 | Diff
FrontierMath | 40.70 · 7 / 60 · Max (no tools) | 5.90 · 35 / 60 · Normal (No Tools) | +34.80
AIME2025 | 99.79 · 7 / 106 · Extended (no tools) | 78 · 60 / 106 · Extended (no tools) | +21.79
FrontierMath - Tier 4 | 22.90 · 12 / 80 · Max (no tools) | 4.20 · 40 / 80 · Thinking (No Tools, 32K Budget) | +18.70

Coding and Software Engineering

Claude Opus 4.6 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.1 | Diff
SWE-bench Verified | 80.84 · 9 / 108 · Extended (with tools) | 74.50 · 36 / 108 · Extended (with tools) | +6.34

General Knowledge

Claude Opus 4.6 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.1 | Diff
GPQA Diamond | 91.31 · 14 / 178 · Extended (no tools) | 81 · 69 / 178 · Extended (no tools) | +10.31

Instruction Following

Claude Opus 4.6 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.1 | Diff
IF Bench | 94 · 1 / 29 · Extended (no tools) | 55 · 22 / 29 · Extended (with tools) | +39
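
The grouping and ordering used in the tables above (by capability, then by largest gap within each group) can be reproduced with a short script. This is a minimal sketch, not the page's actual generator: the names `ROWS` and `group_by_capability` are hypothetical, and the scores are copied by hand from the tables.

```python
from collections import defaultdict

# (capability, benchmark, Claude Opus 4.6 score, Opus 4.1 score)
# Values copied from the benchmark tables; the tuple layout is an assumption.
ROWS = [
    ("Math and Reasoning", "FrontierMath", 40.70, 5.90),
    ("Math and Reasoning", "AIME2025", 99.79, 78.0),
    ("Math and Reasoning", "FrontierMath - Tier 4", 22.90, 4.20),
    ("Coding and Software Engineering", "SWE-bench Verified", 80.84, 74.50),
    ("General Knowledge", "GPQA Diamond", 91.31, 81.0),
    ("Instruction Following", "IF Bench", 94.0, 55.0),
]


def group_by_capability(rows):
    """Group rows by capability and sort each group by largest gap first."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in rows:
        # Diff is first model minus second, rounded to two decimals.
        groups[capability].append((benchmark, round(score_a - score_b, 2)))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


grouped = group_by_capability(ROWS)
for capability, items in grouped.items():
    print(capability, items)
```

Sorting on the absolute gap keeps the ordering meaningful even if a future comparison includes benchmarks where the second model leads.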

Specs

Field | Claude Opus 4.6 | Opus 4.1
Publisher | Anthropic | Anthropic
Release date | 2026-02-05 | 2025-08-06
Model type | Reasoning model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1000K | 200K
Max output | 64K | 32K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Claude Opus 4.6 | Opus 4.1
Text input | $0.5 / 1M tokens | $15 / 1M tokens
Text output | $25 / 1M tokens | $75 / 1M tokens
Cache read | $0.5 / 1M tokens | $1.5 / 1M tokens
Cache write | $10 / 1M tokens | $18.75 / 1M tokens

Summary

  • Claude Opus 4.6 leads in: Math and Reasoning (3/3), Coding and Software Engineering (1/1), General Knowledge (1/1), Instruction Following (1/1)

On average across the 6 shared benchmarks, Claude Opus 4.6 scores 21.82 higher.

Largest single-benchmark gap: IF Bench, where Claude Opus 4.6 scores 94 vs 55 for Opus 4.1 (+39).
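
The summary figures above can be checked directly from the per-benchmark scores. The sketch below assumes the average is an unweighted mean of the six score differences, which does reproduce the reported 21.82. The names `SCORES` and `summarize` are hypothetical, and the scores are copied by hand from the benchmark tables.

```python
# (benchmark, Claude Opus 4.6 score, Opus 4.1 score), copied from the tables.
SCORES = [
    ("FrontierMath", 40.70, 5.90),
    ("AIME2025", 99.79, 78.0),
    ("FrontierMath - Tier 4", 22.90, 4.20),
    ("SWE-bench Verified", 80.84, 74.50),
    ("GPQA Diamond", 91.31, 81.0),
    ("IF Bench", 94.0, 55.0),
]


def summarize(scores):
    """Return win counts, mean difference, and the largest single gap."""
    diffs = {name: a - b for name, a, b in scores}
    # Assumption: unweighted mean over all shared benchmarks.
    mean_diff = sum(diffs.values()) / len(diffs)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_a": sum(d > 0 for d in diffs.values()),
        "wins_b": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(mean_diff, 2),
        "largest": (largest, round(diffs[largest], 2)),
    }


summary = summarize(SCORES)
print(summary)
```

Note that the raw scores mix evaluation modes (for example, with and without tools on IF Bench), so the mean is a rough indicator rather than a like-for-like measurement.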

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.