Claude Opus 4.6 vs Claude Opus 4

Across 11 shared benchmarks, Claude Opus 4.6 leads overall: Claude Opus 4.6 wins 10, Claude Opus 4 wins 1, with 0 ties and an average score difference of +26.70.

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Claude Opus 4

Anthropic · 2025-05-23 · Reasoning model

Claude Opus 4.6: 10 wins (91%) · Claude Opus 4: 1 win (9%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.
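The grouping and ordering described above can be sketched in a few lines of Python. This is only an illustration, not the page's actual generator: the `ROWS` sample and the `group_and_sort` helper are hypothetical names, and the scores are a small subset copied from this page in deliberately shuffled order.

```python
from collections import defaultdict

# (capability, benchmark, Claude Opus 4.6 score, Claude Opus 4 score).
# A subset of this page's rows, shuffled on purpose (illustrative sample).
ROWS = [
    ("Math and Reasoning", "MATH-500", 97.60, 98.20),
    ("General Knowledge", "GPQA Diamond", 91.31, 79.60),
    ("Math and Reasoning", "FrontierMath", 40.70, 4.50),
    ("General Knowledge", "ARC-AGI-2", 66.30, 8.60),
    ("Math and Reasoning", "AIME2025", 99.79, 75.50),
]


def group_and_sort(rows):
    """Group rows by capability, then order each group by gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, score_a, score_b in rows:
        # Round to 2 decimals to suppress floating-point noise in the gap.
        gap = round(score_a - score_b, 2)
        groups[capability].append((benchmark, gap))
    return {
        capability: sorted(items, key=lambda item: item[1], reverse=True)
        for capability, items in groups.items()
    }


grouped = group_and_sort(ROWS)
for capability, items in grouped.items():
    print(capability)
    for benchmark, gap in items:
        print(f"  {benchmark}: {gap:+.2f}")
```

Sorting on the signed gap (rather than its absolute value) keeps a benchmark where Claude Opus 4 leads, such as MATH-500, at the bottom of its group, which matches the order used in the tables here.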

General Knowledge

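Claude Opus 4.6 leads 4/4.

| Benchmark | Claude Opus 4.6 | Claude Opus 4 | Diff |
| --- | --- | --- | --- |
| ARC-AGI-2 | 66.30 · rank 15 / 59 · Extended (no tools) | 8.60 · rank 39 / 59 | +57.70 |
| ARC-AGI | 92 · rank 11 / 65 · Extended (no tools) | 35.70 · rank 48 / 65 | +56.30 |
| HLE | 53 · rank 11 / 157 · Extended (with tools, internet) | 10.70 · rank 129 / 157 | +42.30 |
| GPQA Diamond | 91.31 · rank 14 / 178 · Extended (no tools) | 79.60 · rank 79 / 178 | +11.71 |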

Math and Reasoning

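Claude Opus 4.6 leads 3/4.

| Benchmark | Claude Opus 4.6 | Claude Opus 4 | Diff |
| --- | --- | --- | --- |
| FrontierMath | 40.70 · rank 7 / 60 · Max (no tools) | 4.50 · rank 39 / 60 | +36.20 |
| AIME2025 | 99.79 · rank 7 / 106 · Extended (no tools) | 75.50 · rank 65 / 106 | +24.29 |
| FrontierMath - Tier 4 | 22.90 · rank 12 / 80 · Max (no tools) | 4.20 · rank 40 / 80 | +18.70 |
| MATH-500 | 97.60 · rank 10 / 44 · Extended (no tools) | 98.20 · rank 3 / 44 | -0.60 |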

Coding and Software Engineering

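Claude Opus 4.6 leads 2/2.

| Benchmark | Claude Opus 4.6 | Claude Opus 4 | Diff |
| --- | --- | --- | --- |
| LiveCodeBench | 76 · rank 37 / 120 · Extended (no tools) | 56.60 · rank 76 / 120 | +19.40 |
| SWE-bench Verified | 80.84 · rank 9 / 108 · Extended (with tools) | 72.50 · rank 48 / 108 | +8.34 |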

Agent Level Benchmark

Claude Opus 4.6 leads 1/1.

| Benchmark | Claude Opus 4.6 | Claude Opus 4 | Diff |
| --- | --- | --- | --- |
| τ²-Bench | 91.89 · rank 1 / 40 · Extended (with tools) | 72.50 · rank 22 / 40 | +19.39 |

Specs

| Field | Claude Opus 4.6 | Claude Opus 4 |
| --- | --- | --- |
| Publisher | Anthropic | Anthropic |
| Release date | 2026-02-05 | 2025-05-23 |
| Model type | Reasoning model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1000K | 200K |
| Max output | 64K | 32K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | Claude Opus 4.6 | Claude Opus 4 |
| --- | --- | --- |
| Text input | $0.5 / 1M tokens | Not public |
| Text output | $25 / 1M tokens | Not public |
| Cache read | $0.5 / 1M tokens | Not public |
| Cache write | $10 / 1M tokens | Not public |

One or both models have incomplete public pricing.
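To show how per-1M-token prices translate into a request cost, here is a minimal sketch. It assumes the text input and output prices exactly as listed on this page, ignores cache reads and writes, and represents a "Not public" field as `None` rather than guessing a value. `PRICES` and `estimate_cost` are hypothetical names introduced for illustration.

```python
from decimal import Decimal
from typing import Optional

# USD per 1M tokens, copied from the pricing table on this page.
# None marks a field the page reports as "Not public".
PRICES = {
    "Claude Opus 4.6": {"input": Decimal("0.5"), "output": Decimal("25")},
    "Claude Opus 4": {"input": None, "output": None},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[Decimal]:
    """Return the estimated USD cost, or None when a needed price is missing."""
    prices = PRICES[model]
    if prices["input"] is None or prices["output"] is None:
        return None  # missing fields are not inferred
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


print(estimate_cost("Claude Opus 4.6", 200_000, 10_000))
print(estimate_cost("Claude Opus 4", 200_000, 10_000))
```

`Decimal` is used instead of `float` so that small currency amounts are not distorted by binary rounding. Returning `None` mirrors the page's policy of leaving unavailable prices blank.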

Summary

  • Claude Opus 4.6 leads in: General Knowledge (4/4), Math and Reasoning (3/4), Coding and Software Engineering (2/2), Agent Level Benchmark (1/1)

On average across the 11 shared benchmarks, Claude Opus 4.6 scores 26.70 higher.

Largest single-benchmark gap: ARC-AGI-2, where Claude Opus 4.6 scores 66.30 and Claude Opus 4 scores 8.60 (+57.70).
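The summary figures can be re-derived from the per-benchmark scores. The sketch below is one way to do it, assuming the scores as they appear on this page. The `SCORES` mapping and `summarize` helper are hypothetical names, and this is not the code that generated the page.

```python
# Scores copied from the benchmark tables on this page, as
# benchmark -> (Claude Opus 4.6 score, Claude Opus 4 score).
SCORES = {
    "ARC-AGI-2": (66.30, 8.60),
    "ARC-AGI": (92.00, 35.70),
    "HLE": (53.00, 10.70),
    "GPQA Diamond": (91.31, 79.60),
    "FrontierMath": (40.70, 4.50),
    "AIME2025": (99.79, 75.50),
    "FrontierMath - Tier 4": (22.90, 4.20),
    "MATH-500": (97.60, 98.20),
    "LiveCodeBench": (76.00, 56.60),
    "SWE-bench Verified": (80.84, 72.50),
    "τ²-Bench": (91.89, 72.50),
}


def summarize(scores):
    """Recompute win counts, mean difference, and the largest gap."""
    # Round each difference to 2 decimals to suppress floating-point noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    wins_a = sum(d > 0 for d in diffs.values())
    wins_b = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": ties,
        "win_share_a_pct": round(100 * wins_a / len(diffs)),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(SCORES)
print(summary)
```

The mean here is an unweighted average of the 11 signed differences, so the single benchmark where Claude Opus 4 leads (MATH-500, -0.60) lowers it slightly. Under these assumptions the result agrees with the figures quoted on this page: 10 wins to 1, no ties, a mean difference of 26.70, and ARC-AGI-2 as the largest gap.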

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.