Claude Opus 4.6 vs Opus 4.5

Across 14 shared benchmarks, Claude Opus 4.6 leads overall with 11 wins to 3 for Opus 4.5, 0 ties, and an average score difference of +9.99 points in its favor.
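The headline figures can be recomputed from the per-benchmark scores. The following is a minimal sketch, assuming a win means a strictly higher score and the average is the plain mean of the 14 signed differences; the `summarize` helper and the `scores` mapping are illustrative names, with values transcribed from the benchmark tables on this page.

```python
# Score pairs (Claude Opus 4.6, Opus 4.5), transcribed from the benchmark tables on this page.
scores = {
    "ARC-AGI-2": (66.30, 37.60),
    "ARC-AGI": (92, 80),
    "HLE": (53, 43.20),
    "GPQA Diamond": (91.31, 87),
    "τ²-Bench": (91.89, 81.99),
    "τ²-Bench - Telecom": (99.25, 90.70),
    "LiveCodeBench": (76, 87),
    "SWE-bench Verified": (80.84, 80.90),
    "FrontierMath": (40.70, 20.70),
    "FrontierMath - Tier 4": (22.90, 4.20),
    "Terminal Bench 2.0": (65.40, 59.30),
    "Pinch Bench": (87.40, 87.20),
    "IF Bench": (94, 58),
    "MMMU": (77.30, 80.70),
}


def summarize(scores):
    """Count wins and ties and average the signed differences (first model minus second)."""
    diffs = [a - b for a, b in scores.values()]
    wins_a = sum(d > 0 for d in diffs)
    wins_b = sum(d < 0 for d in diffs)
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": len(diffs) - wins_a - wins_b,
        "win_share_a": round(100 * wins_a / len(diffs)),  # percent, rounded to an integer
        "avg_diff": round(sum(diffs) / len(diffs), 2),
    }


print(summarize(scores))
```

Under these assumptions the sketch gives 11 wins, 3 losses, 0 ties, a 79% win share and an average difference of 9.99, matching the figures reported here.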

Claude Opus 4.6

Anthropic · 2026-02-05 · Reasoning model

Opus 4.5

Anthropic · 2025-11-25 · Reasoning model

Claude Opus 4.6: 11 wins (79%) · Opus 4.5: 3 wins (21%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 14 shared benchmarks.

General Knowledge

Claude Opus 4.6 leads 4/4
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
ARC-AGI-2 | 66.30 · 15 / 59 · Extended (no tools) | 37.60 · 26 / 59 · Extended (no tools) | +28.70
ARC-AGI | 92 · 11 / 65 · Extended (no tools) | 80 · 21 / 65 · Extended (no tools) | +12
HLE | 53 · 11 / 157 · Extended (with tools, internet) | 43.20 · 39 / 157 · Extended (with tools) | +9.80
GPQA Diamond | 91.31 · 14 / 178 · Extended (no tools) | 87 · 38 / 178 · Extended (no tools) | +4.31

Agent Level Benchmark

Claude Opus 4.6 leads 2/2
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
τ²-Bench | 91.89 · 1 / 40 · Extended (with tools) | 81.99 · 13 / 40 · Extended (with tools) | +9.90
τ²-Bench - Telecom | 99.25 · 2 / 35 · Extended (with tools) | 90.70 · 21 / 35 · Extended (with tools) | +8.55

Coding and Software Engineer

Opus 4.5 leads 2/2
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
LiveCodeBench | 76 · 37 / 120 · Extended (no tools) | 87 · 12 / 120 · Extended (with tools) | -11
SWE-bench Verified | 80.84 · 9 / 108 · Extended (with tools) | 80.90 · 8 / 108 · Extended (with tools) | -0.06

Math and Reasoning

Claude Opus 4.6 leads 2/2
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
FrontierMath | 40.70 · 7 / 60 · Max (no tools) | 20.70 · 17 / 60 · Extended (no tools) | +20
FrontierMath - Tier 4 | 22.90 · 12 / 80 · Max (no tools) | 4.20 · 40 / 80 · Normal (no tools) | +18.70

AI Agent - Tool Usage

Claude Opus 4.6 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
Terminal Bench 2.0 | 65.40 · 11 / 46 · Extended (with tools) | 59.30 · 20 / 46 · Extended (with tools) | +6.10

Claw-style Agent Evaluation

Claude Opus 4.6 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
Pinch Bench | 87.40 · 7 / 37 · Thinking (with tools) | 87.20 · 8 / 37 · Extended (with tools) | +0.20

Instruction Following

Claude Opus 4.6 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
IF Bench | 94 · 1 / 29 · Extended (no tools) | 58 · 20 / 29 · Extended (with tools) | +36

Multimodal Understanding

Opus 4.5 leads 1/1
Benchmark | Claude Opus 4.6 | Opus 4.5 | Diff
MMMU | 77.30 · 15 / 28 · Extended (with tools) | 80.70 · 10 / 28 · Extended (no tools) | -3.40

Specs

Field | Claude Opus 4.6 | Opus 4.5
Publisher | Anthropic | Anthropic
Release date | 2026-02-05 | 2025-11-25
Model type | Reasoning model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1000K | 200K
Max output | 64K | 64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Claude Opus 4.6 | Opus 4.5
Text input | $0.5 / 1M tokens | $5 / 1M tokens
Text output | $25 / 1M tokens | $25 / 1M tokens
Cache read | $0.5 / 1M tokens | $0.5 / 1M tokens
Cache write | $10 / 1M tokens | $6.25 / 1M tokens
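
Per-million-token prices translate into a request cost by scaling each token count by its listed rate. The sketch below assumes simple linear billing with no volume tiers or long-context surcharges, which this page does not describe; `estimate_cost` and `OPUS_4_5_PRICES` are illustrative names, and the prices are the Opus 4.5 values listed above.

```python
# Listed prices in USD per 1M tokens, transcribed from the Opus 4.5 prices above.
OPUS_4_5_PRICES = {
    "input": 5.0,
    "output": 25.0,
    "cache_read": 0.5,
    "cache_write": 6.25,
}


def estimate_cost(token_counts, prices_per_million):
    """Return the USD cost as the sum of tokens / 1,000,000 * price over each item."""
    return sum(
        tokens / 1_000_000 * prices_per_million[item]
        for item, tokens in token_counts.items()
    )


# Hypothetical request: 200,000 input tokens and 10,000 output tokens.
cost = estimate_cost({"input": 200_000, "output": 10_000}, OPUS_4_5_PRICES)
print(f"${cost:.2f}")  # prints "$1.25"
```

Here the input side contributes $1.00 and the output side $0.25. The same function can take any other price mapping, including cache read and write counts, as long as the keys match.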

Summary

  • Claude Opus 4.6 leads in: General Knowledge (4/4), Agent Level Benchmark (2/2), Math and Reasoning (2/2), AI Agent - Tool Usage (1/1), Claw-style Agent Evaluation (1/1), Instruction Following (1/1)
  • Opus 4.5 leads in: Coding and Software Engineer (2/2), Multimodal Understanding (1/1)

On average across the 14 shared benchmarks, Claude Opus 4.6 scores 9.99 points higher.

Largest single-benchmark gap: IF Bench — Claude Opus 4.6 94 vs Opus 4.5 58 (+36).
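
The per-category leaders and the largest gap can be derived from the same rows. This is a rough sketch, assuming the category with more strict wins is the leader and that "largest gap" means the largest absolute difference; `ROWS`, `category_tally` and `largest_gap` are illustrative names, and the data is transcribed from the tables on this page.

```python
from collections import defaultdict

# (category, benchmark, Claude Opus 4.6 score, Opus 4.5 score), transcribed from this page.
ROWS = [
    ("General Knowledge", "ARC-AGI-2", 66.30, 37.60),
    ("General Knowledge", "ARC-AGI", 92, 80),
    ("General Knowledge", "HLE", 53, 43.20),
    ("General Knowledge", "GPQA Diamond", 91.31, 87),
    ("Agent Level Benchmark", "τ²-Bench", 91.89, 81.99),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 99.25, 90.70),
    ("Coding and Software Engineer", "LiveCodeBench", 76, 87),
    ("Coding and Software Engineer", "SWE-bench Verified", 80.84, 80.90),
    ("Math and Reasoning", "FrontierMath", 40.70, 20.70),
    ("Math and Reasoning", "FrontierMath - Tier 4", 22.90, 4.20),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 65.40, 59.30),
    ("Claw-style Agent Evaluation", "Pinch Bench", 87.40, 87.20),
    ("Instruction Following", "IF Bench", 94, 58),
    ("Multimodal Understanding", "MMMU", 77.30, 80.70),
]


def category_tally(rows):
    """Map each category to (wins for Claude Opus 4.6, wins for Opus 4.5)."""
    tally = defaultdict(lambda: [0, 0])
    for category, _, a, b in rows:
        if a > b:
            tally[category][0] += 1
        elif b > a:
            tally[category][1] += 1
    return {category: tuple(counts) for category, counts in tally.items()}


def largest_gap(rows):
    """Return (benchmark, signed difference) for the row with the largest absolute gap."""
    _, name, a, b = max(rows, key=lambda r: abs(r[2] - r[3]))
    return name, round(a - b, 2)


for category, (wins_a, wins_b) in category_tally(ROWS).items():
    leader = "Claude Opus 4.6" if wins_a > wins_b else "Opus 4.5"
    print(f"{category}: {leader} ({max(wins_a, wins_b)}/{wins_a + wins_b})")
print(largest_gap(ROWS))
```

The leader line does not handle a tied category, since none occurs in this data; a general version would need a third branch for that case.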

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.