Opus 4.7 vs Opus 4.1

Across 4 shared benchmarks, Opus 4.7 leads overall: it wins all 4, Opus 4.1 wins 0, there are no ties, and the average score difference is +20.72 points in Opus 4.7's favor.

Opus 4.7

Anthropic · 2026-04-16 · Reasoning model

Opus 4.1

Anthropic · 2025-08-06 · Reasoning model

Opus 4.7: 4 wins (100%) | Opus 4.1: 0 wins (0%)

Benchmark scores

Benchmarks are grouped by capability and sorted by largest gap within each group (4 shared benchmarks).

Math and Reasoning

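Opus 4.7 wins 2/2
Benchmark | Opus 4.7 | Opus 4.1 | Diff
FrontierMath | 43.80 (rank 6 / 60; Extra-high effort thinking, No Tools) | 5.90 (rank 35 / 60; Normal, No Tools) | +37.90
FrontierMath - Tier 4 | 22.90 (rank 12 / 80; Extra-high effort thinking, No Tools) | 4.20 (rank 40 / 80; Thinking, No Tools, 32K Budget) | +18.70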

Coding and Software Engineering

Opus 4.7 wins 1/1
Benchmark | Opus 4.7 | Opus 4.1 | Diff
SWE-bench Verified | 87.60 (rank 5 / 108; Extended, with tools) | 74.50 (rank 36 / 108; Extended, with tools) | +13.10

General Knowledge

Opus 4.7 wins 1/1
Benchmark | Opus 4.7 | Opus 4.1 | Diff
GPQA Diamond | 94.20 (rank 4 / 178; Extended, no tools) | 81.00 (rank 69 / 178; Extended, no tools) | +13.20

Specs

Field | Opus 4.7 | Opus 4.1
Publisher | Anthropic | Anthropic
Release date | 2026-04-16 | 2025-08-06
Model type | Reasoning model | Reasoning model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1000K tokens | 200K tokens
Max output | 128K tokens | 32K tokens
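The context-length and max-output limits above can be checked before a request is sent. This is a minimal Python sketch under two stated assumptions: "K" in the specs table means 1,000 tokens, and the prompt plus the requested output must fit inside the context window together. The `fits` helper and `ModelLimits` type are hypothetical illustrations, not part of any Anthropic SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    name: str
    context_tokens: int
    max_output_tokens: int


# Assumption: "K" in the specs table means 1,000 tokens.
LIMITS = {
    "Opus 4.7": ModelLimits("Opus 4.7", 1_000_000, 128_000),
    "Opus 4.1": ModelLimits("Opus 4.1", 200_000, 32_000),
}


def fits(limits: ModelLimits, prompt_tokens: int, requested_output: int) -> bool:
    """Hypothetical check: output cap respected, and prompt + output fit in the window."""
    return (
        requested_output <= limits.max_output_tokens
        and prompt_tokens + requested_output <= limits.context_tokens
    )


# A 300K-token prompt asking for up to 64K output tokens.
for name, lim in LIMITS.items():
    print(name, fits(lim, prompt_tokens=300_000, requested_output=64_000))
```

With these numbers the request fits Opus 4.7 but not Opus 4.1, whose 32K output cap and 200K window are both exceeded.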

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Opus 4.7 | Opus 4.1
Text input | $5 / 1M tokens | $15 / 1M tokens
Text output | $25 / 1M tokens | $75 / 1M tokens
Cache read | $0.5 / 1M tokens | $1.5 / 1M tokens
Cache write | $6.25 / 1M tokens | $18.75 / 1M tokens
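To compare what a workload would cost on each model, the per-million-token rates can be applied to token counts. This is a hedged Python sketch, assuming each token is billed once at the rate for its category (plain input, output, cache read, or cache write); the `estimate_cost` helper and the workload numbers are hypothetical. `Decimal` keeps the dollar arithmetic free of floating-point rounding.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the API pricing table.
PRICES = {
    "Opus 4.7": {"input": Decimal("5"), "output": Decimal("25"),
                 "cache_read": Decimal("0.5"), "cache_write": Decimal("6.25")},
    "Opus 4.1": {"input": Decimal("15"), "output": Decimal("75"),
                 "cache_read": Decimal("1.5"), "cache_write": Decimal("18.75")},
}
PER_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int = 0, output_tokens: int = 0,
                  cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> Decimal:
    """Estimated USD cost, billing each token once at its category's rate (assumption)."""
    p = PRICES[model]
    total = (p["input"] * input_tokens
             + p["output"] * output_tokens
             + p["cache_read"] * cache_read_tokens
             + p["cache_write"] * cache_write_tokens)
    return total / PER_MILLION


# Hypothetical workload: 2M fresh input, 500K output, 1M cached input tokens read back.
workload = dict(input_tokens=2_000_000, output_tokens=500_000, cache_read_tokens=1_000_000)
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, **workload):.2f}")
```

This prints $23.00 for Opus 4.7 and $69.00 for Opus 4.1. Because every Opus 4.7 rate is exactly one third of the matching Opus 4.1 rate, any workload priced this way costs 3x as much on Opus 4.1; for both models, cache reads are 0.1x and cache writes 1.25x the input rate.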

Summary

  • Opus 4.7 leads in: Math and Reasoning (2/2), Coding and Software Engineering (1/1), General Knowledge (1/1)

On average across the 4 shared benchmarks, Opus 4.7 scores 20.72 points higher.

Largest single-benchmark gap: FrontierMath — Opus 4.7 43.80 vs Opus 4.1 5.90 (+37.90).
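The summary figures can be re-derived from the benchmark scores. The minimal Python sketch below assumes that each Diff is the Opus 4.7 score minus the Opus 4.1 score and that all four benchmarks are weighted equally. Under those assumptions the exact mean of the four differences is 20.725, so the 20.72 on this page appears to be that value rounded down to two decimals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    benchmark: str
    opus_47: float
    opus_41: float


# Scores from the benchmark tables. The Opus 4.1 GPQA Diamond score (81.00)
# is the value consistent with the listed +13.20 difference.
RESULTS = [
    Result("FrontierMath", 43.80, 5.90),
    Result("FrontierMath - Tier 4", 22.90, 4.20),
    Result("SWE-bench Verified", 87.60, 74.50),
    Result("GPQA Diamond", 94.20, 81.00),
]


def summarize(results):
    """Return (Opus 4.7 wins, Opus 4.1 wins, ties, mean diff, (largest diff, its Result))."""
    # Diff = Opus 4.7 minus Opus 4.1, rounded to 2 decimals to drop float noise.
    diffs = [round(r.opus_47 - r.opus_41, 2) for r in results]
    wins_47 = sum(d > 0 for d in diffs)
    wins_41 = sum(d < 0 for d in diffs)
    ties = sum(d == 0 for d in diffs)
    mean_diff = sum(diffs) / len(diffs)  # every benchmark weighted equally
    largest = max(zip(diffs, results), key=lambda pair: abs(pair[0]))
    return wins_47, wins_41, ties, mean_diff, largest


wins_47, wins_41, ties, mean_diff, (gap, top) = summarize(RESULTS)
print(f"Opus 4.7 wins {wins_47}, Opus 4.1 wins {wins_41}, ties {ties}")
print(f"Mean difference: {mean_diff:+.3f}")
print(f"Largest gap: {top.benchmark} ({gap:+.2f})")
```

An equal-weight mean is a simple choice; it lets one large gap (FrontierMath) pull the average well above the median difference, so the per-benchmark rows remain the better guide.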

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.