Opus 4.7 vs GPT-5.4

Across 11 shared benchmarks, GPT-5.4 leads overall: Opus 4.7 wins 4, GPT-5.4 wins 6, and 1 is a tie. The average score difference (Opus 4.7 minus GPT-5.4) is -0.45.

Opus 4.7

Anthropic · 2026-04-16 · Reasoning model

GPT-5.4

OpenAI · 2026-03-05 · Multimodal model

Opus 4.7: 4 wins (36%) · Ties: 1 (9%) · GPT-5.4: 6 wins (55%)
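The win/tie split is a plain count over the per-benchmark differences. A minimal Python sketch, assuming a benchmark counts as a tie only when the difference is exactly zero (the page does not state its tie rule); `DIFFS`, `tally` and `share` are illustrative names, and the values are copied from the benchmark tables on this page.

```python
# Per-benchmark differences (Opus 4.7 minus GPT-5.4), copied from this page's tables.
DIFFS = {
    "HLE": 2.60,
    "GPQA Diamond": 1.40,
    "ARC-AGI-2": -1.30,
    "ARC-AGI": -0.20,
    "ARC-AGI-3": 0.0,
    "Terminal Bench 2.0": -5.70,
    "OSWorld-Verified": 3.0,
    "FrontierMath - Tier 4": -4.20,
    "FrontierMath": -3.80,
    "BrowseComp": -3.40,
    "SWE-Bench Pro - Public": 6.60,
}


def tally(diffs):
    """Return (Opus 4.7 wins, ties, GPT-5.4 wins); a diff of exactly 0 counts as a tie."""
    opus_wins = sum(1 for d in diffs.values() if d > 0)
    gpt_wins = sum(1 for d in diffs.values() if d < 0)
    ties = len(diffs) - opus_wins - gpt_wins
    return opus_wins, ties, gpt_wins


def share(count, total):
    """Percentage of `total`, rounded to a whole number for display."""
    return round(100 * count / total)


if __name__ == "__main__":
    opus_wins, ties, gpt_wins = tally(DIFFS)
    n = len(DIFFS)
    print(f"Opus 4.7: {opus_wins} wins ({share(opus_wins, n)}%)")
    print(f"Ties: {ties} ({share(ties, n)}%)")
    print(f"GPT-5.4: {gpt_wins} wins ({share(gpt_wins, n)}%)")
```

Running it prints 4 wins (36%), 1 tie (9%) and 6 wins (55%), matching the split above.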

Benchmark scores

Grouped by capability and sorted within each group by gap size, largest first. Diff is Opus 4.7 minus GPT-5.4. 11 shared benchmarks.
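The grouping and ordering rule can be sketched as: bucket rows by capability, then sort each bucket by the absolute difference, largest first. This is an assumed reconstruction of the page's ordering (it reproduces the order of the tables here), shown on two of the capability groups; `ROWS` and `group_and_sort` are hypothetical names.

```python
from collections import defaultdict

# (capability, benchmark, Opus 4.7 score, GPT-5.4 score), deliberately listed unsorted.
ROWS = [
    ("General Knowledge", "ARC-AGI", 93.50, 93.70),
    ("AI Agent - Tool Usage", "OSWorld-Verified", 78.0, 75.0),
    ("General Knowledge", "ARC-AGI-3", 0.0, 0.0),
    ("General Knowledge", "HLE", 54.70, 52.10),
    ("AI Agent - Tool Usage", "Terminal Bench 2.0", 69.40, 75.10),
    ("General Knowledge", "ARC-AGI-2", 75.80, 77.10),
    ("General Knowledge", "GPQA Diamond", 94.20, 92.80),
]


def group_and_sort(rows):
    """Group rows by capability; within each group, order by |diff|, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, opus, gpt in rows:
        # Round to 2 decimals to drop floating-point noise from the subtraction.
        groups[capability].append((benchmark, round(opus - gpt, 2)))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting by the absolute value keeps large GPT-5.4 leads (negative diffs) near the top of a group instead of pushing them to the bottom.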

General Knowledge

Even: 2 wins each, 1 tie (5 benchmarks)
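Benchmark | Opus 4.7 | GPT-5.4 | Diff
HLE | 54.70 · rank 8/157 · Extended (with tools) | 52.10 · rank 14/157 · Extra High Thinking (with tools) | +2.60
GPQA Diamond | 94.20 · rank 4/178 · Extended (no tools) | 92.80 · rank 10/178 · Extra High Thinking (no tools) | +1.40
ARC-AGI-2 | 75.80 · rank 9/59 · Max (no tools) | 77.10 · rank 7/59 · Normal (no tools) | -1.30
ARC-AGI | 93.50 · rank 9/65 · Thinking High (no tools) | 93.70 · rank 7/65 · Normal (no tools) | -0.20
ARC-AGI-3 | 0 · rank 5/6 · Thinking High (no tools) | 0 · rank 4/6 · Thinking High (no tools) | 0 (tie)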

AI Agent - Tool Usage

Even: 1 win each (2 benchmarks)
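Benchmark | Opus 4.7 | GPT-5.4 | Diff
Terminal Bench 2.0 | 69.40 · rank 6/46 · Extended (with tools) | 75.10 · rank 4/46 · Extra High Thinking (with tools) | -5.70
OSWorld-Verified | 78 · rank 6/18 · Extended (with tools) | 75 · rank 7/18 · Extra High Thinking (with tools) | +3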

Math and Reasoning

GPT-5.4 leads: 2 of 2 wins
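Benchmark | Opus 4.7 | GPT-5.4 | Diff
FrontierMath - Tier 4 | 22.90 · rank 12/80 · Extra High Thinking (no tools) | 27.10 · rank 11/80 · Extra High Thinking (no tools) | -4.20
FrontierMath | 43.80 · rank 6/60 · Extra High Thinking (no tools) | 47.60 · rank 5/60 · Extra High Thinking (no tools) | -3.80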

AI Agent - Information Search

GPT-5.4 leads: 1 of 1 wins
Benchmark | Opus 4.7 | GPT-5.4 | Diff
BrowseComp | 79.30 · rank 13/45 · Extended (with tools) | 82.70 · rank 11/45 · Extra High Thinking (with tools) | -3.40

Coding and Software Engineering

Opus 4.7 leads: 1 of 1 wins
Benchmark | Opus 4.7 | GPT-5.4 | Diff
SWE-Bench Pro - Public | 64.30 · rank 4/43 · Extended (with tools) | 57.70 · rank 10/43 · Extra High Thinking (no tools) | +6.60

Specs

Field | Opus 4.7 | GPT-5.4
Publisher | Anthropic | OpenAI
Release date | 2026-04-16 | 2026-03-05
Model type | Reasoning model | Multimodal model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 1000K tokens | 1M tokens
Max output | 128K tokens | 125K tokens

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | Opus 4.7 | GPT-5.4
Text input | $5 / 1M tokens | $2.5 / 1M tokens
Text output | $25 / 1M tokens | $15 / 1M tokens
Cache read | $0.5 / 1M tokens | Not public
Cache write | $6.25 / 1M tokens | $0.25 / 1M tokens
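To see what these list prices mean per request, the sketch below prices a hypothetical request of 100K input tokens and 10K output tokens. The token counts are illustrative assumptions, `request_cost` ignores caching, and the `PRICES` dictionary simply mirrors the pricing table, with `None` standing for "Not public" so that a missing price is never inferred.

```python
# USD per 1M tokens, mirroring the pricing table; None means "Not public".
PRICES = {
    "Opus 4.7": {"input": 5.0, "output": 25.0, "cache_read": 0.5, "cache_write": 6.25},
    "GPT-5.4": {"input": 2.5, "output": 15.0, "cache_read": None, "cache_write": 0.25},
}

PER_MILLION = 1_000_000


def request_cost(model, input_tokens, output_tokens):
    """USD cost of one uncached request at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / PER_MILLION


def cache_read_cost(model, cached_tokens):
    """USD cost of reading cached tokens, or None when the price is not public."""
    price = PRICES[model]["cache_read"]
    if price is None:
        # Follow the page's rule: a missing price is not inferred.
        return None
    return cached_tokens * price / PER_MILLION


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens and 10K output tokens per request.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 10_000):.2f} per request")
```

For that hypothetical workload the sketch gives $0.75 per request for Opus 4.7 and $0.40 for GPT-5.4.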

Summary

  • Opus 4.7 leads in: Coding and Software Engineering (1/1)
  • GPT-5.4 leads in: Math and Reasoning (2/2), AI Agent - Information Search (1/1)
  • Tied in: General Knowledge, AI Agent - Tool Usage
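The per-capability verdicts can be derived from the same differences. A minimal sketch, assuming a capability's leader is simply the model with more wins in it and that equal win counts mean a tie (the page's labels are consistent with this, but the rule is not stated); `PAIRS` and `verdicts` are illustrative names.

```python
from collections import Counter, defaultdict

# (capability, diff) pairs, where diff = Opus 4.7 minus GPT-5.4, from this page's tables.
PAIRS = [
    ("General Knowledge", 2.60),
    ("General Knowledge", 1.40),
    ("General Knowledge", -1.30),
    ("General Knowledge", -0.20),
    ("General Knowledge", 0.0),
    ("AI Agent - Tool Usage", -5.70),
    ("AI Agent - Tool Usage", 3.0),
    ("Math and Reasoning", -4.20),
    ("Math and Reasoning", -3.80),
    ("AI Agent - Information Search", -3.40),
    ("Coding and Software Engineering", 6.60),
]


def verdicts(pairs):
    """Map each capability to the model with more wins in it, or "Tied"."""
    counts = defaultdict(Counter)
    for capability, diff in pairs:
        tally = counts[capability]  # touch the key so all-tie groups still appear
        if diff > 0:
            tally["Opus 4.7"] += 1
        elif diff < 0:
            tally["GPT-5.4"] += 1
    result = {}
    for capability, tally in counts.items():
        opus, gpt = tally["Opus 4.7"], tally["GPT-5.4"]  # Counter returns 0 if absent
        if opus > gpt:
            result[capability] = "Opus 4.7"
        elif gpt > opus:
            result[capability] = "GPT-5.4"
        else:
            result[capability] = "Tied"
    return result


if __name__ == "__main__":
    for capability, leader in verdicts(PAIRS).items():
        print(f"{capability}: {leader}")
```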

On average across the 11 shared benchmarks, GPT-5.4 scores 0.45 higher.

Largest single-benchmark gap: SWE-Bench Pro - Public, with Opus 4.7 at 64.30 vs GPT-5.4 at 57.70 (+6.60).
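Both the average and the largest gap follow from the same 11 differences. A minimal sketch, assuming (as the page appears to) that every benchmark is weighted equally and that raw score differences are comparable across benchmarks; `DIFFS`, `average_diff` and `largest_gap` are illustrative names.

```python
from statistics import mean

# Per-benchmark differences (Opus 4.7 minus GPT-5.4), copied from this page's tables.
DIFFS = {
    "HLE": 2.60,
    "GPQA Diamond": 1.40,
    "ARC-AGI-2": -1.30,
    "ARC-AGI": -0.20,
    "ARC-AGI-3": 0.0,
    "Terminal Bench 2.0": -5.70,
    "OSWorld-Verified": 3.0,
    "FrontierMath - Tier 4": -4.20,
    "FrontierMath": -3.80,
    "BrowseComp": -3.40,
    "SWE-Bench Pro - Public": 6.60,
}


def average_diff(diffs):
    """Unweighted mean of the per-benchmark differences."""
    return mean(diffs.values())


def largest_gap(diffs):
    """Benchmark with the largest absolute difference, and its signed value."""
    name = max(diffs, key=lambda k: abs(diffs[k]))
    return name, diffs[name]


if __name__ == "__main__":
    # A negative average means GPT-5.4 is ahead on average.
    print(f"Average diff: {average_diff(DIFFS):+.2f}")
    name, gap = largest_gap(DIFFS)
    print(f"Largest gap: {name} ({gap:+.2f})")
```

The differences sum to -5.00, so the mean is -5.00 / 11 ≈ -0.45, which is the same figure as "GPT-5.4 scores 0.45 higher"; the largest absolute gap is SWE-Bench Pro - Public at +6.60, just ahead of Terminal Bench 2.0 at -5.70.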

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.