GPT-5 vs Claude Opus 4

Across 11 shared benchmarks, GPT-5 leads overall: GPT-5 wins 10, Claude Opus 4 wins 1, with 0 ties and an average score difference of +16.18.

GPT-5

OpenAI · 2025-08-07 · Foundation model

Claude Opus 4

Anthropic · 2025-05-23 · Reasoning model

GPT-5: 10 wins (91%) · Claude Opus 4: 1 win (9%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 11 shared benchmarks.

Math and Reasoning

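GPT-5 leads 4/5

| Benchmark | GPT-5 score | GPT-5 rank | Claude Opus 4 score | Claude Opus 4 rank | Diff |
|---|---|---|---|---|---|
| IMO-ProofBench | 59 | 2 / 16 | 2.90 | 16 / 16 | +56.10 |
| AIME 2025 | 99.60 | 9 / 106 | 75.50 | 65 / 106 | +24.10 |
| FrontierMath | 24.80 | 15 / 60 | 4.50 | 39 / 60 | +20.30 |
| FrontierMath - Tier 4 | 12.50 | 29 / 80, Thinking High (No Tools) | 4.20 | 40 / 80 | +8.30 |
| Simple Bench | 56.70 | 8 / 27 | 58.80 | 7 / 27 | -2.10 |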

General Knowledge

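GPT-5 leads 4/4

| Benchmark | GPT-5 score | GPT-5 rank | Claude Opus 4 score | Claude Opus 4 rank | Diff |
|---|---|---|---|---|---|
| ARC-AGI | 65.70 | 30 / 65 | 35.70 | 48 / 65 | +30 |
| HLE | 35.20 | 60 / 157 | 10.70 | 129 / 157 | +24.50 |
| GPQA Diamond | 87.30 | 37 / 178 | 79.60 | 79 / 178 | +7.70 |
| ARC-AGI-2 | 9.90 | 37 / 59 | 8.60 | 39 / 59 | +1.30 |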

Agent Level Benchmark

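GPT-5 leads 1/1

| Benchmark | GPT-5 score | GPT-5 rank | Claude Opus 4 score | Claude Opus 4 rank | Diff |
|---|---|---|---|---|---|
| τ²-Bench | 80 | 15 / 40 | 72.50 | 22 / 40 | +7.50 |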

Coding and Software Engineer

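GPT-5 leads 1/1

| Benchmark | GPT-5 score | GPT-5 rank | Claude Opus 4 score | Claude Opus 4 rank | Diff |
|---|---|---|---|---|---|
| SWE-bench Verified | 72.80 | 46 / 108 | 72.50 | 48 / 108 | +0.30 |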

Specs

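| Field | GPT-5 | Claude Opus 4 |
|---|---|---|
| Publisher | OpenAI | Anthropic |
| Release date | 2025-08-07 | 2025-05-23 |
| Model type | Foundation model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length (tokens) | 400K | 200K |
| Max output (tokens) | 128K | 32K |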

Summary

  • GPT-5 leads in: Math and Reasoning (4/5), General Knowledge (4/4), Agent Level Benchmark (1/1), Coding and Software Engineer (1/1)

On average across the 11 shared benchmarks, GPT-5 scores 16.18 higher.

Largest single-benchmark gap: IMO-ProofBench, where GPT-5 scores 59 and Claude Opus 4 scores 2.90 (+56.10).
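
The summary figures can be re-derived from the per-benchmark scores. The sketch below is a minimal check, not the page's actual generation code: the scores are copied from the benchmark tables, while the function name `summarize` and the dictionary keys are illustrative assumptions. It treats the average as an unweighted mean of the 11 per-benchmark differences (GPT-5 minus Claude Opus 4), which is consistent with the reported +16.18.

```python
from statistics import mean

# (capability group, benchmark, GPT-5 score, Claude Opus 4 score),
# copied from the benchmark tables on this page.
SCORES = [
    ("Math and Reasoning", "IMO-ProofBench", 59.0, 2.9),
    ("Math and Reasoning", "AIME 2025", 99.6, 75.5),
    ("Math and Reasoning", "FrontierMath", 24.8, 4.5),
    ("Math and Reasoning", "FrontierMath - Tier 4", 12.5, 4.2),
    ("Math and Reasoning", "Simple Bench", 56.7, 58.8),
    ("General Knowledge", "ARC-AGI", 65.7, 35.7),
    ("General Knowledge", "HLE", 35.2, 10.7),
    ("General Knowledge", "GPQA Diamond", 87.3, 79.6),
    ("General Knowledge", "ARC-AGI-2", 9.9, 8.6),
    ("Agent Level Benchmark", "τ²-Bench", 80.0, 72.5),
    ("Coding and Software Engineer", "SWE-bench Verified", 72.8, 72.5),
]


def summarize(rows):
    """Hypothetical helper: recompute wins, average gap and per-group tallies."""
    # Round each difference to two decimals, matching the Diff column.
    diffs = {name: round(a - b, 2) for _, name, a, b in rows}

    wins_gpt5 = sum(d > 0 for d in diffs.values())
    wins_opus4 = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())

    # Per-group tally of (benchmarks GPT-5 leads, benchmarks in group).
    per_group = {}
    for group, name, _, _ in rows:
        won, total = per_group.get(group, (0, 0))
        per_group[group] = (won + int(diffs[name] > 0), total + 1)

    return {
        "wins_gpt5": wins_gpt5,
        "wins_opus4": wins_opus4,
        "ties": ties,
        "win_share_gpt5": round(100 * wins_gpt5 / len(rows)),
        "avg_diff": round(mean(diffs.values()), 2),
        "largest_gap": max(diffs, key=lambda n: abs(diffs[n])),
        "per_group": per_group,
    }


summary = summarize(SCORES)
print(summary)
```

Rounding each difference before averaging mirrors the two-decimal Diff column; averaging the unrounded differences gives the same result at this precision.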

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.