GPT-5 vs Claude Opus 4

Across 12 shared benchmarks, GPT-5 leads overall. It wins 11, Claude Opus 4 wins 1, and there are no ties. The average score difference is +16.28 points in GPT-5's favor.

GPT-5: OpenAI · 2025-08-07 · Foundation model

Claude Opus 4: Anthropic · 2025-05-23 · Reasoning model

GPT-5: 11 wins (92%) · Claude Opus 4: 1 win (8%)

Benchmark scores

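The 12 shared benchmarks are grouped by capability and sorted by the size of the gap within each group. Each score is followed by the model's rank on that benchmark (rank / number of models listed for it).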

General Knowledge: GPT-5 leads 4/4

Benchmark	GPT-5 score (rank)	Claude Opus 4 score (rank)	Diff
ARC-AGI	65.70 (33/68)	35.70 (51/68)	+30.00
HLE	35.20 (73/172)	10.70 (144/172)	+24.50
GPQA Diamond	87.30 (40/187)	79.60 (85/187)	+7.70
ARC-AGI-2	9.90 (40/62)	8.60 (42/62)	+1.30

Math and Reasoning: GPT-5 leads 4/4

Benchmark	GPT-5 score (rank)	Claude Opus 4 score (rank)	Diff
IMO-ProofBench	59 (2/16)	2.90 (16/16)	+56.10
AIME2025	99.60 (9/107)	75.50 (66/107)	+24.10
FrontierMath	24.80 (15/60)	4.50 (39/60)	+20.30
FrontierMath - Tier 4	12.50 (29/80), Thinking High (No Tools)	4.20 (40/80)	+8.30

Agent Level Benchmark: GPT-5 leads 2/2

Benchmark	GPT-5 score (rank)	Claude Opus 4 score (rank)	Diff
Aider-Polyglot	88 (1/59), Thinking High (No Tools)	70.70 (16/59), Normal (No Tools)	+17.30
τ²-Bench	80 (15/43)	72.50 (23/43)	+7.50

Coding and Software Engineering: GPT-5 leads 1/1

Benchmark	GPT-5 score (rank)	Claude Opus 4 score (rank)	Diff
SWE-bench Verified	72.80 (50/112)	72.50 (52/112)	+0.30

Commonsense Reasoning: Claude Opus 4 leads 1/1

Benchmark	GPT-5 score (rank)	Claude Opus 4 score (rank)	Diff
Simple Bench	56.70 (20/63), Thinking High (No Tools)	58.80 (17/63), Thinking (No Tools)	-2.10

Prices use DataLearner records when available; missing fields are not inferred.

GPT-5 leads in: General Knowledge (4/4), Math and Reasoning (4/4), Agent Level Benchmark (2/2), Coding and Software Engineering (1/1)
Claude Opus 4 leads in: Commonsense Reasoning (1/1)
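The per-capability tallies above can be reproduced with a short Python sketch. It assumes each benchmark belongs to the capability group whose table it appears in (group names are matched to the tables by order and count) and that "largest gap" means the largest absolute difference. `ROWS` and `group_by_capability` are hypothetical names for illustration, not part of any DataLearner API.

```python
from collections import defaultdict

# Hypothetical data layout: (capability group, benchmark, GPT-5 score, Claude Opus 4 score),
# copied from the tables on this page. Group names are matched to the tables by order.
ROWS = [
    ("General Knowledge", "ARC-AGI", 65.70, 35.70),
    ("General Knowledge", "HLE", 35.20, 10.70),
    ("General Knowledge", "GPQA Diamond", 87.30, 79.60),
    ("General Knowledge", "ARC-AGI-2", 9.90, 8.60),
    ("Math and Reasoning", "IMO-ProofBench", 59.0, 2.90),
    ("Math and Reasoning", "AIME2025", 99.60, 75.50),
    ("Math and Reasoning", "FrontierMath", 24.80, 4.50),
    ("Math and Reasoning", "FrontierMath - Tier 4", 12.50, 4.20),
    ("Agent Level Benchmark", "Aider-Polyglot", 88.0, 70.70),
    ("Agent Level Benchmark", "τ²-Bench", 80.0, 72.50),
    ("Coding and Software Engineering", "SWE-bench Verified", 72.80, 72.50),
    ("Commonsense Reasoning", "Simple Bench", 56.70, 58.80),
]


def group_by_capability(rows):
    """Return {group: (sorted_items, gpt5_wins, opus4_wins)}.

    Each item is a (benchmark, diff) pair, where diff is GPT-5 minus
    Claude Opus 4 rounded to two decimals. Items within a group are
    sorted by absolute gap, largest first (assumed sort rule).
    """
    groups = defaultdict(list)
    for group, benchmark, gpt5, opus4 in rows:
        groups[group].append((benchmark, round(gpt5 - opus4, 2)))

    summary = {}
    for group, items in groups.items():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
        gpt5_wins = sum(1 for _, diff in items if diff > 0)
        opus4_wins = sum(1 for _, diff in items if diff < 0)
        summary[group] = (items, gpt5_wins, opus4_wins)
    return summary


if __name__ == "__main__":
    # Print one "leads in" line per capability group, in table order.
    for group, (items, gpt5_wins, opus4_wins) in group_by_capability(ROWS).items():
        if gpt5_wins > opus4_wins:
            leader = "GPT-5"
        elif opus4_wins > gpt5_wins:
            leader = "Claude Opus 4"
        else:
            leader = "Neither model"
        print(f"{group}: {leader} leads {max(gpt5_wins, opus4_wins)}/{len(items)}")
```

Because the differences within each group here all share one sign, sorting by signed or absolute gap gives the same order; the absolute value only matters if a group mixes wins and losses.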

On average across the 12 shared benchmarks, GPT-5 scores 16.28 points higher than Claude Opus 4.

Largest single-benchmark gap: IMO-ProofBench, where GPT-5 scores 59 and Claude Opus 4 scores 2.90 (+56.10).
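The headline figures (11 wins to 1, a +16.28 average, IMO-ProofBench as the largest gap) follow directly from the Diff column. The Python sketch below recomputes them, assuming the page takes an unweighted mean of the per-benchmark differences and rounds half-up to two decimals; `DIFFS` and `summarize` are illustrative names. It uses `Decimal` so that rounding the exact mean of 16.275 is not disturbed by binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

# Diff column (GPT-5 minus Claude Opus 4), copied from the tables on this page.
# Stored as strings so Decimal keeps the exact two-decimal values.
DIFFS = {
    "ARC-AGI": "30.00",
    "HLE": "24.50",
    "GPQA Diamond": "7.70",
    "ARC-AGI-2": "1.30",
    "IMO-ProofBench": "56.10",
    "AIME2025": "24.10",
    "FrontierMath": "20.30",
    "FrontierMath - Tier 4": "8.30",
    "Aider-Polyglot": "17.30",
    "τ²-Bench": "7.50",
    "SWE-bench Verified": "0.30",
    "Simple Bench": "-2.10",
}


def summarize(diffs):
    """Recompute win counts, win share, mean difference and largest gap."""
    values = {name: Decimal(d) for name, d in diffs.items()}
    n = len(values)
    gpt5_wins = sum(1 for d in values.values() if d > 0)
    opus4_wins = sum(1 for d in values.values() if d < 0)
    # Assumption: unweighted mean, rounded half-up to two decimals.
    mean_diff = (sum(values.values()) / n).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Largest gap by absolute difference, regardless of which model leads.
    largest_gap = max(values, key=lambda name: abs(values[name]))
    return {
        "gpt5_wins": gpt5_wins,
        "opus4_wins": opus4_wins,
        "ties": n - gpt5_wins - opus4_wins,
        "gpt5_share_pct": round(100 * gpt5_wins / n),  # 11/12 rounds to 92
        "mean_diff": mean_diff,
        "largest_gap": largest_gap,
    }


if __name__ == "__main__":
    print(summarize(DIFFS))
```

Since the mean is unweighted, every benchmark counts equally regardless of its scale or how many models it ranks.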

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.