GPT-5.1 vs Claude Opus 4

Across 9 shared benchmarks, GPT-5.1 leads overall: GPT-5.1 wins 8, Claude Opus 4 wins 1, with 0 ties and an average score difference of +15.33.

GPT-5.1: OpenAI · 2025-11-12 · Reasoning model

Claude Opus 4: Anthropic · 2025-05-23 · Reasoning model

GPT-5.1: 8 wins (89%) · Claude Opus 4: 1 win (11%)
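The win percentages above follow directly from the win counts. A minimal sketch of that arithmetic, assuming each share is wins divided by the number of shared benchmarks and rounded to the nearest whole percent (the function name `win_share` is hypothetical, not something the page defines):

```python
def win_share(wins_a: int, wins_b: int, ties: int = 0) -> tuple[int, int]:
    """Return each side's share of shared benchmarks as a whole percent."""
    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError("no shared benchmarks to compare")
    # Assumption: shares are rounded to the nearest integer percent.
    return round(100 * wins_a / total), round(100 * wins_b / total)


# Win counts taken from the summary line: 8 wins, 1 win, 0 ties.
gpt_share, claude_share = win_share(8, 1, ties=0)
print(f"GPT-5.1 {gpt_share}% · Claude Opus 4 {claude_share}%")
```

With 8 wins and 1 win out of 9, this gives 89% and 11%, matching the figures shown on the page.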

Benchmark scores

Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.

GPT-5.1 4/4

Benchmark	GPT-5.1	Claude Opus 4	Diff
ARC-AGI	72.80 (25 / 65, high)	35.70 (48 / 65)	+37.10
HLE	42.70 (38 / 149, Thinking High, With Tools + Internet)	10.70 (121 / 149)	+32
ARC-AGI-2	17.60 (32 / 58, high)	8.60 (38 / 58)	+9

Prices use DataLearner records when available; missing fields are not inferred.

GPT-5.1 leads in: General Knowledge (4/4), Math and Reasoning (3/4), Coding and Software Engineer (1/1)

On average across the 9 shared benchmarks, GPT-5.1 scores 15.33 higher.

Largest single-benchmark gap: ARC-AGI — GPT-5.1 72.80 vs Claude Opus 4 35.70 (+37.10).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.

Benchmark	GPT-5.1	Claude Opus 4	Diff
FrontierMath	26.70 (13 / 60, Thinking High, With Tools)	4.50 (39 / 60)	+22.20
AIME2025	94 (28 / 106, Thinking High, No Tools)	75.50 (65 / 106)	+18.50
FrontierMath - Tier 4	12.50 (29 / 80, Thinking High, With Tools)	0 (72 / 80, Normal, No Tools)	+12.50
Simple Bench	53.20 (10 / 27, high)	58.80 (7 / 27, thinking)	-5.60
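The Diff column and the ordering of these rows can be reproduced with a few lines of Python. This is a sketch under two assumptions: Diff is the GPT-5.1 score minus the Claude Opus 4 score, and "largest gap" means the largest absolute difference. The page does not state its exact sort key, and the helper names are hypothetical.

```python
# Scores copied from the four rows above: (benchmark, GPT-5.1, Claude Opus 4).
rows = [
    ("Simple Bench", 53.20, 58.80),
    ("AIME2025", 94.00, 75.50),
    ("FrontierMath", 26.70, 4.50),
    ("FrontierMath - Tier 4", 12.50, 0.00),
]


def with_diff(rows):
    """Append Diff = GPT-5.1 score minus Claude Opus 4 score, rounded to 2 places."""
    return [(name, a, b, round(a - b, 2)) for name, a, b in rows]


def sort_by_gap(scored):
    """Order rows by absolute gap, largest first (assumed sort key)."""
    return sorted(scored, key=lambda r: abs(r[3]), reverse=True)


table = sort_by_gap(with_diff(rows))
gpt_wins = sum(1 for r in table if r[3] > 0)

for name, a, b, diff in table:
    print(f"{name}\t{a:.2f}\t{b:.2f}\t{diff:+.2f}")
print(f"GPT-5.1 wins {gpt_wins}/{len(table)}")
```

Under these assumptions the rows come out in the same order as the table, and the win count agrees with the 3/4 reported for Math and Reasoning.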