GPT-5.2 Pro vs Opus 4.5

Across 6 shared benchmarks, GPT-5.2 Pro leads overall: it wins 5, Opus 4.5 wins 1, there are 0 ties, and the average score difference (GPT-5.2 Pro minus Opus 4.5) is +10.43.

GPT-5.2 Pro: OpenAI · 2025-12-11 · Reasoning model

Opus 4.5: Anthropic · 2025-11-25 · Reasoning model

GPT-5.2 Pro: 5 wins (83%) · Opus 4.5: 1 win (17%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

GPT-5.2 Pro 4/4

Benchmark	GPT-5.2 Pro	Opus 4.5	Diff
ARC-AGI-2	54.20 · rank 23 / 62	37.60 · rank 29 / 62 · Extended (no tools)	+16.60
ARC-AGI	90.50 · rank 17 / 68	80 · rank 24 / 68 · Extended (no tools)	+10.50
HLE	50 · rank 29 / 172	43.20 · rank 49 / 172 · Extended (with tools)	+6.80
GPQA Diamond	93.20 · rank 9 / 187	87 · rank 42 / 187 · Extended (no tools)	+6.20

Opus 4.5 1/1

Benchmark	GPT-5.2 Pro	Opus 4.5	Diff
Simple Bench	57.40 · rank 19 / 63 · Extra-high thinking effort (no tools)	62 · rank 12 / 63 · Extended (no tools)	-4.60

GPT-5.2 Pro 1/1

Benchmark	GPT-5.2 Pro	Opus 4.5	Diff
FrontierMath - Tier 4	31.30 · rank 9 / 80	4.20 · rank 40 / 80 · Normal (No Tools)	+27.10

Prices use DataLearner records when available; missing fields are not inferred.

On average across the 6 shared benchmarks, GPT-5.2 Pro scores 10.43 higher.

Largest single-benchmark gap: FrontierMath - Tier 4, with GPT-5.2 Pro at 31.30 vs Opus 4.5 at 4.20 (+27.10).
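The summary figures above (win counts, average difference, largest gap) can be reproduced from the six score pairs in the tables. The following is a minimal sketch, not the page's actual generation code: the `scores` dictionary and the `summarize` helper are hypothetical names introduced for illustration, and the sketch assumes each difference is computed as GPT-5.2 Pro minus Opus 4.5 and rounded to two decimals.

```python
# Scores transcribed from the tables on this page: (GPT-5.2 Pro, Opus 4.5).
scores = {
    "ARC-AGI-2": (54.20, 37.60),
    "ARC-AGI": (90.50, 80.00),
    "HLE": (50.00, 43.20),
    "GPQA Diamond": (93.20, 87.00),
    "Simple Bench": (57.40, 62.00),
    "FrontierMath - Tier 4": (31.30, 4.20),
}


def summarize(scores):
    """Hypothetical helper: per-benchmark diffs plus the headline figures."""
    # Assumption: diff = first model minus second model, rounded to 2 decimals.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "diffs": diffs,
        "wins_first": sum(d > 0 for d in diffs.values()),
        "wins_second": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        # Mean of the signed differences across all shared benchmarks.
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        # Benchmark with the largest absolute gap.
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(scores)
print(summary["wins_first"], summary["wins_second"], summary["ties"])
print(summary["mean_diff"], summary["largest_gap"])
```

The mean here is taken over signed differences, so the one benchmark Opus 4.5 wins (Simple Bench, -4.60) pulls the average down rather than being ignored.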

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.