GPT-5.2 vs Opus 4.5

Across 10 shared benchmarks, GPT-5.2 leads overall: GPT-5.2 wins 9, Opus 4.5 wins 1, with 0 ties and an average score difference of +8.21 points in GPT-5.2's favor.

GPT-5.2

OpenAI · 2025-12-11 · Chat model

Opus 4.5

Anthropic · 2025-11-25 · Reasoning model

GPT-5.2: 9 wins (90%) · Opus 4.5: 1 win (10%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.

General Knowledge

GPT-5.2 4/4
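| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
| --- | --- | --- | --- |
| ARC-AGI-2 | 54.20 · 20 / 59 · Deep thinking (no tools, parallel) | 37.60 · 26 / 59 · Extended (no tools) | +16.60 |
| ARC-AGI | 90.50 · 15 / 65 · Deep thinking (no tools, parallel) | 80 · 21 / 65 · Extended (no tools) | +10.50 |
| GPQA Diamond | 93.20 · 8 / 178 · Deep thinking (no tools, parallel) | 87 · 38 / 178 · Extended (no tools) | +6.20 |
| HLE | 45.50 · 32 / 157 · Deep Thinking (With Tools + Internet) | 43.20 · 39 / 157 · Extended (with tools) | +2.30 |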

Agent Level Benchmark

GPT-5.2 2/2
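| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
| --- | --- | --- | --- |
| τ²-Bench - Telecom | 98.70 · 4 / 35 · Extra-high-effort thinking (with tools) | 90.70 · 21 / 35 · Extended (with tools) | +8 |
| τ²-Bench | 82 · 12 / 40 · Extra-high-effort thinking (with tools) | 81.99 · 13 / 40 · Extended (with tools) | +0.01 |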

Math and Reasoning

GPT-5.2 2/2
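| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
| --- | --- | --- | --- |
| FrontierMath | 40.30 · 8 / 60 · Extra-high-effort thinking (with tools) | 20.70 · 17 / 60 · Extended (no tools) | +19.60 |
| FrontierMath - Tier 4 | 18.80 · 16 / 80 · Thinking High (No Tools) | 4.20 · 40 / 80 · Normal (No Tools) | +14.60 |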

Coding and Software Engineer

Opus 4.5 1/1
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
| --- | --- | --- | --- |
| SWE-bench Verified | 80 · 16 / 108 · Extra-high-effort thinking (with tools) | 80.90 · 8 / 108 · Extended (with tools) | -0.90 |

Multimodal Understanding

GPT-5.2 1/1
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
| --- | --- | --- | --- |
| MMMU | 85.90 · 1 / 28 · Extra-high-effort thinking (no tools) | 80.70 · 10 / 28 · Extended (no tools) | +5.20 |
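
The grouping and ordering used for the benchmark tables can be reproduced from the scores alone. The following is a minimal sketch, not the page's actual generator: `ROWS` and `group_and_sort` are hypothetical names introduced here, the scores are copied from the tables in this section, and "largest gap" is assumed to mean the largest absolute difference.

```python
from collections import defaultdict

# (capability, benchmark, GPT-5.2 score, Opus 4.5 score), copied from this page.
ROWS = [
    ("General Knowledge", "ARC-AGI-2", 54.20, 37.60),
    ("General Knowledge", "ARC-AGI", 90.50, 80.00),
    ("General Knowledge", "GPQA Diamond", 93.20, 87.00),
    ("General Knowledge", "HLE", 45.50, 43.20),
    ("Agent Level Benchmark", "τ²-Bench - Telecom", 98.70, 90.70),
    ("Agent Level Benchmark", "τ²-Bench", 82.00, 81.99),
    ("Math and Reasoning", "FrontierMath", 40.30, 20.70),
    ("Math and Reasoning", "FrontierMath - Tier 4", 18.80, 4.20),
    ("Coding and Software Engineer", "SWE-bench Verified", 80.00, 80.90),
    ("Multimodal Understanding", "MMMU", 85.90, 80.70),
]


def group_and_sort(rows):
    """Group rows by capability, then order each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, gpt, opus in rows:
        # Diff is GPT-5.2 minus Opus 4.5, rounded to the two decimals shown on the page.
        groups[capability].append((benchmark, round(gpt - opus, 2)))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute value keeps a benchmark where Opus 4.5 leads by a wide margin from sinking to the bottom of its group; with the current data the choice does not change the order.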

Specs

| Field | GPT-5.2 | Opus 4.5 |
| --- | --- | --- |
| Publisher | OpenAI | Anthropic |
| Release date | 2025-12-11 | 2025-11-25 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 400K | 200K |
| Max output | Not available | 64K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | GPT-5.2 | Opus 4.5 |
| --- | --- | --- |
| Text input | $1.75 / 1M tokens | $5 / 1M tokens |
| Text output | $14 / 1M tokens | $25 / 1M tokens |
| Cache read | $0.175 / 1M tokens | $0.5 / 1M tokens |
| Cache write | $1.75 / 1M tokens | $6.25 / 1M tokens |
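
To turn the per-million-token prices into a per-request figure, a rough estimate can be sketched as follows. This is an illustration under stated assumptions, not either provider's billing logic: `PRICES` and `request_cost` are hypothetical names, cached input tokens are assumed to be billed at the cache-read rate with the rest at the normal input rate, and cache-write charges are ignored.

```python
# Listed prices in USD per 1M tokens, copied from the pricing table on this page.
PRICES = {
    "GPT-5.2": {"input": 1.75, "output": 14.00, "cache_read": 0.175, "cache_write": 1.75},
    "Opus 4.5": {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    Assumption: `cached_tokens` of the input are billed at the cache-read
    rate and the remainder at the normal input rate; cache writes are ignored.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cache_read"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    for model in PRICES:
        cost = request_cost(model, input_tokens=100_000, output_tokens=10_000)
        print(f"{model}: ${cost:.3f}")
```

For a request with 100,000 input tokens and 10,000 output tokens and no caching, this gives about $0.315 for GPT-5.2 and $0.75 for Opus 4.5.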

Summary

  • GPT-5.2 leads in: General Knowledge (4/4), Agent Level Benchmark (2/2), Math and Reasoning (2/2), Multimodal Understanding (1/1)
  • Opus 4.5 leads in: Coding and Software Engineer (1/1)

On average across the 10 shared benchmarks, GPT-5.2 scores 8.21 points higher.

Largest single-benchmark gap: FrontierMath — GPT-5.2 40.30 vs Opus 4.5 20.70 (+19.60).
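
The summary figures can be checked directly against the ten score pairs. The sketch below assumes the average is a simple unweighted mean of (GPT-5.2 score minus Opus 4.5 score), which matches the reported +8.21; `SCORES` and `summarize` are hypothetical names, and the values are copied from the benchmark tables on this page.

```python
# (benchmark, GPT-5.2 score, Opus 4.5 score), copied from the benchmark tables.
SCORES = [
    ("ARC-AGI-2", 54.20, 37.60),
    ("ARC-AGI", 90.50, 80.00),
    ("GPQA Diamond", 93.20, 87.00),
    ("HLE", 45.50, 43.20),
    ("τ²-Bench - Telecom", 98.70, 90.70),
    ("τ²-Bench", 82.00, 81.99),
    ("FrontierMath", 40.30, 20.70),
    ("FrontierMath - Tier 4", 18.80, 4.20),
    ("SWE-bench Verified", 80.00, 80.90),
    ("MMMU", 85.90, 80.70),
]


def summarize(scores):
    """Count wins per model, the unweighted mean diff, and the largest absolute gap."""
    diffs = [gpt - opus for _, gpt, opus in scores]
    name, gpt, opus = max(scores, key=lambda row: abs(row[1] - row[2]))
    return {
        "gpt_wins": sum(d > 0 for d in diffs),
        "opus_wins": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
        # Rounded to two decimals to match the precision shown on the page.
        "mean_diff": round(sum(diffs) / len(diffs), 2),
        "largest_gap": (name, round(gpt - opus, 2)),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

An unweighted mean treats a 0.01-point lead on τ²-Bench and a 19.60-point lead on FrontierMath as equally weighted entries, so the average is driven mostly by the few benchmarks with wide gaps.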

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.