GPT-5.2 vs Opus 4.5
Across 10 shared benchmarks, GPT-5.2 leads overall: GPT-5.2 wins 9, Opus 4.5 wins 1, with 0 ties and an average score difference of +8.21.
GPT-5.2: 9 wins (90%) | Opus 4.5: 1 win (10%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 10 shared benchmarks.
General Knowledge
GPT-5.2 4/4
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| ARC-AGI-2 | 54.20 (rank 20 / 59), Deep thinking (no tools, parallel) | 37.60 (rank 26 / 59), Extended (no tools) | +16.60 |
| ARC-AGI | 90.50 (rank 15 / 65), Deep thinking (no tools, parallel) | 80 (rank 21 / 65), Extended (no tools) | +10.50 |
| GPQA Diamond | 93.20 (rank 8 / 178), Deep thinking (no tools, parallel) | 87 (rank 38 / 178), Extended (no tools) | +6.20 |
| HLE | 45.50 (rank 32 / 157), Deep Thinking (With Tools + Internet) | 43.20 (rank 39 / 157), Extended (with tools) | +2.30 |
Agent Level Benchmark
GPT-5.2 2/2
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| τ²-Bench - Telecom | 98.70 (rank 4 / 35), Extra-high-effort thinking (with tools) | 90.70 (rank 21 / 35), Extended (with tools) | +8 |
| τ²-Bench | 82 (rank 12 / 40), Extra-high-effort thinking (with tools) | 81.99 (rank 13 / 40), Extended (with tools) | +0.01 |
Math and Reasoning
GPT-5.2 2/2
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| FrontierMath | 40.30 (rank 8 / 60), Extra-high-effort thinking (with tools) | 20.70 (rank 17 / 60), Extended (no tools) | +19.60 |
| FrontierMath - Tier 4 | 18.80 (rank 16 / 80), Thinking High (No Tools) | 4.20 (rank 40 / 80), Normal (No Tools) | +14.60 |
Coding and Software Engineer
Opus 4.5 1/1
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| SWE-bench Verified | 80 (rank 16 / 108), Extra-high-effort thinking (with tools) | 80.90 (rank 8 / 108), Extended (with tools) | -0.90 |
Multimodal Understanding
GPT-5.2 1/1
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| MMMU | 85.90 (rank 1 / 28), Extra-high-effort thinking (no tools) | 80.70 (rank 10 / 28), Extended (no tools) | +5.20 |
Specs
| Field | GPT-5.2 | Opus 4.5 |
|---|---|---|
| Publisher | OpenAI | Anthropic |
| Release date | 2025-12-11 | 2025-11-25 |
| Model type | Chat model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 400K | 200K |
| Max output | Not available | 64K |
API pricing
Prices use DataLearner records when available; missing fields are not inferred.
| Item | GPT-5.2 | Opus 4.5 |
|---|---|---|
| Text input | $1.75 / 1M tokens | $5 / 1M tokens |
| Text output | $14 / 1M tokens | $25 / 1M tokens |
| Cache read | $0.175 / 1M tokens | $0.5 / 1M tokens |
| Cache write | $1.75 / 1M tokens | $6.25 / 1M tokens |
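To make the price gap concrete, the following is a minimal sketch that turns the per-million-token rates in the table into a cost estimate for a workload. It assumes simple linear billing with no tiering, batch discounts, or minimum charges, and it leaves the split between regular and cached tokens to the caller, since providers count cached input differently. The function name `estimate_cost` and the example workload (2M input tokens, 500K output tokens) are hypothetical and not taken from the page.

```python
# USD per 1M tokens, copied from the API pricing table above.
PRICES = {
    "GPT-5.2": {"input": 1.75, "output": 14.0, "cache_read": 0.175, "cache_write": 1.75},
    "Opus 4.5": {"input": 5.0, "output": 25.0, "cache_read": 0.5, "cache_write": 6.25},
}


def estimate_cost(model, input_tokens, output_tokens,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Estimate the cost in USD, assuming purely linear per-token billing.

    This is an illustrative helper, not a provider API. Real invoices may
    differ (tiered pricing, batch discounts, provider-specific cache rules).
    """
    rates = PRICES[model]
    total = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cache_read_tokens * rates["cache_read"]
        + cache_write_tokens * rates["cache_write"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M uncached input tokens, 500K output tokens.
    for model in PRICES:
        cost = estimate_cost(model, input_tokens=2_000_000, output_tokens=500_000)
        print(f"{model}: ${cost:.2f}")
    # GPT-5.2: $10.50
    # Opus 4.5: $22.50
```

Under these assumptions, the same hypothetical workload costs a little over twice as much on Opus 4.5. The ratio shifts with the input-to-output mix, because the input price gap (about 2.9x) is larger than the output price gap (about 1.8x).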
Summary
- GPT-5.2 leads in: General Knowledge (4/4), Agent Level Benchmark (2/2), Math and Reasoning (2/2), Multimodal Understanding (1/1)
- Opus 4.5 leads in: Coding and Software Engineer (1/1)
On average across the 10 shared benchmarks, GPT-5.2 scores 8.21 higher.
Largest single-benchmark gap: FrontierMath — GPT-5.2 40.30 vs Opus 4.5 20.70 (+19.60).
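The headline figures can be rechecked from the ten score pairs in the benchmark tables. The sketch below is one way to do that: it assumes a win means a strictly higher score and that the average is the unweighted mean of the per-benchmark differences (GPT-5.2 minus Opus 4.5). The function name `summarize` is hypothetical, and the page's own generator may compute these values differently.

```python
# (benchmark, GPT-5.2 score, Opus 4.5 score), copied from the tables above.
SCORES = [
    ("ARC-AGI-2", 54.20, 37.60),
    ("ARC-AGI", 90.50, 80.00),
    ("GPQA Diamond", 93.20, 87.00),
    ("HLE", 45.50, 43.20),
    ("τ²-Bench - Telecom", 98.70, 90.70),
    ("τ²-Bench", 82.00, 81.99),
    ("FrontierMath", 40.30, 20.70),
    ("FrontierMath - Tier 4", 18.80, 4.20),
    ("SWE-bench Verified", 80.00, 80.90),
    ("MMMU", 85.90, 80.70),
]


def summarize(scores):
    """Recompute win counts, mean difference, and largest gap.

    Differences are rounded to 2 decimals to avoid floating-point noise.
    """
    diffs = {name: round(gpt - opus, 2) for name, gpt, opus in scores}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "gpt_wins": sum(d > 0 for d in diffs.values()),
        "opus_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "avg_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

With the scores as listed, this reproduces 9 wins for GPT-5.2, 1 for Opus 4.5, no ties, a mean difference of 8.21, and FrontierMath as the largest gap at 19.6. Note that an unweighted mean treats a 0.01-point margin on τ²-Bench and a 19.60-point margin on FrontierMath as equally comparable, even though the benchmarks use different scales and evaluation modes.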
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.