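The 10 shared benchmarks are grouped by capability and sorted by largest gap within each group. Each model cell lists the score, the rank among models listed for that benchmark, and the evaluation mode. Diff is the GPT-5.2 score minus the Opus 4.5 score, in points.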
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| ARC-AGI-2 | 54.20, rank 19 / 58, Deep Thinking (no tools, parallel) | 37.60, rank 25 / 58, Extended (no tools) | +16.60 |
| ARC-AGI | 90.50, rank 15 / 65, Deep Thinking (no tools, parallel) | 80.00, rank 21 / 65, Extended (no tools) | +10.50 |
| GPQA Diamond | 93.20, rank 7 / 175, Deep Thinking (no tools, parallel) | 87.00, rank 35 / 175, Extended (no tools) | +6.20 |
| HLE | 45.50, rank 27 / 149, Deep Thinking (With Tools + Internet) | 43.20, rank 34 / 149, Extended (with tools) | +2.30 |
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
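| τ²-Bench - Telecom | 98.70, rank 4 / 35, Extra-high-effort thinking (with tools) | 90.70, rank 21 / 35, Extended (with tools) | +8.00 |
| τ²-Bench | 82.00, rank 12 / 40, Extra-high-effort thinking (with tools) | 81.99, rank 13 / 40, Extended (with tools) | +0.01 |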
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| FrontierMath | 40.30, rank 8 / 60, Extra-high-effort thinking (with tools) | 20.70, rank 17 / 60, Extended (no tools) | +19.60 |
| FrontierMath - Tier 4 | 18.80, rank 16 / 80, Thinking High (No Tools) | 4.20, rank 40 / 80, Normal (No Tools) | +14.60 |
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| SWE-bench Verified | 80.00, rank 12 / 103, Extra-high-effort thinking (with tools) | 80.90, rank 5 / 103, Extended (with tools) | -0.90 |
| Benchmark | GPT-5.2 | Opus 4.5 | Diff |
|---|---|---|---|
| MMMU | 85.90, rank 1 / 28, Extra-high-effort thinking (no tools) | 80.70, rank 10 / 28, Extended (no tools) | +5.20 |
| Field | GPT-5.2 | Opus 4.5 |
|---|---|---|
| Publisher | OpenAI | Anthropic |
| Release date | 2025-12-11 | 2025-11-25 |
| Model type | AI model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 400K tokens | 200K tokens |
| Max output | Not available | 65,536 tokens |
Prices use DataLearner records when available; missing fields are not inferred.
| Item | GPT-5.2 | Opus 4.5 |
|---|---|---|
| Text input | $1.75 / 1M tokens | $5 / 1M tokens |
| Text output | $14 / 1M tokens | $25 / 1M tokens |
| Cache read | $0.175 / 1M tokens | $0.5 / 1M tokens |
| Cache write | $1.75 / 1M tokens | $6.25 / 1M tokens |
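As a rough illustration of how the per-1M-token rates above translate into a per-request cost, here is a minimal sketch. The function name `request_cost`, the token counts in the usage example, and the assumption that each token category is billed independently at its listed rate are all hypothetical; actual billing rules may differ by provider.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "GPT-5.2": {"input": 1.75, "output": 14.0, "cache_read": 0.175, "cache_write": 1.75},
    "Opus 4.5": {"input": 5.0, "output": 25.0, "cache_read": 0.5, "cache_write": 6.25},
}


def request_cost(model, input_tokens=0, output_tokens=0,
                 cache_read_tokens=0, cache_write_tokens=0):
    """Estimate the USD cost of one request.

    Assumption (not stated in the table): every category is billed
    separately at its listed rate, with no minimums or discounts.
    """
    rates = PRICES[model]
    total = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cache_read_tokens * rates["cache_read"]
        + cache_write_tokens * rates["cache_write"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M uncached input tokens, 100K output tokens.
    for model in PRICES:
        cost = request_cost(model, input_tokens=1_000_000, output_tokens=100_000)
        print(f"{model}: ${cost:.2f}")
```

For the hypothetical workload in the usage block, the listed rates put GPT-5.2 at $3.15 and Opus 4.5 at $7.50.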
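On average across the 10 shared benchmarks, GPT-5.2 scores 8.21 points higher than Opus 4.5.
Largest single-benchmark gap: FrontierMath, where GPT-5.2 scores 40.30 and Opus 4.5 scores 20.70 (+19.60). The evaluation modes differ for that pair: the GPT-5.2 figure was recorded with tools and the Opus 4.5 figure without.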
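The two summary figures above can be reproduced from the per-benchmark scores. The sketch below assumes the average is a plain unweighted mean of the ten differences (GPT-5.2 minus Opus 4.5), which matches the reported 8.21; the names `SCORES` and `summarize` are illustrative only.

```python
from statistics import mean

# (benchmark, GPT-5.2 score, Opus 4.5 score), copied from the benchmark tables.
SCORES = [
    ("ARC-AGI-2", 54.20, 37.60),
    ("ARC-AGI", 90.50, 80.00),
    ("GPQA Diamond", 93.20, 87.00),
    ("HLE", 45.50, 43.20),
    ("τ²-Bench - Telecom", 98.70, 90.70),
    ("τ²-Bench", 82.00, 81.99),
    ("FrontierMath", 40.30, 20.70),
    ("FrontierMath - Tier 4", 18.80, 4.20),
    ("SWE-bench Verified", 80.00, 80.90),
    ("MMMU", 85.90, 80.70),
]


def summarize(scores):
    """Return (mean diff, benchmark with the largest absolute gap, that gap)."""
    diffs = {name: round(gpt - opus, 2) for name, gpt, opus in scores}
    average = round(mean(diffs.values()), 2)
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return average, largest, diffs[largest]


if __name__ == "__main__":
    average, largest, gap = summarize(SCORES)
    print(f"Mean diff: {average:+.2f}")
    print(f"Largest gap: {largest} ({gap:+.2f})")
```

An unweighted mean treats every benchmark equally, so the two FrontierMath rows together contribute a large share of the average.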
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.