GPT-5.2 Benchmark Details
GPT-5.2's leading benchmark results are AIME2025 (rank 1 of 106, score 100), MMMU (rank 1 of 28, score 85.90), and GPQA Diamond (rank 7 of 175, score 93.20). This page also compares it with 2 competitor models and 2 predecessor models from the same series, including performance and pricing views where available. 2 source links are attached for reference.
Benchmark Results
General evaluation: 6 evaluations
Programming and software engineering: 3 evaluations
Mathematical reasoning: 3 evaluations
AI Agent - information gathering: 2 evaluations
Competitor Comparison
Benchmark scores for GPT-5.2 compared against top models in its class
Benchmark Score Comparison
12 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
| Benchmark | Category | GPT-5.2 (current) | Gemini 3.0 Pro (Preview 11-2025) | Opus 4.5 |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 90.50 (Deep Thinking Mode) | 87.50 (Thinking Enabled) | 80.00 (Extended Thinking) |
| ARC-AGI-2 | General evaluation | 54.20 (Deep Thinking Mode) | 45.10 (Thinking Enabled) | 37.60 (Extended Thinking) |
| GPQA Diamond | General evaluation | 93.20 (Deep Thinking Mode) | 93.80 (Thinking Enabled) | 87.00 (Extended Thinking) |
| HLE | General evaluation | 45.50 (Deep Thinking Mode, Tools) | 45.80 (Thinking Level · High, Tools) | 43.20 (Extended Thinking, Tools) |
| SWE-bench Verified | Programming and software engineering | 80.00 (Thinking Level · Extra High, Tools) | 76.20 (Thinking Enabled) | 80.90 (Extended Thinking, Tools) |
| AIME2025 | Mathematical reasoning | 100.00 (Thinking Level · Extra High) | 95.00 (Thinking Enabled) | -- |
| FrontierMath | Mathematical reasoning | 40.30 (Thinking Level · Extra High, Tools) | 38.00 (Thinking Enabled) | 20.70 (Extended Thinking) |
| (name missing) | (category missing) | 14.60 (Thinking Level · Extra High, Tools) | 18.80 (Thinking Enabled) | 4.20 (Standard Mode) |
| MMMU | Multimodal understanding | 85.90 (Thinking Level · Extra High) | -- | 80.70 (Extended Thinking) |
| τ²-Bench | Agent capability evaluation | 82.00 (Thinking Level · Extra High, Tools) | 85.40 (Thinking Enabled, Tools) | 81.99 (Extended Thinking, Tools) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.70 (Thinking Level · Extra High, Tools) | 98.00 (Thinking Level · High, Tools) | 90.70 (Extended Thinking, Tools) |
| BrowseComp | AI Agent - information gathering | 65.80 (Thinking Level · Extra High, Tools) | 59.20 (Thinking Level · High, Tools) | -- |
Standard API Pricing: GPT-5.2 vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 / 1M tokens | $14 / 1M tokens | -- |
| Gemini 3.0 Pro (Preview 11-2025) | -- | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
| Opus 4.5 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | -- |
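As a rough illustration of how the per-million-token rates above translate into a per-request cost, the sketch below multiplies token counts by the listed standard rates. The function name `estimate_cost_usd` and the example token counts are hypothetical. The rates are copied from the table, and the sketch does not model pricing beyond Gemini's 200K context threshold, since no extended-context rate is given here.

```python
# Standard rates in USD per 1M tokens, copied from the pricing table above.
STANDARD_RATES = {
    "GPT-5.2": {"input": 1.75, "output": 14.0},
    "Gemini 3.0 Pro (Preview 11-2025)": {"input": 2.0, "output": 12.0},
    "Opus 4.5": {"input": 5.0, "output": 25.0},
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimate the cost of one request at the standard (base) rate.

    Assumption: the request stays within any context threshold,
    so only the base rate applies.
    """
    rates = STANDARD_RATES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
for model in STANDARD_RATES:
    cost = estimate_cost_usd(model, 10_000, 2_000)
    print(f"{model}: ${cost:.4f}")
```

Because output tokens are priced several times higher than input tokens for all three models, output-heavy workloads shift the comparison more than the input rate alone suggests.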
Version History
How each version of the GPT-5.2 series stacks up on benchmark tests
Benchmark Score Comparison
12 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
| Benchmark | Category | GPT-5.2 (current) | GPT-5.1 | GPT-5 |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 90.50 (Deep Thinking Mode) | 72.80 (Thinking Level · High) | 65.70 (Thinking Level · High) |
| ARC-AGI-2 | General evaluation | 54.20 (Deep Thinking Mode) | 17.60 (Thinking Level · High) | 9.90 (Thinking Level · High) |
| GPQA Diamond | General evaluation | 93.20 (Deep Thinking Mode) | 88.10 (Thinking Enabled) | 87.30 (Thinking Enabled, Tools) |
| HLE | General evaluation | 45.50 (Deep Thinking Mode, Tools) | 42.70 (Thinking Level · High, Tools) | 35.20 (Thinking Enabled, Tools) |
| IC SWE-Lancer(Diamond) | Programming and software engineering | 74.60 (Thinking Level · Extra High, Tools) | 69.70 (Thinking Level · High) | -- |
| SWE-Bench Pro - Public | Programming and software engineering | 55.60 (Thinking Level · Extra High, Tools) | 50.80 (Thinking Level · High) | 36.30 (Thinking Level · High) |
| SWE-bench Verified | Programming and software engineering | 80.00 (Thinking Level · Extra High, Tools) | 76.30 (Thinking Level · High) | 72.80 (Thinking Level · High) |
| AIME2025 | Mathematical reasoning | 100.00 (Thinking Level · Extra High) | 94.00 (Thinking Level · High) | 99.60 (Thinking Enabled, Tools) |
| FrontierMath | Mathematical reasoning | 40.30 (Thinking Level · Extra High, Tools) | 26.70 (Thinking Level · High, Tools) | 26.30 (Thinking Level · High, Tools) |
| (name missing) | (category missing) | 14.60 (Thinking Level · Extra High, Tools) | 12.50 (Thinking Level · High) | 12.50 (Thinking Level · High) |
| MMMU | Multimodal understanding | 85.90 (Thinking Level · Extra High) | 85.40 (Thinking Level · High) | 84.20 (Thinking Level · High) |
| τ²-Bench | Agent capability evaluation | 82.00 (Thinking Level · Extra High, Tools) | -- | 80.00 (Thinking Enabled, Tools) |
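The change from one version to the next can be read off the table by subtracting scores. The sketch below does this for a few rows copied from the table above. The dictionary layout and the function name `gain_over_predecessor` are illustrative only, and because the evaluation modes differ between versions, the differences should be treated as indicative rather than like-for-like.

```python
# Scores copied from the version-history table: (GPT-5.2, GPT-5.1, GPT-5).
SCORES = {
    "ARC-AGI": (90.50, 72.80, 65.70),
    "ARC-AGI-2": (54.20, 17.60, 9.90),
    "SWE-bench Verified": (80.00, 76.30, 72.80),
    "MMMU": (85.90, 85.40, 84.20),
}


def gain_over_predecessor(benchmark):
    """Return the GPT-5.2 score minus the GPT-5.1 score, in points."""
    current, previous, _oldest = SCORES[benchmark]
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(current - previous, 2)


for name in SCORES:
    print(f"{name}: {gain_over_predecessor(name):+.2f} points vs. GPT-5.1")
```

On these rows the largest jump is on ARC-AGI-2, while MMMU moves by only half a point.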
Single-Benchmark Version Trend (chart): ARC-AGI, general evaluation
Standard API Pricing Across the GPT-5.2 Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 / 1M tokens | $14 / 1M tokens | -- |
| GPT-5.1 | -- | $1.25 / 1M tokens | $10 / 1M tokens | -- |
| GPT-5 | -- | $1.25 / 1M tokens | $10 / 1M tokens | -- |