GPT-5.2 Pro Benchmark Details

GPT-5.2 Pro's benchmark results are led by GPQA Diamond (score 93.20, rank 8 of 179), FrontierMath - Tier 4 (score 31.30, rank 9 of 80), and HLE (score 50, rank 23 of 159). This page also compares it with 2 competitor models and 2 predecessor or same-series models, including performance and pricing views when available. 1 source link is attached for reference.

Benchmark Results

General Knowledge

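5 evaluations

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| GPQA Diamond | 93.20 | 8 / 179 |
| ARC-AGI | 90.50 | 15 / 65 |
| ARC-AGI-2 | 54.20 | 20 / 59 |
| HLE | 50 | 23 / 159 |
| (unlabeled) | 36.60 | 61 / 159 |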

Commonsense Reasoning

1 evaluation

| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| Simple Bench (Extra-High) | 57.40 | 19 / 63 |

Math and Reasoning

2 evaluations
| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| FrontierMath - Tier 4 (Standard Mode, Tools, Internet) | 31.30 | 9 / 80 |

AI Agent - Information Search

2 evaluations
| Benchmark / mode | Score | Rank/total |
| --- | --- | --- |
| (unlabeled) | 77.90 | 16 / 45 |
| BrowseComp (Extra-High, Tools) | 77.90 | 16 / 45 |
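
Ranks are easier to compare across leaderboards of different sizes when expressed as a share of the field. The following is a minimal Python sketch using the rank/total pairs reported on this page. The `top_fraction` helper is a name introduced here for illustration and is not something the site provides.

```python
# Rank/total pairs as reported on this page for GPT-5.2 Pro.
RANKS = {
    "GPQA Diamond": (8, 179),
    "FrontierMath - Tier 4": (9, 80),
    "HLE": (23, 159),
    "Simple Bench": (19, 63),
    "BrowseComp": (16, 45),
}


def top_fraction(rank: int, total: int) -> float:
    """Return the share of ranked entries at or above this rank, as a percentage."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100.0 * rank / total, 1)


for name, (rank, total) in RANKS.items():
    print(f"{name}: rank {rank} of {total} (top {top_fraction(rank, total)}%)")
```

A lower percentage means a stronger relative placement. The totals differ per benchmark, so the percentages are only a rough way to line the ranks up.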

Competitor Comparison

Benchmark scores for GPT-5.2 Pro compared against top models in its class

The chart shows each model’s highest score per benchmark within the current filter. Benchmarks scored out of 100 are drawn at their raw values; benchmarks on other scales are rescaled within that benchmark, while the labels keep the original scores.
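
The page does not say how benchmarks outside the 0-100 range are rescaled, so the scaling rule above can be read in more than one way. The Python sketch below assumes one plausible reading: out-of-100 scores are used directly as bar heights, and other benchmarks are scaled so that the largest score in that benchmark maps to 100. The function name, the `out_of_100` flag, and the rating-style sample values are hypothetical.

```python
def bar_heights(scores: dict[str, float], out_of_100: bool) -> dict[str, float]:
    """Map each model's best score to a bar height between 0 and 100.

    Assumption: benchmarks that are not scored out of 100 are divided by the
    largest score in that benchmark. The page does not state its actual rule.
    """
    if not scores:
        return {}
    if out_of_100:
        return dict(scores)  # the raw score is the bar height
    top = max(scores.values())
    if top <= 0:
        return {model: 0.0 for model in scores}
    return {model: 100.0 * value / top for model, value in scores.items()}


# HLE is scored out of 100, so heights equal the scores reported on this page.
print(bar_heights({"GPT-5.2 Pro": 50.00, "Opus 4.5": 43.20}, out_of_100=True))

# Hypothetical rating-style benchmark; the labels would still show 1500 and 1200.
print(bar_heights({"model_a": 1500.0, "model_b": 1200.0}, out_of_100=False))
```

Keeping the original score as the label and scaling only the height is what lets benchmarks with different ranges share one chart.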

3 benchmarks with comparable scores. Each model shows its best score along with the mode in which it was achieved.

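| Benchmark | Category | GPT-5.2 Pro (current) | Opus 4.5 |
| --- | --- | --- | --- |
| HLE | General Knowledge | 50.00 (Thinking Enabled, Tools) | 43.20 (Extended Thinking, Tools) |
| Simple Bench | Commonsense Reasoning | 57.40 (Thinking Level: Extra High) | 62.00 (Extended Thinking) |
| FrontierMath - Tier 4 | Math and Reasoning | 31.30 (Thinking Enabled) | 4.20 (Standard Mode) |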

Standard API Pricing: GPT-5.2 Pro vs. Peer Models

The chart shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices use the default supplier and are given in USD per 1M tokens.

| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| Opus 4.5 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | |
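
To turn per-million-token rates into the cost of a single request, multiply each rate by its token count and divide by one million. The following is a small Python sketch using the Opus 4.5 rates listed above. The `request_cost` helper and the token counts are made up for illustration.

```python
from decimal import Decimal

# Opus 4.5 standard rates from this page, in USD per 1M tokens.
INPUT_RATE = Decimal("5")
OUTPUT_RATE = Decimal("25")
TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the standard (base) rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = INPUT_RATE * input_tokens + OUTPUT_RATE * output_tokens
    return total / TOKENS_PER_UNIT


# Hypothetical request: 12,000 input tokens and 3,000 output tokens.
print(request_cost(12_000, 3_000))
```

`Decimal` is used instead of floats so that small currency amounts do not pick up binary rounding noise. Any extended-context threshold would need its own rate and is not modeled here.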

Version History

How each version of the GPT-5.2 Pro series stacks up on benchmark tests

6 benchmarks with comparable scores. Each model shows its best score along with the mode in which it was achieved.

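| Benchmark | Category | GPT-5.2 Pro (current) | GPT-5-Pro |
| --- | --- | --- | --- |
| ARC-AGI | General Knowledge | 90.50 (Thinking Enabled) | 70.20 (Thinking Enabled) |
| ARC-AGI-2 | General Knowledge | 54.20 (Thinking Enabled) | 18.00 (Thinking Enabled) |
| GPQA Diamond | General Knowledge | 93.20 (Thinking Enabled) | 89.40 (Thinking Enabled, Tools) |
| HLE | General Knowledge | 50.00 (Thinking Enabled, Tools) | 42.00 (Thinking Enabled, Tools) |
| Simple Bench | Commonsense Reasoning | 57.40 (Thinking Level: Extra High) | 61.60 (Thinking Enabled) |
| FrontierMath - Tier 4 | Math and Reasoning | 31.30 (Thinking Enabled) | 14.60 (Thinking Enabled) |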
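
The change between generations is the difference between the two score columns. The Python sketch below computes it from the scores listed above. The FrontierMath - Tier 4 label for the 31.30 / 14.60 pair is an inference from the matching 31.30 score in the results section. Some pairs were measured in different modes, so the deltas are only indicative.

```python
# (GPT-5.2 Pro, GPT-5-Pro) best scores as listed on this page.
SCORES = {
    "ARC-AGI": (90.50, 70.20),
    "ARC-AGI-2": (54.20, 18.00),
    "GPQA Diamond": (93.20, 89.40),
    "HLE": (50.00, 42.00),
    "Simple Bench": (57.40, 61.60),
    # Benchmark name inferred from the matching 31.30 score elsewhere on the page.
    "FrontierMath - Tier 4 (inferred)": (31.30, 14.60),
}


def deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return new-minus-old score differences, rounded to two decimals."""
    return {name: round(new - old, 2) for name, (new, old) in scores.items()}


# Print the largest gains first; a negative value means the newer model scored lower.
for name, change in sorted(deltas(SCORES).items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {change:+.2f}")
```

Rounding to two decimals matches the precision of the published scores and hides floating-point noise in the subtraction.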

Single-Benchmark Version Trend

[Chart: ARC-AGI (General Knowledge) score trend across versions. Series: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.]

The x-axis shows model and release date, and the y-axis shows score. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the GPT-5.2 Pro Series

Comparable standard text pricing is not available for these models.

Sources