GPT-5.2 Pro Benchmark Details

GPT-5.2 Pro's headline benchmark results are GPQA Diamond (rank 8 of 179, score 93.20), FrontierMath - Tier 4 (rank 9 of 80, score 31.30), and HLE (rank 23 of 159, score 50.00). This page also compares it with two competitor models and two predecessor or same-series models, covering benchmark performance and, where available, pricing. One source link is attached for reference.

Benchmark Results

General Knowledge

5 evaluations

Benchmark	Score	Rank / total
GPQA Diamond	93.20	8 / 179
ARC-AGI	90.50	15 / 65
ARC-AGI-2	54.20	20 / 59
HLE	50.00	23 / 159
HLE	36.60	61 / 159

Commonsense Reasoning

1 evaluation

Benchmark	Mode	Score	Rank / total
Simple Bench	Extra-High	57.40	19 / 63

Math and Reasoning

2 evaluations

Benchmark	Mode	Score	Rank / total
FrontierMath - Tier 4	Standard Mode, Tools, Internet	31.30	9 / 80
FrontierMath - Tier 4	not listed	31.30	9 / 80

AI Agent - Information Search

2 evaluations

Benchmark	Mode	Score	Rank / total
BrowseComp	not listed	77.90	16 / 45
BrowseComp	Extra-High, Tools	77.90	16 / 45

Compare with other models

Competitor Comparison

Benchmark scores for GPT-5.2 Pro compared against top models in its class

Models compared: GPT-5.2 Pro, Opus 4.5, Gemini 3 Deep Think

The chart shows each model's highest score per benchmark under the current filter. Benchmarks scored out of 100 are plotted at their raw values. Benchmarks on other scales are rescaled within that benchmark, while the labels keep the original scores.

Three benchmarks have comparable scores. Each cell shows the model's best score, followed by the evaluation mode used.

Benchmark	GPT-5.2 Pro (current)	Opus 4.5
HLE (General Knowledge)	50.00 (Thinking Enabled, Tools)	43.20 (Extended Thinking, Tools)
Simple Bench (Commonsense Reasoning)	57.40 (Thinking Level: Extra High)	62.00 (Extended Thinking)
FrontierMath - Tier 4 (Math and Reasoning)	31.30 (Thinking Enabled)	4.20 (Standard Mode)

Standard API Pricing: GPT-5.2 Pro vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. Units: USD per 1M tokens.

Model	Supplier	Standard input	Standard output	Base price applies to
Opus 4.5	Anthropic	$5 / 1M tokens	$25 / 1M tokens	n/a
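As a rough illustration of how per-1M-token prices translate into the cost of a single request, here is a minimal Python sketch. The function name `request_cost_usd` and the token counts are hypothetical. The sketch assumes simple linear billing at the standard rates and ignores extended-context tiers, caching, and batch discounts.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Hypothetical helper: cost in USD of one request, given prices per 1M tokens.

    Assumes linear billing at the standard rate (no extended-context or discount tiers).
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Opus 4.5 standard rates from the table: $5 input, $25 output per 1M tokens.
# The token counts below are illustrative, not measured.
cost = request_cost_usd(10_000, 2_000, 5.0, 25.0)
print(f"${cost:.4f}")  # $0.1000
```

With these rates, output tokens cost five times as much as input tokens. In this example, 2,000 output tokens cost as much as 10,000 input tokens.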

Version History

How each version of the GPT-5.2 Pro series stacks up on benchmark tests

Models compared: GPT-5.2 Pro, GPT-5.1 Pro, GPT-5-Pro

Six benchmarks have comparable scores. Each cell shows the model's best score, followed by the evaluation mode used.

Benchmark	GPT-5.2 Pro (current)	GPT-5-Pro
ARC-AGI (General Knowledge)	90.50 (Thinking Enabled)	70.20 (Thinking Enabled)
ARC-AGI-2 (General Knowledge)	54.20 (Thinking Enabled)	18.00 (Thinking Enabled)
GPQA Diamond (General Knowledge)	93.20 (Thinking Enabled)	89.40 (Thinking Enabled, Tools)
HLE (General Knowledge)	50.00 (Thinking Enabled, Tools)	42.00 (Thinking Enabled, Tools)
Simple Bench (Commonsense Reasoning)	57.40 (Thinking Level: Extra High)	61.60 (Thinking Enabled)
FrontierMath - Tier 4 (Math and Reasoning)	31.30 (Thinking Enabled)	14.60 (Thinking Enabled)

Single-Benchmark Version Trend

[Chart: ARC-AGI (General Knowledge) scores across GPT-5.2 Pro series versions. X-axis: model and release date; Y-axis: score. Modes: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]

Standard API Pricing Across the GPT-5.2 Pro Series

Comparable standard text pricing is not available for these models.

Sources

openai.com