GPT-5.4 Pro Benchmark Details

GPT-5.4 Pro's strongest benchmark results are on GPQA Diamond (rank 2 of 180, score 94.40), HLE (rank 3 of 163, score 58.70), and BrowseComp (rank 2 of 47, score 89.30). This page also compares it with two competitor models and three earlier or same-series models, covering both performance and, where available, pricing. Two source links are attached for reference.

Benchmark Results

General Knowledge

5 evaluations

Benchmark	Mode	Score	Rank / total
ARC-AGI	High	94.50	5 / 65
GPQA Diamond	High	94.40	2 / 180
ARC-AGI-2	High	83.30	6 / 59
HLE	High	42.70	47 / 163
HLE	High + Tools	58.70	3 / 163

Math and Reasoning

5 evaluations

Benchmark	Mode	Score	Rank / total
FrontierMath	High	--	3 / 60
FrontierMath	Extra-High	50.00	3 / 60
FrontierMath - Tier 4	Standard + Tools + Internet	37.50	5 / 80
FrontierMath - Tier 4	High	38.00	4 / 80
FrontierMath - Tier 4	Extra-High	37.50	5 / 80

AI Agent - Information Search

1 evaluation

Benchmark	Mode	Score	Rank / total
BrowseComp	High + Tools	89.30	2 / 47

Productivity Knowledge

1 evaluation

Benchmark	Mode	Score	Rank / total
GDPval-AA	High + Tools	82.00	8 / 21

Compare with other models

Competitor Comparison

Benchmark scores for GPT-5.4 Pro compared against top models in its class

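8 benchmarks have comparable scores. Each cell shows the model's best score, followed by the mode in which it was achieved. The two GDPval-AA values (82.00 and 1606.00) appear to be on different scales, so that row is not a like-for-like comparison.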

Benchmark	Category	GPT-5.4 Pro (current)	Claude Opus 4.6	Gemini 3.1 Pro Preview
ARC-AGI	General Knowledge	94.50 (Thinking Level: High)	92.00 (Extended Thinking)	--
ARC-AGI-2	General Knowledge	83.30 (Thinking Level: High)	66.30 (Extended Thinking)	77.10 (Thinking Level: High)
GPQA Diamond	General Knowledge	94.40 (Thinking Level: High)	91.31 (Extended Thinking)	94.30 (Thinking Level: High)
HLE	General Knowledge	58.70 (Thinking Level: High; Tools)	53.00 (Extended Thinking; Tools)	51.40 (Thinking Level: High; Tools)
FrontierMath	Math and Reasoning	50.00 (Thinking Level: Extra High)	40.70 (Thinking Level: High)	36.90 (Thinking Level: High)
FrontierMath - Tier 4	Math and Reasoning	38.00 (Thinking Level: High)	22.90 (Thinking Level: High)	16.70 (Thinking Level: High)
BrowseComp	AI Agent - Information Search	89.30 (Thinking Level: High; Tools)	84.00 (Thinking Enabled; Tools)	85.90 (Thinking Level: High; Tools)
GDPval-AA	Productivity Knowledge	82.00 (Thinking Level: High; Tools)	1606.00 (Extended Thinking; Tools)	--

Standard API Pricing: GPT-5.4 Pro vs. Peer Models

Standard text input and output prices for each model, side by side. Where a model also has extended-context pricing, only the base rate is shown, and the context threshold up to which it applies is listed below.

Source: DataLearnerAI. Prices are standard text rates from each model's default supplier, in USD per 1M tokens.

Where a context threshold exists, the base price applies only up to that limit:

GPT-5.4 Pro: up to 272K tokens

Claude Opus 4.6: up to 200K tokens

Gemini 3.1 Pro Preview: up to 200K tokens

Model	Supplier	Standard input	Standard output	Base price applies up to
GPT-5.4 Pro	OpenAI	$30 / 1M tokens	$180 / 1M tokens	272K tokens
Claude Opus 4.6	Anthropic	$5 / 1M tokens	$25 / 1M tokens	200K tokens
Gemini 3.1 Pro Preview	Google DeepMind	$2 / 1M tokens	$12 / 1M tokens	200K tokens
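As a rough illustration of what these rates mean per request, the Python sketch below multiplies token counts by each model's base input and output price from the table. It is a minimal sketch under stated assumptions: "272K" and "200K" are read as 272,000 and 200,000 tokens, the threshold is checked against input tokens only, and cached-input discounts, batch rates, and extended-context surcharges (none of which are listed on this page) are ignored. The names `BasePrice`, `PRICES`, and `base_cost_usd` and the 10,000-input / 2,000-output request are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePrice:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    context_limit: int   # tokens up to which the base price applies (assumed)


# Base rates from the pricing table; "K" is read as 1,000 tokens (assumption).
PRICES = {
    "GPT-5.4 Pro": BasePrice(30.0, 180.0, 272_000),
    "Claude Opus 4.6": BasePrice(5.0, 25.0, 200_000),
    "Gemini 3.1 Pro Preview": BasePrice(2.0, 12.0, 200_000),
}


def base_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost at the base (non-extended-context) rate.

    Assumption: the threshold is compared with input tokens only. Requests
    above it are rejected, because their extended-context rates are not listed.
    """
    price = PRICES[model]
    if input_tokens > price.context_limit:
        raise ValueError(f"{model}: {input_tokens} input tokens exceed the base-price limit")
    # Convert per-1M-token prices into a per-request total.
    return (input_tokens * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${base_cost_usd(name, 10_000, 2_000):.4f}")
```

For that hypothetical request the sketch prints $0.6600 for GPT-5.4 Pro, $0.1000 for Claude Opus 4.6, and $0.0440 for Gemini 3.1 Pro Preview. Most of the gap comes from output pricing: GPT-5.4 Pro's $180 output rate is 7.2 times Claude Opus 4.6's and 15 times Gemini 3.1 Pro Preview's.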

Version History

How each version of the GPT-5.4 Pro series stacks up on benchmark tests

Models in the series: GPT-5.4 Pro, GPT-5.2 Pro, GPT-5.1 Pro, GPT-5-Pro

6 benchmarks have comparable scores. Each cell shows the model's best score, followed by the mode in which it was achieved.

Benchmark	Category	GPT-5.4 Pro (current)	GPT-5.2 Pro	GPT-5-Pro
ARC-AGI	General Knowledge	94.50 (Thinking Level: High)	90.50 (Thinking Enabled)	70.20 (Thinking Enabled)
ARC-AGI-2	General Knowledge	83.30 (Thinking Level: High)	54.20 (Thinking Enabled)	18.00 (Thinking Enabled)
GPQA Diamond	General Knowledge	94.40 (Thinking Level: High)	93.20 (Thinking Enabled)	89.40 (Thinking Enabled; Tools)
HLE	General Knowledge	58.70 (Thinking Level: High; Tools)	50.00 (Thinking Enabled; Tools)	42.00 (Thinking Enabled; Tools)
FrontierMath - Tier 4	Math and Reasoning	38.00 (Thinking Level: High)	31.30 (Thinking Enabled)	14.60 (Thinking Enabled)
BrowseComp	AI Agent - Information Search	89.30 (Thinking Level: High; Tools)	77.90 (Thinking Enabled; Tools)	--

Single-Benchmark Version Trend

Chart: ARC-AGI (General Knowledge) scores across versions. X-axis: model and release date; y-axis: score. Modes plotted: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.

Standard API Pricing Across the GPT-5.4 Pro Series

Model	Supplier	Standard input	Standard output	Base price applies up to
GPT-5.4 Pro	OpenAI	$30 / 1M tokens	$180 / 1M tokens	272K tokens

Sources

openai.com, epoch.ai