GPT-5.5 Pro Benchmark Details

GPT-5.5 Pro's benchmark results are currently led by FrontierMath - Tier 4 (rank 1 / 80, score 39.60), FrontierMath (rank 1 / 60, score 52.40), and ARC-AGI (rank 2 / 68, score 96.50). This page also compares it with 1 competitor model and 3 predecessor or same-series models, including performance and pricing views when available. 1 source link is attached for reference.

Benchmark Results

General Knowledge

6 evaluations

Benchmark	Mode	Score	Rank / total
ARC-AGI	High	96.50	2 / 68
ARC-AGI	Extra-High	--	5 / 68
ARC-AGI-2	High	84.60	4 / 62
ARC-AGI-2	Extra-High	84.20	6 / 62
HLE	Extra-High	43.10	51 / 172
HLE	Extra-High + Tools	57.20	9 / 172

Common Sense Reasoning

1 evaluation

Benchmark	Mode	Score	Rank / total
Simple Bench	Standard Mode	76.90	3 / 63

Math and Reasoning

4 evaluations

Benchmark	Mode	Score	Rank / total
FrontierMath	Extra-High + Tools	52.40	1 / 60
FrontierMath - Tier 4	High	39.60	1 / 80
FrontierMath - Tier 4	Extra-High	39.60	1 / 80
FrontierMath - Tier 4	Extra-High + Tools	39.60	1 / 80

AI Agent - Information Search

1 evaluation

Benchmark	Mode	Score	Rank / total
BrowseComp	Extra-High + Tools + Internet	90.10	3 / 53

Productivity Knowledge

1 evaluation

Benchmark	Mode	Score	Rank / total
GDPval-AA	Extra-High	82.30	7 / 21

Compare with other models

Competitor Comparison

Benchmark scores for GPT-5.5 Pro compared against top models in its class

Models compared: GPT-5.5 Pro, Claude Mythos Preview

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.

2 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark	Category	GPT-5.5 Pro (current)	Claude Mythos Preview
HLE	General Knowledge	57.20 (Thinking Level: Extra High + Tools)	64.70 (Extended Thinking + Tools)
BrowseComp	AI Agent - Information Search	90.10 (Deep Thinking Mode + Tools)	84.90 (Extended Thinking + Tools)
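
The "each model shows its best score" rule described above can be sketched in a few lines. This is a minimal illustration, not the site's actual code: the function name `best_per_benchmark` and the tuple layout are assumptions, and the sample rows are copied from the GPT-5.5 Pro results listed on this page.

```python
# Rows are (benchmark, mode, score), copied from the results on this page.
ROWS = [
    ("ARC-AGI-2", "High", 84.60),
    ("ARC-AGI-2", "Extra-High", 84.20),
    ("HLE", "Extra-High", 43.10),
    ("HLE", "Extra-High + Tools", 57.20),
]


def best_per_benchmark(rows):
    """Return {benchmark: (mode, score)} keeping the highest score per benchmark."""
    best = {}
    for benchmark, mode, score in rows:
        if benchmark not in best or score > best[benchmark][1]:
            best[benchmark] = (mode, score)
    return best


for name, (mode, score) in best_per_benchmark(ROWS).items():
    print(f"{name}: {score:.2f} ({mode})")
```

With these rows, HLE resolves to the Extra-High + Tools run (57.20) and ARC-AGI-2 to the High run (84.60), matching the values shown in the comparison table.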

Standard API Pricing: GPT-5.5 Pro vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model	Supplier	Standard input	Standard output	Base price applies to
GPT-5.5 Pro	OpenAI	$30 / 1M tokens	$180 / 1M tokens	—
Claude Mythos Preview	Anthropic	$25 / 1M tokens	$125 / 1M tokens	—
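
To make the per-token prices concrete, the sketch below estimates the cost of a single request from the standard rates in the table above. The helper name `request_cost` and the example token counts (10,000 input, 2,000 output) are illustrative assumptions; the calculation ignores any extended-context pricing, caching discounts, or other billing rules not listed on this page.

```python
# Standard prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "GPT-5.5 Pro": {"input": 30.0, "output": 180.0},
    "Claude Mythos Preview": {"input": 25.0, "output": 125.0},
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request at the standard base rate."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request size, chosen only for illustration.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.2f}")
```

For this hypothetical request, the estimate is $0.66 for GPT-5.5 Pro and $0.50 for Claude Mythos Preview.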

Version History

How each version of the GPT-5.5 Pro series stacks up on benchmark tests

Models compared: GPT-5.5 Pro, GPT-5.4 Pro, GPT-5.2 Pro, GPT-5.1 Pro

8 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark	Category	GPT-5.5 Pro (current)	GPT-5.4 Pro	GPT-5.2 Pro
ARC-AGI	General Knowledge	96.50 (Thinking Level: High)	94.50 (Thinking Level: High)	90.50 (Thinking Enabled)
ARC-AGI-2	General Knowledge	84.60 (Thinking Level: High)	83.30 (Thinking Level: High)	54.20 (Thinking Enabled)
HLE	General Knowledge	57.20 (Thinking Level: Extra High + Tools)	58.70 (Thinking Level: High + Tools)	50.00 (Thinking Enabled + Tools)
Simple Bench	Common Sense Reasoning	76.90 (Standard Mode)	--	57.40 (Thinking Level: Extra High)
FrontierMath	Math and Reasoning	52.40 (Thinking Level: Extra High + Tools)	50.00 (Thinking Level: Extra High)	--
FrontierMath - Tier 4	Math and Reasoning	39.60 (Thinking Level: Extra High + Tools)	38.00 (Thinking Level: High)	31.30 (Thinking Enabled)
BrowseComp	AI Agent - Information Search	90.10 (Deep Thinking Mode + Tools)	89.30 (Thinking Level: High + Tools)	77.90 (Thinking Enabled + Tools)
GDPval-AA	Productivity Knowledge	82.30 (Thinking Level: Extra High)	82.00 (Thinking Level: High + Tools)	--
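
The version-over-version change can be read off the table by subtracting the GPT-5.4 Pro score from the GPT-5.5 Pro score. The sketch below does this for the seven benchmarks where both models have a score (Simple Bench is omitted because GPT-5.4 Pro has none). The name `score_deltas` is an assumption for illustration, and since the best-scoring modes differ between versions on several rows, the differences are not strictly like-for-like.

```python
# (GPT-5.5 Pro score, GPT-5.4 Pro score), copied from the table above.
SCORES = {
    "ARC-AGI": (96.50, 94.50),
    "ARC-AGI-2": (84.60, 83.30),
    "HLE": (57.20, 58.70),
    "FrontierMath": (52.40, 50.00),
    "FrontierMath - Tier 4": (39.60, 38.00),
    "BrowseComp": (90.10, 89.30),
    "GDPval-AA": (82.30, 82.00),
}


def score_deltas(scores):
    """Return {benchmark: newer score minus older score}, rounded to 2 places."""
    return {name: round(new - old, 2) for name, (new, old) in scores.items()}


for name, delta in score_deltas(SCORES).items():
    print(f"{name}: {delta:+.2f}")
```

On these figures, HLE is the only benchmark where the GPT-5.5 Pro best score is lower than the GPT-5.4 Pro best score (a difference of -1.50).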

Single-Benchmark Version Trend

[Chart: ARC-AGI (General Knowledge) score by model version. X-axis: model and release date; Y-axis: score. Modes: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.]

Standard API Pricing Across the GPT-5.5 Pro Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

GPT-5.4 Pro: Base price applies to <= 272K

Model	Supplier	Standard input	Standard output	Base price applies to
GPT-5.5 Pro	OpenAI	$30 / 1M tokens	$180 / 1M tokens	—
GPT-5.4 Pro	OpenAI	$30 / 1M tokens	$180 / 1M tokens	<= 272K
GPT-5.2 Pro	OpenAI	$21 / 1M tokens	$168 / 1M tokens	—
GPT-5.1 Pro	OpenAI	$15 / 1M tokens	$120 / 1M tokens	—
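
The "Base price applies to" column can be treated as a simple guard before using the base rates in a cost estimate. The sketch below is an illustration under two assumptions: "272K" is read as 272,000 tokens, and the helper name `base_price_applies` is hypothetical. This page does not list the rate charged above the threshold, so the code only reports whether the base rate is applicable.

```python
# Context limit (in tokens) up to which the base price applies.
# None means the table lists no threshold for that model.
BASE_PRICE_CONTEXT_LIMIT = {
    "GPT-5.5 Pro": None,
    "GPT-5.4 Pro": 272_000,  # assumption: "272K" interpreted as 272,000 tokens
    "GPT-5.2 Pro": None,
    "GPT-5.1 Pro": None,
}


def base_price_applies(model, context_tokens):
    """Return True if the charted base price covers a request of this size."""
    limit = BASE_PRICE_CONTEXT_LIMIT[model]
    return limit is None or context_tokens <= limit


print(base_price_applies("GPT-5.4 Pro", 200_000))
print(base_price_applies("GPT-5.4 Pro", 300_000))
```

A caller could use this check to fall back to the supplier's own pricing page whenever the function returns False, rather than assuming the base rate.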

Sources

arcprize.org