GPT-5.5 Benchmark Details

GPT-5.5's strongest benchmark results are ARC-AGI-2 (rank 1 of 49, score 85), Terminal Bench 2.0 (rank 1 of 37, score 82.70), and FrontierMath (rank 2 of 57, score 51.70). This page also compares it with three competitor models and three earlier models in the same series, covering performance and pricing where available. One source link is attached for reference.

Benchmark Results


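Comprehensive evaluation

5 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| ARC-AGI | High | 95 | 2 / 56 |
| GPQA Diamond | High | 93.60 | 6 / 169 |
| ARC-AGI-2 | High | 85 | 1 / 49 |
| HLE | High | 41.40 | 38 / 138 |
| HLE | High + Tools | 52.20 | 10 / 138 |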

Mathematical reasoning

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| FrontierMath | High + Tools | 51.70 | 2 / 57 |
| FrontierMath - Tier 4 | High + Tools | 35.40 | 4 / 38 |

Coding and software engineering

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| SWE-Bench Pro - Public | High + Tools | 58.60 | 3 / 30 |

Agent capability evaluation

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| τ²-Bench - Telecom | High + Tools | 98 | 5 / 35 |

AI Agent - Information gathering

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| BrowseComp | High + Tools + Internet | 84.40 | 5 / 39 |

AI Agent - Tool use

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | High + Tools | 82.70 | 1 / 37 |
| OSWorld-Verified | High + Tools | 78.70 | 2 / 14 |

Productivity knowledge

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GDPval-AA | High | 84.90 | 3 / 18 |
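
Because the leaderboards above differ in size, a raw rank is easier to compare once it is expressed as a share of the field. The following is a minimal sketch, not part of DataLearnerAI's methodology: `top_percent` is a hypothetical helper, and the rank/total pairs are copied from the results listed on this page.

```python
def top_percent(rank: int, total: int) -> float:
    """Return the share of the leaderboard at or above this rank, in percent."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * rank / total


# (rank, total) pairs copied from the benchmark results on this page.
ranks = {
    "ARC-AGI-2": (1, 49),
    "Terminal Bench 2.0": (1, 37),
    "FrontierMath": (2, 57),
    "HLE (High, no tools)": (38, 138),
}

for name, (rank, total) in ranks.items():
    print(f"{name}: top {top_percent(rank, total):.1f}% of {total} entries")
```

A rank of 38 on a 138-entry board and a rank of 3 on an 18-entry board look very different as raw numbers but can be compared directly once both are expressed as percentages.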

Competitor Comparison

Benchmark scores for GPT-5.5 compared against top models in its class

Models compared: GPT-5.5, Opus 4.7, Claude Mythos Preview, Gemini 3.1 Pro Preview.
[Chart: each model's highest score per benchmark. See the table below for per-mode details.]

Benchmark Score Comparison

10 benchmarks with comparable scores. Each cell shows the model's best score, with the mode in parentheses: High and Extra High are thinking levels, Extended is Extended Thinking, and "+ Tools" means tools were enabled. "--" means no score is listed.

| Benchmark | Category | GPT-5.5 (current) | Opus 4.7 | Claude Mythos Preview | Gemini 3.1 Pro Preview |
| --- | --- | --- | --- | --- | --- |
| ARC-AGI-2 | Comprehensive evaluation | 85.00 (High) | -- | -- | 77.10 (High) |
| GPQA Diamond | Comprehensive evaluation | 93.60 (High) | 94.20 (Extended) | 94.60 (Extended) | 94.30 (High) |
| HLE | Comprehensive evaluation | 52.20 (High + Tools) | 54.70 (Extended + Tools) | 64.70 (Extended + Tools) | 51.40 (High + Tools) |
| FrontierMath | Mathematical reasoning | 51.70 (High + Tools) | 43.80 (Extra High) | -- | -- |
| FrontierMath - Tier 4 | Mathematical reasoning | 35.40 (High + Tools) | 22.90 (Extra High) | -- | -- |
| SWE-Bench Pro - Public | Coding and software engineering | 58.60 (High + Tools) | 64.30 (Extended + Tools) | 77.80 (Extended + Tools) | 54.20 (High + Tools) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.00 (High + Tools) | -- | -- | 99.30 (High + Tools) |
| BrowseComp | AI Agent - Information gathering | 84.40 (High + Tools) | 79.30 (Extended + Tools) | 84.90 (Extended + Tools) | 85.90 (High + Tools) |
| OSWorld-Verified | AI Agent - Tool use | 78.70 (High + Tools) | 78.00 (Extended + Tools) | 79.60 (Extended + Tools) | -- |
| Terminal Bench 2.0 | AI Agent - Tool use | 82.70 (High + Tools) | 69.40 (Extended + Tools) | 82.00 (Extended + Tools) | 68.50 (High + Tools) |
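
To read the comparison at a glance, one can compute which model holds the top listed score on each benchmark. This is a minimal sketch using four rows copied from the comparison above. The `scores` dictionary and the `leader` helper are illustrative names, not anything DataLearnerAI provides. Because the listed scores come from different modes (thinking levels, tool access), the result is a rough guide rather than a like-for-like ranking.

```python
from typing import Optional

# Best listed score per model; None marks a "--" cell (no score listed).
scores: dict[str, dict[str, Optional[float]]] = {
    "GPQA Diamond": {
        "GPT-5.5": 93.60, "Opus 4.7": 94.20,
        "Claude Mythos Preview": 94.60, "Gemini 3.1 Pro Preview": 94.30,
    },
    "FrontierMath": {
        "GPT-5.5": 51.70, "Opus 4.7": 43.80,
        "Claude Mythos Preview": None, "Gemini 3.1 Pro Preview": None,
    },
    "SWE-Bench Pro - Public": {
        "GPT-5.5": 58.60, "Opus 4.7": 64.30,
        "Claude Mythos Preview": 77.80, "Gemini 3.1 Pro Preview": 54.20,
    },
    "Terminal Bench 2.0": {
        "GPT-5.5": 82.70, "Opus 4.7": 69.40,
        "Claude Mythos Preview": 82.00, "Gemini 3.1 Pro Preview": 68.50,
    },
}


def leader(row: dict[str, Optional[float]]) -> tuple[str, float]:
    """Return (model, score) for the highest listed score, skipping missing cells."""
    listed = {model: s for model, s in row.items() if s is not None}
    if not listed:
        raise ValueError("no scores listed for this benchmark")
    return max(listed.items(), key=lambda item: item[1])


for benchmark, row in scores.items():
    model, score = leader(row)
    print(f"{benchmark}: {model} ({score:.2f})")
```

Skipping `None` cells rather than treating them as zero keeps a model with no listed score from appearing to have lost.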

Standard API Pricing: GPT-5.5 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

Gemini 3.1 Pro Preview: Base price applies to <= 200K
| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| GPT-5.5 | OpenAI | $5 / 1M tokens | $30 / 1M tokens | -- |
| Opus 4.7 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | -- |
| Claude Mythos Preview | Anthropic | $25 / 1M tokens | $125 / 1M tokens | -- |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
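
The per-million-token rates above translate into a per-request cost with simple arithmetic. The sketch below assumes flat billing at the listed base rates and ignores any extended-context pricing beyond the stated thresholds. `PRICES` and `request_cost` are hypothetical names introduced for illustration, and the token counts in the example are made up.

```python
# USD per 1M tokens as (input, output), copied from the pricing table on this page.
PRICES: dict[str, tuple[float, float]] = {
    "GPT-5.5": (5.00, 30.00),
    "Opus 4.7": (5.00, 25.00),
    "Claude Mythos Preview": (25.00, 125.00),
    "Gemini 3.1 Pro Preview": (2.00, 12.00),  # base rate, applies to <= 200K
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed base rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Illustrative request: 10,000 input tokens and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.3f}")
```

For this illustrative request, GPT-5.5 comes to about $0.11 and Opus 4.7 to about $0.10: the input rates are identical, so the gap comes entirely from the output rate.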

Version History

How each version of the GPT-5.5 series stacks up on benchmark tests

Models compared: GPT-5.5, GPT-5.4, GPT-5.2, GPT-5.1.
[Chart: each model's highest score per benchmark. See the table below for per-mode details.]

Benchmark Score Comparison

12 benchmarks with comparable scores. Each cell shows the model's best score, with the mode in parentheses: High and Extra High are thinking levels, and "+ Tools" means tools were enabled. "--" means no score is listed.

| Benchmark | Category | GPT-5.5 (current) | GPT-5.4 | GPT-5.2 | GPT-5.1 |
| --- | --- | --- | --- | --- | --- |
| ARC-AGI | Comprehensive evaluation | 95.00 (High) | 93.70 (Extra High) | 90.50 (Deep Thinking Mode) | 72.80 (High) |
| ARC-AGI-2 | Comprehensive evaluation | 85.00 (High) | 77.10 (Standard Mode) | 54.20 (Deep Thinking Mode) | 17.60 (High) |
| GPQA Diamond | Comprehensive evaluation | 93.60 (High) | 92.80 (Extra High) | 93.20 (Deep Thinking Mode) | 88.10 (Thinking Enabled) |
| HLE | Comprehensive evaluation | 52.20 (High + Tools) | 52.10 (Extra High + Tools) | 45.50 (Deep Thinking Mode + Tools) | 42.70 (High + Tools) |
| FrontierMath | Mathematical reasoning | 51.70 (High + Tools) | 47.60 (Extra High) | 40.30 (Extra High + Tools) | 26.70 (High + Tools) |
| FrontierMath - Tier 4 | Mathematical reasoning | 35.40 (High + Tools) | 27.10 (Extra High) | 14.60 (Extra High + Tools) | 12.50 (High) |
| SWE-Bench Pro - Public | Coding and software engineering | 58.60 (High + Tools) | 57.70 (Extra High) | 55.60 (Extra High + Tools) | 50.80 (High) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.00 (High + Tools) | 98.90 (Extra High + Tools) | 98.70 (Extra High + Tools) | 95.60 (High + Tools) |
| BrowseComp | AI Agent - Information gathering | 84.40 (High + Tools) | 82.70 (Extra High + Tools) | 65.80 (Extra High + Tools) | 50.80 (High) |
| OSWorld-Verified | AI Agent - Tool use | 78.70 (High + Tools) | 75.00 (Extra High + Tools) | -- | -- |
| Terminal Bench 2.0 | AI Agent - Tool use | 82.70 (High + Tools) | 75.10 (Extra High + Tools) | -- | 47.60 (High + Tools) |
| GDPval-AA | Productivity knowledge | 84.90 (High) | -- | 70.90 (High + Tools) | -- |
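
The version-to-version movement is easier to see as point changes between consecutive releases. The sketch below uses three rows copied from the version comparison above. `VERSIONS`, `history`, and `deltas` are illustrative names, and versions with no listed score are skipped, so a delta may span more than one release. The modes behind each score differ across versions, so these figures are indicative only.

```python
from typing import Optional

# Oldest to newest, matching the order of each score list below.
VERSIONS = ["GPT-5.1", "GPT-5.2", "GPT-5.4", "GPT-5.5"]

# Best listed score per version; None marks a "--" cell (no score listed).
history: dict[str, list[Optional[float]]] = {
    "ARC-AGI-2": [17.60, 54.20, 77.10, 85.00],
    "τ²-Bench - Telecom": [95.60, 98.70, 98.90, 98.00],
    "Terminal Bench 2.0": [47.60, None, 75.10, 82.70],
}


def deltas(scores: list[Optional[float]]) -> list[tuple[str, str, float]]:
    """Return (from_version, to_version, point_change) for consecutive listed scores."""
    listed = [(v, s) for v, s in zip(VERSIONS, scores) if s is not None]
    return [
        (prev_v, next_v, round(next_s - prev_s, 2))
        for (prev_v, prev_s), (next_v, next_s) in zip(listed, listed[1:])
    ]


for benchmark, scores in history.items():
    steps = ", ".join(f"{a} -> {b}: {d:+.2f}" for a, b, d in deltas(scores))
    print(f"{benchmark}: {steps}")
```

On these rows the sketch shows a large early gain on ARC-AGI-2 that narrows with each release, and a small decline on τ²-Bench - Telecom from GPT-5.4 to GPT-5.5.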

Single-Benchmark Version Trend

[Trend chart: ARC-AGI (Comprehensive evaluation). X-axis: model and release date. Y-axis: score. Series: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. Solid lines connect the same mode across versions; dotted guides align modes within the same generation.]

Standard API Pricing Across the GPT-5.5 Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

GPT-5.4: Base price applies to <= 272K
| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| GPT-5.5 | OpenAI | $5 / 1M tokens | $30 / 1M tokens | -- |
| GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
| GPT-5.2 | OpenAI | $1.75 / 1M tokens | $14 / 1M tokens | -- |
| GPT-5.1 | -- | $1.25 / 1M tokens | $10 / 1M tokens | -- |

Sources

openai.com