GPT-5.5 vs GPT-5.4

Across 15 shared benchmarks, GPT-5.5 leads overall. GPT-5.5 wins 14, GPT-5.4 wins 0, and 1 is a tie. The average score difference is +6.02 in GPT-5.5's favor.

GPT-5.5: OpenAI · 2026-04-23 · Reasoning model

GPT-5.4: OpenAI · 2026-03-05 · Multimodal model

GPT-5.5: 14 wins (93%) · Ties: 1 · GPT-5.4: 0 wins (0%)

Benchmark scores

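Grouped by capability and sorted by largest gap within each group; 15 shared benchmarks. Each cell gives the score, the model's rank on that benchmark (position / number of ranked models), and the evaluation setting.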
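The grouping and ordering can be sketched in a few lines. This is a minimal illustration, not the site's generator: it assumes each row carries a capability label (taken from the group headings on this page) and that "largest gap" means the signed difference GPT-5.5 minus GPT-5.4, sorted descending. `ROWS` and `group_by_gap` are hypothetical names, and the rows are listed alphabetically within each group so the sort has something to do.

```python
from collections import defaultdict

# (benchmark, capability group, GPT-5.5 score, GPT-5.4 score), copied from the
# tables on this page and listed alphabetically within each group on purpose.
ROWS = [
    ("ARC-AGI", "General Knowledge", 95.0, 93.70),
    ("ARC-AGI-2", "General Knowledge", 85.0, 77.10),
    ("ARC-AGI-3", "General Knowledge", 0.0, 0.0),
    ("GPQA Diamond", "General Knowledge", 93.60, 92.80),
    ("HLE", "General Knowledge", 52.20, 52.10),
    ("LiveBench", "General Knowledge", 80.71, 80.28),
    ("MCP-Atlas", "AI Agent - Tool Usage", 75.30, 70.60),
    ("OSWorld-Verified", "AI Agent - Tool Usage", 78.70, 75.0),
    ("Terminal Bench 2.0", "AI Agent - Tool Usage", 82.70, 75.10),
    ("DeepSWE", "Coding and Software Engineering", 67.0, 52.0),
    ("SWE-Bench Pro - Public", "Coding and Software Engineering", 58.60, 57.70),
    ("FrontierMath", "Math and Reasoning", 51.70, 47.60),
    ("FrontierMath - Tier 4", "Math and Reasoning", 35.40, 27.10),
    ("τ²-Bench - Telecom", "Agent Level Benchmark", 98.0, 64.30),
    ("BrowseComp", "AI Agent - Information Search", 84.40, 82.70),
]


def group_by_gap(rows):
    """Group rows by capability, then order each group by signed gap, largest first."""
    groups = defaultdict(list)
    for name, group, score_55, score_54 in rows:
        # Gap is GPT-5.5 minus GPT-5.4 (assumed sign convention), rounded to two
        # decimals so float noise such as 7.900000000000006 does not leak out.
        groups[group].append((name, round(score_55 - score_54, 2)))
    for items in groups.values():
        items.sort(key=lambda item: item[1], reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for group, items in group_by_gap(ROWS).items():
        print(group, items)
```

Run on these rows, the General Knowledge group comes out as ARC-AGI-2, ARC-AGI, GPQA Diamond, LiveBench, HLE, ARC-AGI-3, which matches the table order.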

General Knowledge: GPT-5.5 leads 5/6

Benchmark	GPT-5.5	GPT-5.4	Diff
ARC-AGI-2	85 (rank 2/62), Thinking High (No Tools)	77.10 (rank 9/62), Normal (No Tools)	+7.90
ARC-AGI	95 (rank 5/68), Thinking Extra High (No Tools)	93.70 (rank 9/68), Normal (No Tools)	+1.30
GPQA Diamond	93.60 (rank 6/187), Thinking High (No Tools)	92.80 (rank 11/187), Thinking Extra High (No Tools)	+0.80
LiveBench	80.71 (rank 1/115), Deep Thinking (No Tools)	80.28 (rank 2/115), Deep Thinking (No Tools)	+0.43
HLE	52.20 (rank 20/172), Thinking High (With Tools)	52.10 (rank 21/172), Thinking Extra High (With Tools)	+0.10
ARC-AGI-3	0 (rank 5/9), Thinking High (No Tools)	0 (rank 7/9), Thinking High (No Tools)	0 (tie)

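AI Agent - Tool Usage: GPT-5.5 leads 3/3

Benchmark	GPT-5.5	GPT-5.4	Diff
Terminal Bench 2.0	82.70 (rank 1/47), Thinking High (With Tools)	75.10 (rank 4/47), Thinking Extra High (With Tools)	+7.60
MCP-Atlas	75.30 (rank 12/27), Thinking Extra High (With Tools)	70.60 (rank 14/27), Thinking Extra High (With Tools)	+4.70
OSWorld-Verified	78.70 (rank 8/24), Thinking High (With Tools)	75 (rank 12/24), Thinking Extra High (With Tools)	+3.70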

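Coding and Software Engineering: GPT-5.5 leads 2/2

Benchmark	GPT-5.5	GPT-5.4	Diff
DeepSWE	67 (rank 7/19), Thinking Extra High (With Tools)	52 (rank 12/19), Thinking Extra High (With Tools)	+15
SWE-Bench Pro - Public	58.60 (rank 13/54), Thinking High (With Tools)	57.70 (rank 17/54), Thinking Extra High (No Tools)	+0.90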

Math and Reasoning: GPT-5.5 leads 2/2

Benchmark	GPT-5.5	GPT-5.4	Diff
FrontierMath - Tier 4	35.40 (rank 7/80), Thinking High (With Tools)	27.10 (rank 11/80), Thinking Extra High (No Tools)	+8.30
FrontierMath	51.70 (rank 2/60), Thinking High (With Tools)	47.60 (rank 5/60), Thinking Extra High (No Tools)	+4.10

Agent Level Benchmark: GPT-5.5 leads 1/1

Benchmark	GPT-5.5	GPT-5.4	Diff
τ²-Bench - Telecom	98 (rank 5/35), Thinking High (With Tools)	64.30 (rank 30/35), Normal (With Tools)	+33.70

AI Agent - Information Search: GPT-5.5 leads 1/1

Benchmark	GPT-5.5	GPT-5.4	Diff
BrowseComp	84.40 (rank 8/53), Thinking High (With Tools + Internet)	82.70 (rank 15/53), Thinking Extra High (With Tools)	+1.70

Prices use DataLearner records when available; missing fields are not inferred.

GPT-5.5 leads in: General Knowledge (5/6), AI Agent - Tool Usage (3/3), Coding and Software Engineering (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1).

On average across the 15 shared benchmarks, GPT-5.5 scores 6.02 points higher than GPT-5.4.

Largest single-benchmark gap: τ²-Bench - Telecom, where GPT-5.5 scores 98 and GPT-5.4 scores 64.30 (+33.70).
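The headline figures (wins, ties, average difference, largest gap) can be recomputed from the tables above. The sketch below assumes a win is a strictly higher score, a tie is an exact match, and the average is a plain mean of the 15 signed differences. That last assumption reproduces the +6.02 reported here, but this page does not document the site's exact method. `SCORES` and `summarize` are hypothetical names.

```python
# Per-benchmark scores (GPT-5.5, GPT-5.4) as listed in the tables on this page.
SCORES = {
    "ARC-AGI-2": (85.0, 77.10),
    "ARC-AGI": (95.0, 93.70),
    "GPQA Diamond": (93.60, 92.80),
    "LiveBench": (80.71, 80.28),
    "HLE": (52.20, 52.10),
    "ARC-AGI-3": (0.0, 0.0),
    "Terminal Bench 2.0": (82.70, 75.10),
    "MCP-Atlas": (75.30, 70.60),
    "OSWorld-Verified": (78.70, 75.0),
    "DeepSWE": (67.0, 52.0),
    "SWE-Bench Pro - Public": (58.60, 57.70),
    "FrontierMath - Tier 4": (35.40, 27.10),
    "FrontierMath": (51.70, 47.60),
    "τ²-Bench - Telecom": (98.0, 64.30),
    "BrowseComp": (84.40, 82.70),
}


def summarize(scores):
    """Return (GPT-5.5 wins, GPT-5.4 wins, ties, mean diff, largest-gap benchmark, its gap)."""
    # Signed difference per benchmark: positive means GPT-5.5 scored higher.
    diffs = {name: a - b for name, (a, b) in scores.items()}
    wins_55 = sum(d > 0 for d in diffs.values())
    wins_54 = sum(d < 0 for d in diffs.values())
    ties = sum(d == 0 for d in diffs.values())  # assumed: a tie is an exact match
    mean_diff = sum(diffs.values()) / len(diffs)  # assumed: unweighted mean
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return wins_55, wins_54, ties, round(mean_diff, 2), largest, round(diffs[largest], 2)


if __name__ == "__main__":
    print(summarize(SCORES))
```

Note that a plain mean mixes benchmarks with very different scales and spreads, so a single large gap such as τ²-Bench - Telecom pulls the average up noticeably.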

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.