GPT-5.4 Benchmark Details
GPT-5.4's strongest benchmark results are currently LiveBench (ranked 2 of 115, score 80.28), Pinch Bench (ranked 1 of 37, score 90.50), and GPQA Diamond (ranked 10 of 179, score 92.80). This page also compares it with 2 competitor models and 2 earlier models from the same series, covering performance and, where available, pricing. 2 source links are attached for reference.
Benchmark Results
- General Knowledge: 14 evaluations
- Math and Reasoning: 2 evaluations
- Coding and Software Engineering: 2 evaluations
- Agent Level Benchmark: 2 evaluations
- AI Agent - Tool Usage: 3 evaluations
- Claw-style Agent Evaluation: 2 evaluations

Competitor Comparison
Benchmark scores for GPT-5.4 compared against top models in its class
Ten benchmarks have comparable scores. Each cell shows the model's best reported score, followed by the mode used for that run.
| Benchmark | Category | GPT-5.4 (current) | Gemini 3.1 Pro Preview | Claude Opus 4.6 |
|---|---|---|---|---|
| ARC-AGI | Overall evaluation | 93.70 (Standard Mode) | -- | 92.00 (Extended Thinking) |
| ARC-AGI-2 | Overall evaluation | 77.10 (Standard Mode) | 77.10 (Thinking Level · High) | 66.30 (Extended Thinking) |
| HLE | Overall evaluation | 52.10 (Thinking Level · Extra High, Tools) | 51.40 (Thinking Level · High, Tools) | 53.00 (Extended Thinking, Tools) |
| (unnamed) | -- | 27.10 (Thinking Level · Extra High) | 16.70 (Thinking Level · High) | 22.90 (Thinking Level · High) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.90 (Thinking Level · Extra High, Tools) | 99.30 (Thinking Level · High, Tools) | 99.25 (Extended Thinking, Tools) |
| BrowseComp | AI Agent - Information Gathering | 82.70 (Thinking Level · Extra High, Tools) | 85.90 (Thinking Level · High, Tools) | 84.00 (Thinking Enabled, Tools) |
| MCP-Atlas | AI Agent - Tool Usage | 70.60 (Thinking Level · Extra High, Tools) | -- | 76.80 (Deep Thinking Mode, Tools) |
| OSWorld-Verified | AI Agent - Tool Usage | 75.00 (Thinking Level · Extra High, Tools) | -- | 72.70 (Extended Thinking, Tools) |
| Terminal Bench 2.0 | AI Agent - Tool Usage | 75.10 (Thinking Level · Extra High, Tools) | 68.50 (Thinking Level · High, Tools) | 65.40 (Extended Thinking, Tools) |
| Pinch Bench | OpenClaw agent overall evaluation | 90.50 (Thinking Enabled, Tools) | 86.70 (Thinking Enabled, Tools) | 87.40 (Thinking Enabled, Tools) |
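To see which model leads each row, the best scores in the table can be tallied per benchmark. The Python sketch below hard-codes the values from the table (skipping the row without a benchmark name and treating `--` as missing) and counts leads, crediting ties to every tied model. The names `SCORES`, `leaders`, and `win_counts` are illustrative only. The scores come from different modes (standard, various thinking levels, with or without tools), so a lead here is not a controlled head-to-head result.

```python
from collections import Counter

MODELS = ("GPT-5.4", "Gemini 3.1 Pro Preview", "Claude Opus 4.6")

# Best reported score per model, copied from the table; None marks a "--" cell.
SCORES = {
    "ARC-AGI": (93.70, None, 92.00),
    "ARC-AGI-2": (77.10, 77.10, 66.30),
    "HLE": (52.10, 51.40, 53.00),
    "τ²-Bench - Telecom": (98.90, 99.30, 99.25),
    "BrowseComp": (82.70, 85.90, 84.00),
    "MCP-Atlas": (70.60, None, 76.80),
    "OSWorld-Verified": (75.00, None, 72.70),
    "Terminal Bench 2.0": (75.10, 68.50, 65.40),
    "Pinch Bench": (90.50, 86.70, 87.40),
}


def leaders(row):
    """Return the model(s) with the highest non-missing score; ties share the lead."""
    present = {m: s for m, s in zip(MODELS, row) if s is not None}
    top = max(present.values())
    return [m for m, s in present.items() if s == top]


def win_counts(scores):
    """Count, per model, how many benchmarks it leads or co-leads."""
    wins = Counter()
    for row in scores.values():
        wins.update(leaders(row))
    return wins


if __name__ == "__main__":
    for bench, row in SCORES.items():
        print(f"{bench}: {', '.join(leaders(row))}")
    print(dict(win_counts(SCORES)))
```

On this data, GPT-5.4 leads or shares the lead on five of the nine named benchmarks, Gemini 3.1 Pro Preview on three (including the ARC-AGI-2 tie), and Claude Opus 4.6 on two.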
Standard API Pricing: GPT-5.4 vs. Peer Models
Standard text input and output prices for each model, side by side. Where a model also has extended-context pricing, only the base rate is shown, and the context threshold for that rate is listed in the last column.
Source: DataLearnerAI. Prices are standard text rates from the default supplier, in USD per 1M tokens.
When a context threshold exists, the base price applies only within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K |
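The per-request cost at these base rates is simple arithmetic: tokens times price per million, for input and output separately. The Python sketch below applies the table's prices to a hypothetical request. It assumes that "272K" and "200K" mean 272,000 and 200,000 tokens and that the threshold is checked against input (prompt) tokens; neither detail is stated on this page. Because extended-context prices are not listed, the hypothetical `base_cost` helper refuses to price requests above the threshold rather than guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    input_per_m: float                # USD per 1M input tokens
    output_per_m: float               # USD per 1M output tokens
    base_limit_tokens: Optional[int]  # context covered by the base rate; None if none listed


# Values from the pricing table; "272K" read as 272,000 tokens (assumption).
PRICES = {
    "GPT-5.4": Pricing(2.5, 15.0, 272_000),
    "Gemini 3.1 Pro Preview": Pricing(2.0, 12.0, 200_000),
    "Claude Opus 4.6": Pricing(5.0, 25.0, 200_000),
}


def base_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the base rate.

    Assumes the threshold applies to input tokens. Requests above it would fall
    under extended-context pricing, which is not modeled here.
    """
    if p.base_limit_tokens is not None and input_tokens > p.base_limit_tokens:
        raise ValueError("input exceeds base-price threshold; extended-context rate not modeled")
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens, 10K output tokens.
    for name, p in PRICES.items():
        print(f"{name}: ${base_cost(p, 100_000, 10_000):.2f}")
```

For the hypothetical 100K-input / 10K-output request this gives $0.40 for GPT-5.4, $0.32 for Gemini 3.1 Pro Preview, and $0.75 for Claude Opus 4.6. A 250K-token prompt stays within GPT-5.4's base-rate limit but exceeds the 200K limit listed for the other two models.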
Version History
How each version of the GPT-5.4 series stacks up on benchmark tests
Eight benchmarks have comparable scores. Each cell shows the model's best reported score, followed by the mode used for that run.
| Benchmark | Category | GPT-5.4 (current) | GPT-5.2 | GPT-5.1 |
|---|---|---|---|---|
| ARC-AGI | Overall evaluation | 93.70 (Standard Mode) | 90.50 (Deep Thinking Mode) | 72.80 (Thinking Level · High) |
| ARC-AGI-2 | Overall evaluation | 77.10 (Standard Mode) | 54.20 (Deep Thinking Mode) | 17.60 (Thinking Level · High) |
| HLE | Overall evaluation | 52.10 (Thinking Level · Extra High, Tools) | 45.50 (Deep Thinking Mode, Tools) | 42.70 (Thinking Level · High, Tools) |
| LiveBench | Overall evaluation | 80.28 (Deep Thinking Mode) | 48.91 (Standard Mode) | 72.04 (Thinking Level · High) |
| (unnamed) | -- | 27.10 (Thinking Level · Extra High) | 18.80 (Thinking Level · Extra High) | 12.50 (Thinking Level · High, Tools) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.90 (Thinking Level · Extra High, Tools) | 98.70 (Thinking Level · Extra High, Tools) | 95.60 (Thinking Level · High, Tools) |
| BrowseComp | AI Agent - Information Gathering | 82.70 (Thinking Level · Extra High, Tools) | 65.80 (Thinking Level · Extra High, Tools) | 50.80 (Thinking Level · High) |
| Terminal Bench 2.0 | AI Agent - Tool Usage | 75.10 (Thinking Level · Extra High, Tools) | -- | 47.60 (Thinking Level · High, Tools) |
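The version table also shows how far each score moved from GPT-5.1, the oldest model listed, to GPT-5.4. The Python sketch below computes that point change per benchmark from the table's values (skipping the unnamed row) and picks the largest gain. `HISTORY`, `gain_over_oldest`, and `largest_gain` are illustrative names, not part of any published tool.

```python
# Best scores from the version-history table: (GPT-5.4, GPT-5.2, GPT-5.1); None marks "--".
HISTORY = {
    "ARC-AGI": (93.70, 90.50, 72.80),
    "ARC-AGI-2": (77.10, 54.20, 17.60),
    "HLE": (52.10, 45.50, 42.70),
    "LiveBench": (80.28, 48.91, 72.04),
    "τ²-Bench - Telecom": (98.90, 98.70, 95.60),
    "BrowseComp": (82.70, 65.80, 50.80),
    "Terminal Bench 2.0": (75.10, None, 47.60),
}


def gain_over_oldest(row):
    """Point change from the oldest listed version to GPT-5.4; None if either score is missing."""
    newest, oldest = row[0], row[-1]
    if newest is None or oldest is None:
        return None
    return round(newest - oldest, 2)


def largest_gain(history):
    """Name of the benchmark with the biggest GPT-5.1 to GPT-5.4 gain."""
    gains = {}
    for bench, row in history.items():
        gain = gain_over_oldest(row)
        if gain is not None:
            gains[bench] = gain
    return max(gains, key=gains.get)


if __name__ == "__main__":
    for bench, row in HISTORY.items():
        print(f"{bench}: {gain_over_oldest(row):+.2f}")
    print("largest gain:", largest_gain(HISTORY))
```

ARC-AGI-2 shows the largest gain, 59.50 points (17.60 to 77.10). The modes differ between versions (for example, GPT-5.2's LiveBench score is a Standard Mode result), so these gains mix model changes with configuration changes.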
Standard API Pricing Across the GPT-5.4 Series
Source: DataLearnerAI. Prices are standard text rates from the default supplier, in USD per 1M tokens.
When a context threshold exists, the base price applies only within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
GPT-5.2 | OpenAI | $1.75 / 1M tokens | $14 / 1M tokens | — |
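The two rows above also show how much the per-token rates changed from GPT-5.2 to GPT-5.4. A minimal Python sketch of that percentage-change arithmetic follows; `pct_change` is an illustrative helper name. It compares base rates only, since no context threshold is listed for GPT-5.2.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)


# (input, output) prices in USD per 1M tokens, from the series pricing table.
GPT_5_2 = (1.75, 14.0)
GPT_5_4 = (2.5, 15.0)

if __name__ == "__main__":
    for label, old, new in zip(("input", "output"), GPT_5_2, GPT_5_4):
        print(f"{label}: {pct_change(old, new):+.1f}%")
```

This works out to about +42.9% on input and +7.1% on output, so input-heavy workloads see most of the price increase.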