Benchmark Results
General evaluation: 12 evaluations
Mathematical reasoning: 2 evaluations
Agent capability evaluation: 2 evaluations
AI Agent - Tool use: 2 evaluations
OpenClaw agent capability comprehensive evaluation: 2 evaluations
Competitor Comparison
Benchmark scores for GPT-5.4 compared against top models in its class
9 benchmarks with comparable scores. Each model shows its best score, with the mode used for that score shown alongside it.
| Benchmark | Category | GPT-5.4 (current) | Gemini 3.1 Pro Preview | Claude Opus 4.6 |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 93.70 (Standard Mode) | -- | 92.00 (Extended Thinking) |
| ARC-AGI-2 | General evaluation | 77.10 (Standard Mode) | 77.10 (Thinking Level · High) | 66.30 (Extended Thinking) |
| HLE | General evaluation | 52.10 (Thinking Level · Extra High, Tools) | 51.40 (Thinking Level · High, Tools) | 53.00 (Extended Thinking, Tools) |
|  |  | 27.10 (Thinking Level · Extra High) | 16.70 (Standard Mode) | 22.90 (Thinking Level · High) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.90 (Thinking Level · Extra High, Tools) | 99.30 (Thinking Level · High, Tools) | 99.25 (Extended Thinking, Tools) |
| BrowseComp | AI Agent - Information gathering | 82.70 (Thinking Level · Extra High, Tools) | 85.90 (Thinking Level · High, Tools) | 84.00 (Thinking Enabled, Tools) |
| OSWorld-Verified | AI Agent - Tool use | 75.00 (Thinking Level · Extra High, Tools) | -- | 72.70 (Extended Thinking, Tools) |
| Terminal Bench 2.0 | AI Agent - Tool use | 75.10 (Thinking Level · Extra High, Tools) | 68.50 (Thinking Level · High, Tools) | 65.40 (Extended Thinking, Tools) |
| Pinch Bench | OpenClaw agent capability comprehensive evaluation | 90.50 (Thinking Enabled, Tools) | 86.70 (Thinking Enabled, Tools) | 87.40 (Thinking Enabled, Tools) |
Standard API Pricing: GPT-5.4 vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
| Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K |
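To make the per-million-token rates concrete, the following is a minimal sketch of how a request's cost could be estimated from the base prices in the table above. The names `BasePrice`, `BASE_PRICES`, and `estimate_cost_usd` are hypothetical helpers, not part of any vendor SDK. The sketch assumes that "K" means 1,000 tokens and that the threshold is compared against the number of input tokens; the page does not state either. It does not model extended-context pricing, because those rates are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePrice:
    input_usd_per_m: float    # USD per 1M input tokens
    output_usd_per_m: float   # USD per 1M output tokens
    max_context_tokens: int   # base price applies up to this size (assumed: input tokens)


# Values copied from the pricing table; "K" is assumed to mean 1,000 tokens.
BASE_PRICES = {
    "GPT-5.4": BasePrice(2.5, 15.0, 272_000),
    "Gemini 3.1 Pro Preview": BasePrice(2.0, 12.0, 200_000),
    "Claude Opus 4.6": BasePrice(5.0, 25.0, 200_000),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the base (standard-context) rate."""
    price = BASE_PRICES[model]
    if input_tokens > price.max_context_tokens:
        # Extended-context rates are not given in the table, so refuse to guess.
        raise ValueError(
            f"{model}: {input_tokens} input tokens exceeds the base-price "
            f"limit of {price.max_context_tokens}"
        )
    total = (
        input_tokens * price.input_usd_per_m
        + output_tokens * price.output_usd_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example request: 100,000 input tokens and 10,000 output tokens.
    for name in BASE_PRICES:
        print(f"{name}: ${estimate_cost_usd(name, 100_000, 10_000):.2f}")
```

For a request with 100,000 input tokens and 10,000 output tokens, this works out to $0.40 for GPT-5.4, $0.32 for Gemini 3.1 Pro Preview, and $0.75 for Claude Opus 4.6. Raising an error above the threshold, rather than silently applying the base rate, keeps the estimate from understating costs for long-context requests.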
Version History
How each version of the GPT-5.4 series stacks up on benchmark tests
7 benchmarks with comparable scores. Each model shows its best score, with the mode used for that score shown alongside it.
| Benchmark | Category | GPT-5.4 (current) | GPT-5.2 | GPT-5.1 |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 93.70 (Standard Mode) | 90.50 (Deep Thinking Mode) | 72.80 (Thinking Level · High) |
| ARC-AGI-2 | General evaluation | 77.10 (Standard Mode) | 54.20 (Deep Thinking Mode) | 17.60 (Thinking Level · High) |
| HLE | General evaluation | 52.10 (Thinking Level · Extra High, Tools) | 45.50 (Deep Thinking Mode, Tools) | 42.70 (Thinking Level · High, Tools) |
|  |  | 27.10 (Thinking Level · Extra High) | 18.80 (Thinking Level · Extra High) | 12.50 (Thinking Level · High, Tools) |
| τ²-Bench - Telecom | Agent capability evaluation | 98.90 (Thinking Level · Extra High, Tools) | 98.70 (Thinking Level · Extra High, Tools) | 95.60 (Thinking Level · High, Tools) |
| BrowseComp | AI Agent - Information gathering | 82.70 (Thinking Level · Extra High, Tools) | 65.80 (Thinking Level · Extra High, Tools) | 50.80 (Thinking Level · High) |
| Terminal Bench 2.0 | AI Agent - Tool use | 75.10 (Thinking Level · Extra High, Tools) | -- | 47.60 (Thinking Level · High, Tools) |
Single-Benchmark Version Trend (chart): ARC-AGI, General evaluation
Standard API Pricing Across the GPT-5.4 Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
| GPT-5.2 | OpenAI | $1.75 / 1M tokens | $14 / 1M tokens | -- |
| GPT-5.1 | -- | $1.25 / 1M tokens | $10 / 1M tokens | -- |