Benchmark Results
General evaluation: 5 evaluations
Mathematical reasoning: 5 evaluations
Competitor Comparison
Benchmark scores for GPT-5.4 Pro compared against top models in its class
8 benchmarks with comparable scores. Each model shows its best score, and the mode used to obtain it is shown alongside the score.
| Benchmark | Category | GPT-5.4 Pro (current) | Claude Opus 4.6 | Gemini 3.1 Pro Preview |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 94.50 (Thinking Level · High) | 92.00 (Extended Thinking) | -- |
| ARC-AGI-2 | General evaluation | 83.30 (Thinking Level · High) | 66.30 (Extended Thinking) | 77.10 (Thinking Level · High) |
| GPQA Diamond | General evaluation | 94.40 (Thinking Level · High) | 91.31 (Extended Thinking) | 94.30 (Thinking Level · High) |
| HLE | General evaluation | 58.70 (Thinking Level · High, Tools) | 53.00 (Extended Thinking, Tools) | 51.40 (Thinking Level · High, Tools) |
| FrontierMath | Mathematical reasoning | 50.00 (Thinking Level · Extra High) | 40.70 (Thinking Level · High) | 36.90 (Thinking Level · High) |
| | | 38.00 (Thinking Level · High) | 22.90 (Thinking Level · High) | 16.70 (Standard Mode) |
| BrowseComp | AI Agent - Information gathering | 89.30 (Thinking Level · High, Tools) | 84.00 (Thinking Enabled, Tools) | 85.90 (Thinking Level · High, Tools) |
| GDPval-AA | Productivity knowledge | 82.00 (Thinking Level · High, Tools) | 1606.00 (Extended Thinking, Tools) | -- |
Standard API Pricing: GPT-5.4 Pro vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier and are in USD per 1M tokens.
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 Pro | OpenAI | $30 / 1M tokens | $180 / 1M tokens | <= 272K |
| Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
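
To make the per-1M-token rates concrete, the sketch below estimates the cost of a single request at the standard (base) rate. The prices are copied from the table above. The `request_cost` helper and the 100K-input / 10K-output workload are illustrative assumptions, chosen so that the request stays under every context threshold listed.

```python
# Standard prices in USD per 1M tokens, copied from the pricing table:
# model -> (input price, output price)
PRICES = {
    "GPT-5.4 Pro": (30.0, 180.0),
    "Claude Opus 4.6": (5.0, 25.0),
    "Gemini 3.1 Pro Preview": (2.0, 12.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the standard (base) rate.

    Hypothetical helper: it ignores extended-context pricing, so it is only
    meaningful while the request stays within the model's base-price limit.
    """
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 100K input tokens and 10K output tokens per request.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 10_000):.2f}")
```

For this assumed workload the estimate comes to $4.80 for GPT-5.4 Pro, $0.75 for Claude Opus 4.6, and $0.32 for Gemini 3.1 Pro Preview.
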
Version History
How each version of the GPT-5.4 Pro series stacks up on benchmark tests
6 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.
| Benchmark | Category | GPT-5.4 Pro (current) | GPT-5.2 Pro | GPT-5-Pro |
|---|---|---|---|---|
| ARC-AGI | General evaluation | 94.50 (Thinking Level · High) | 90.50 (Thinking Enabled) | 70.20 (Thinking Enabled) |
| ARC-AGI-2 | General evaluation | 83.30 (Thinking Level · High) | 54.20 (Thinking Enabled) | 18.00 (Thinking Enabled) |
| GPQA Diamond | General evaluation | 94.40 (Thinking Level · High) | 93.20 (Thinking Enabled) | 89.40 (Thinking Enabled, Tools) |
| HLE | General evaluation | 58.70 (Thinking Level · High, Tools) | 50.00 (Thinking Enabled, Tools) | 42.00 (Thinking Enabled, Tools) |
| | | 38.00 (Thinking Level · High) | 31.30 (Thinking Enabled) | 14.60 (Thinking Enabled) |
| BrowseComp | AI Agent - Information gathering | 89.30 (Thinking Level · High, Tools) | 77.90 (Thinking Enabled, Tools) | -- |
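
The version-to-version change can be read off the table with simple subtraction. The sketch below does this for the four named benchmarks that have scores for all three versions. The scores are copied from the table, and the `gains` helper is a hypothetical name. Because the modes differ between versions (for example, the GPT-5-Pro GPQA Diamond score was obtained with tools), the differences are indicative, not like-for-like.

```python
# Best scores copied from the version-history table, oldest to newest:
# benchmark -> (GPT-5-Pro, GPT-5.2 Pro, GPT-5.4 Pro)
SCORES = {
    "ARC-AGI": (70.20, 90.50, 94.50),
    "ARC-AGI-2": (18.00, 54.20, 83.30),
    "GPQA Diamond": (89.40, 93.20, 94.40),
    "HLE": (42.00, 50.00, 58.70),
}


def gains(scores: tuple[float, ...]) -> list[float]:
    """Return the point gain between each pair of consecutive versions."""
    return [round(newer - older, 2) for older, newer in zip(scores, scores[1:])]


if __name__ == "__main__":
    for benchmark, scores in SCORES.items():
        first_step, second_step = gains(scores)
        print(
            f"{benchmark}: +{first_step:.2f} (GPT-5-Pro to GPT-5.2 Pro), "
            f"+{second_step:.2f} (GPT-5.2 Pro to GPT-5.4 Pro)"
        )
```

On these figures, ARC-AGI-2 shows the largest movement (18.00 to 54.20 to 83.30), while GPQA Diamond changes by only a few points per version.
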
Single-Benchmark Version Trend
Viewing: ARC-AGI · General evaluation
Standard API Pricing Across the GPT-5.4 Pro Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier and are in USD per 1M tokens.
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 Pro | OpenAI | $30 / 1M tokens | $180 / 1M tokens | <= 272K |
| GPT-5.2 Pro | -- | $21 / 1M tokens | $168 / 1M tokens | -- |
| GPT-5-Pro | -- | $15 / 1M tokens | $120 / 1M tokens | -- |