GPT-5.4 Pro Benchmark Details
GPT-5.4 Pro's leading benchmark results are GPQA Diamond (rank 2 of 165, score 94.40), HLE (rank 2 of 125, score 58.70), and FrontierMath (rank 1 of 54, score 50). This page also compares it with 2 competitor models and 3 predecessor or same-series models, including performance and pricing views when available. One source link is attached for reference.
Benchmark Results
General evaluation: 5 evaluations
Competitor Comparison
Benchmark scores for GPT-5.4 Pro compared against top models in its class
Benchmark Score Comparison
6 benchmarks with comparable scores
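| Benchmark | GPT-5.4 Pro (this model) | Claude Opus 4.6 | Gemini 3.1 Pro Preview |
|---|---|---|---|
| ARC-AGI (general evaluation) | 94.50, thinking mode High (no tools) | 92.00, extended (no tools) | -- |
| ARC-AGI-2 (general evaluation) | 83.30, standard mode (no tools) | 66.30, extended (no tools) | 77.10, thinking mode High (no tools) |
| GPQA Diamond (general evaluation) | 94.40, thinking mode High (no tools) | 91.31, extended (no tools) | 94.30, thinking mode High (no tools) |
| HLE (general evaluation) | 58.70, thinking mode High (tools) | 53.00, extended (tools, web access) | 51.40, thinking mode High (tools) |
| BrowseComp (AI agent, information gathering) | 89.30, thinking mode High (tools) | 84.00, thinking mode (tools + web access) | 85.90, thinking mode High (tools + web access) |
| GDPval-AA (productivity knowledge) | 82.00, thinking mode High (tools) | 1606.00, extended (tools, web access) | 1317.00, thinking mode High (tools) |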
Standard API Pricing: GPT-5.4 Pro vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 Pro (current model) | OpenAI | $30 / 1M tokens | $180 / 1M tokens | <= 272K |
| Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
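As a rough illustration of how these per-1M-token rates translate into the cost of a single request, the sketch below applies the standard input and output prices from the table. It assumes the request stays within the base-price limit listed for each model; extended-context rates are not given on this page, so they are not modeled. The `STANDARD_RATES` dictionary and the `estimate_cost` function are illustrative names, not part of any vendor API.

```python
# Standard rates in USD per 1M tokens, copied from the pricing table.
# The dictionary layout and function name are hypothetical, for illustration only.
STANDARD_RATES = {
    "GPT-5.4 Pro": {"input": 30.0, "output": 180.0},
    "Claude Opus 4.6": {"input": 5.0, "output": 25.0},
    "Gemini 3.1 Pro Preview": {"input": 2.0, "output": 12.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the standard rates.

    Assumes the request is within the base-price context limit for the model.
    """
    rates = STANDARD_RATES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 100K input tokens and 10K output tokens.
    for name in STANDARD_RATES:
        print(f"{name}: ${estimate_cost(name, 100_000, 10_000):.2f}")
```

For this example workload of 100K input and 10K output tokens, the estimate comes to $4.80 for GPT-5.4 Pro, $0.75 for Claude Opus 4.6, and $0.32 for Gemini 3.1 Pro Preview.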
Version History
How each version of the GPT-5.4 Pro series stacks up on benchmark tests
Benchmark Score Comparison
6 benchmarks with comparable scores
| Benchmark | GPT-5.4 Pro (this model) | GPT-5.2 Pro | GPT-5-Pro |
|---|---|---|---|
| ARC-AGI (general evaluation) | 94.50, thinking mode High (no tools) | 90.50, thinking | 70.20, thinking |
| ARC-AGI-2 (general evaluation) | 83.30, standard mode (no tools) | 54.20, thinking | 18.00, thinking |
| GPQA Diamond (general evaluation) | 94.40, thinking mode High (no tools) | 93.20, thinking | 89.40, thinking + tools |
| HLE (general evaluation), with tools | 58.70, thinking mode High (tools) | 50.00, thinking + tools | 42.00, thinking + tools |
| HLE (general evaluation), without tools | 38.00, standard mode (no tools) | 31.30, thinking | 14.60, thinking |
| BrowseComp (AI agent, information gathering) | 89.30, thinking mode High (tools) | 77.90, thinking + tools | -- |
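To make the version-to-version changes easier to read, the sketch below computes the point difference between consecutive releases from the scores in the table. It is only indicative: the reasoning modes and tool settings differ between versions on some benchmarks (for example, the GPQA Diamond score for GPT-5-Pro was obtained with tools), and the `SCORES` layout and `gain` helper are illustrative names introduced here.

```python
# Scores copied from the version history table, ordered as
# (GPT-5-Pro, GPT-5.2 Pro, GPT-5.4 Pro). None marks a missing score ("--").
# The HLE entry uses the with-tools row. Names here are hypothetical.
SCORES = {
    "ARC-AGI": (70.20, 90.50, 94.50),
    "ARC-AGI-2": (18.00, 54.20, 83.30),
    "GPQA Diamond": (89.40, 93.20, 94.40),
    "HLE (tools)": (42.00, 50.00, 58.70),
    "BrowseComp": (None, 77.90, 89.30),
}


def gain(old, new):
    """Return the score difference in points, or None if either score is missing."""
    if old is None or new is None:
        return None
    return round(new - old, 2)


if __name__ == "__main__":
    for benchmark, (v5, v52, v54) in SCORES.items():
        print(
            f"{benchmark}: GPT-5-Pro -> GPT-5.2 Pro: {gain(v5, v52)}, "
            f"GPT-5.2 Pro -> GPT-5.4 Pro: {gain(v52, v54)}"
        )
```

On these figures, the largest step from GPT-5.2 Pro to GPT-5.4 Pro is on ARC-AGI-2 (about 29.1 points), while GPQA Diamond moves by about 1.2 points.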
Standard API Pricing Across the GPT-5.4 Pro Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 Pro (current model) | OpenAI | $30 / 1M tokens | $180 / 1M tokens | <= 272K |
| GPT-5.2 Pro | -- | $21.00 / 1M tokens | $168.00 / 1M tokens | -- |
| GPT-5-Pro | -- | $15 / 1M tokens | $120 / 1M tokens | -- |
Series Overview
See how each version of the GPT-5.4 Pro series performs across major benchmarks.
| Benchmark | GPT-5-Pro (8/7/2025) | GPT-5.2 Pro (12/11/2025) | GPT-5.4 Pro (3/5/2026) |
|---|---|---|---|