GPT-5.4 mini Benchmark Details
GPT-5.4 mini's current benchmark results are led by GPQA Diamond (33 / 179, score 88), Tool Decathlon (2 / 7, score 42.90), and HLE (47 / 159, score 41.50). This page also compares it with 2 competitor models and 1 predecessor or same-series model, including performance and pricing views when available.
Benchmark Results
General Knowledge: 8 evaluations
Coding and Software Engineering: 1 evaluation
AI Agent - Tool Usage: 4 evaluations
Competitor Comparison
Benchmark scores for GPT-5.4 mini compared against top models in its class
8 benchmarks with comparable scores. Each model shows its best score; the mode label is given in parentheses after the score.
| Benchmark | Category | GPT-5.4 mini (current) | Haiku 4.5 | Gemini 3.0 Flash |
|---|---|---|---|---|
| GPQA Diamond | Comprehensive Evaluation | 88.00 (Thinking Level · Extra High) | 73.30 (Extended Thinking) | 90.40 (Thinking Enabled) |
| HLE | Comprehensive Evaluation | 41.50 (Thinking Level · Extra High, Tools) | 9.70 (Extended Thinking) | 43.50 (Thinking Enabled, Tools) |
| LiveBench | Comprehensive Evaluation | 67.54 (Deep Thinking Mode) | 61.32 (64K) | 72.40 (Thinking Level · High) |
| | | 2.10 (Thinking Level · High) | 2.10 (32K) | 4.20 (Standard Mode) |
| SWE-Bench Pro - Public | Coding and Software Engineering | 54.40 (Thinking Level · Extra High, Tools) | 39.45 (Extended Thinking, Tools) | 49.60 (Thinking Level · High, Tools) |
| MCP-Atlas | AI Agent - Tool Usage | 56.70 (Thinking Level · Extra High, Tools) | -- | 62.00 (Standard Mode, Tools) |
| Terminal Bench 2.0 | AI Agent - Tool Usage | 60.00 (Thinking Level · Extra High, Tools) | -- | 47.60 (Thinking Enabled, Tools) |
| Claw Bench | OpenClaw Agent Capability Comprehensive Evaluation | 75.30 (Thinking Enabled, Tools) | 89.40 (Thinking Enabled, Tools) | 85.70 (Thinking Enabled, Tools) |
Standard API Pricing: GPT-5.4 mini vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 mini | OpenAI | $0.75 / 1M tokens | $4.50 / 1M tokens | -- |
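
To make the per-million-token rates concrete, the sketch below estimates the cost of a single request from the standard prices in the table above. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of the source page. The calculation covers only the base rates and ignores any extended-context, caching, or batch pricing.

```python
# Standard GPT-5.4 mini prices from the table above (USD per 1M tokens).
INPUT_USD_PER_1M = 0.75
OUTPUT_USD_PER_1M = 4.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the standard (base) rates.

    Hypothetical helper for illustration only; it does not model
    extended-context thresholds, caching discounts, or batch pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


if __name__ == "__main__":
    # Made-up workload: 20,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost_usd(20_000, 2_000):.4f}")
```

Because output tokens cost six times as much as input tokens at these rates, output length tends to dominate the bill for long responses.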
Version History
How each version of the GPT-5.4 mini series stacks up on benchmark tests
4 benchmarks with comparable scores. Each model shows its best score; the mode label is given in parentheses after the score.
| Benchmark | Category | GPT-5.4 mini (current) | GPT-5-mini |
|---|---|---|---|
| GPQA Diamond | Comprehensive Evaluation | 88.00 (Thinking Level · Extra High) | 69.00 (Thinking Enabled) |
| HLE | Comprehensive Evaluation | 41.50 (Thinking Level · Extra High, Tools) | 5.00 (Thinking Enabled) |
| LiveBench | Comprehensive Evaluation | 67.54 (Deep Thinking Mode) | 61.01 (Standard Mode) |
| | | 2.10 (Thinking Level · High) | 6.30 (Thinking Level · High) |
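
The gap between the two versions can be computed directly from the rows above. The sketch below is a minimal illustration using the three benchmarks whose names appear in the table. The dictionary layout and the `score_deltas` name are assumptions made for this example. The scores were recorded under different modes (for instance, with and without tools), so the differences are indicative rather than like-for-like.

```python
# Best scores from the version-history table: (GPT-5.4 mini, GPT-5-mini).
# Only rows with a benchmark name in the source are included.
SCORES = {
    "GPQA Diamond": (88.00, 69.00),
    "HLE": (41.50, 5.00),
    "LiveBench": (67.54, 61.01),
}


def score_deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return current-minus-predecessor differences, rounded to 2 decimals."""
    return {name: round(cur - prev, 2) for name, (cur, prev) in scores.items()}


if __name__ == "__main__":
    for name, delta in score_deltas(SCORES).items():
        print(f"{name}: {delta:+.2f} points")
```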
Single-Benchmark Version Trend
Viewing: GPQA Diamond · Comprehensive Evaluation
Standard API Pricing Across the GPT-5.4 mini Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 mini | OpenAI | $0.75 / 1M tokens | $4.50 / 1M tokens | -- |