GPT-5.4 mini Benchmark Details

GPT-5.4 mini's headline benchmark results are GPQA Diamond (rank 33 / 179, score 88), Tool Decathlon (rank 2 / 7, score 42.90), and HLE (rank 47 / 159, score 41.50). This page also compares it with 2 competitor models and 1 predecessor or same-series model, including performance and pricing views when available.
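The rank figures on this page presumably read as position over field size, so 33 / 179 would mean 33rd of 179 ranked entries. As a rough illustration, the sketch below turns such a cell into a share of the field so that ranks from differently sized leaderboards can be set side by side. The function names are hypothetical and are not part of any DataLearnerAI tooling.

```python
def parse_rank(cell: str) -> tuple:
    """Split a 'rank / total' cell such as '33 / 179' into two integers."""
    rank_text, total_text = cell.split("/")
    rank, total = int(rank_text), int(total_text)
    if not 1 <= rank <= total:
        raise ValueError(f"rank {rank} is outside 1..{total}")
    return rank, total


def top_share_percent(cell: str) -> float:
    """Return rank / total as a percentage, rounded to one decimal place."""
    rank, total = parse_rank(cell)
    return round(100.0 * rank / total, 1)


# Cells copied from the summary above.
gpqa_share = top_share_percent("33 / 179")
decathlon_share = top_share_percent("2 / 7")
hle_share = top_share_percent("47 / 159")
```

A small field limits how much a rank says: 2 / 7 and 47 / 159 both land near the top 29% under this measure, even though the raw ranks look very different.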

Benchmark Results


General Knowledge

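8 evaluations

| Benchmark | Mode | Score | Rank/total |
|---|---|---|---|
| GPQA Diamond | Extra-High | 88 | 33 / 179 |
| LiveBench | Standard Mode | 36.95 | 112 / 115 |
| (not shown) | (not shown) | 49.54 | 93 / 115 |
| LiveBench | Medium | 58.33 | 76 / 115 |
| (not shown) | (not shown) | 63.57 | 55 / 115 |
| LiveBench | Deep Thinking Mode | 67.54 | 48 / 115 |
| HLE | Extra-High | 28.20 | 83 / 159 |
| HLE | Extra-High + Tools | 41.50 | 47 / 159 |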

Math and Reasoning

1 evaluation

| Benchmark | Mode | Score | Rank/total |
|---|---|---|---|
| (not shown) | (not shown) | 2.10 | 56 / 80 |

Coding and Software Engineering

1 evaluation

| Benchmark | Mode | Score | Rank/total |
|---|---|---|---|
| SWE-Bench Pro - Public | Extra-High + Tools | 54.40 | 22 / 44 |

Agent Level Benchmark

1 evaluation

| Benchmark | Mode | Score | Rank/total |
|---|---|---|---|
| τ²-Bench - Telecom | Extra-High + Tools | 93.40 | 17 / 35 |

AI Agent - Tool Usage

4 evaluations

| Benchmark | Mode | Score | Rank/total |
|---|---|---|---|
| OSWorld-Verified | Extra-High + Tools | 72.10 | 11 / 18 |
| Terminal Bench 2.0 | Extra-High + Tools | 60 | 19 / 46 |
| MCP-Atlas | Extra-High + Tools | 56.70 | 20 / 23 |
| Tool Decathlon | Extra-High + Tools | 42.90 | 2 / 7 |

Claw-style Agent Evaluation

1 evaluation

| Benchmark | Mode | Score | Rank/total |
|---|---|---|---|
| Claw Bench | Thinking Enabled + Tools | 75.30 | 25 / 29 |

Competitor Comparison

Benchmark scores for GPT-5.4 mini compared against top models in its class

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.
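The page does not say how out-of-range benchmarks are scaled, so the following is only a minimal sketch of one plausible reading: take each model's best score across modes, plot out-of-100 benchmarks at face value, and scale any other benchmark relative to the highest score within it. The function names and the divide-by-maximum rule are assumptions made for illustration.

```python
from typing import Dict, Optional


def best_score(scores_by_mode: Dict[str, float]) -> float:
    """Pick a model's highest score across all modes of one benchmark."""
    return max(scores_by_mode.values())


def bar_heights(best: Dict[str, Optional[float]], out_of_100: bool) -> Dict[str, float]:
    """Map each model's best score to a bar height on a 0-100 axis.

    Models with no score (None) are skipped. Out-of-100 benchmarks keep
    their raw values; others are scaled so the top score reaches 100
    (an assumed rule, not one documented by the page).
    """
    present = {model: s for model, s in best.items() if s is not None}
    if out_of_100 or not present:
        return dict(present)
    top = max(present.values())
    if top <= 0:
        return {model: 0.0 for model in present}
    return {model: 100.0 * s / top for model, s in present.items()}


# Labeled LiveBench modes for GPT-5.4 mini from the results table.
livebench = {"Standard Mode": 36.95, "Medium": 58.33, "Deep Thinking Mode": 67.54}
gpt_best = best_score(livebench)

# Out-of-100 benchmark: heights equal the scores themselves.
heights_raw = bar_heights({"GPT-5.4 mini": gpt_best, "Gemini 3.0 Flash": 72.40}, out_of_100=True)

# Hypothetical out-of-range benchmark with made-up scores and model names.
heights_scaled = bar_heights({"model_a": 1200.0, "model_b": 600.0, "model_c": None}, out_of_100=False)
```

Under either rule the labels would still show the original scores; only the bar heights change.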

8 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

| Benchmark | Category | GPT-5.4 mini (current) | Haiku 4.5 | Gemini 3.0 Flash |
|---|---|---|---|---|
| GPQA Diamond | Comprehensive evaluation | 88.00 (Thinking Level · Extra High) | 73.30 (Extended Thinking) | 90.40 (Thinking Enabled) |
| HLE | Comprehensive evaluation | 41.50 (Thinking Level · Extra High + Tools) | 9.70 (Extended Thinking) | 43.50 (Thinking Enabled + Tools) |
| LiveBench | Comprehensive evaluation | 67.54 (Deep Thinking Mode) | 61.32 (64K) | 72.40 (Thinking Level · High) |
| (not shown) | (not shown) | 2.10 (Thinking Level · High) | 2.10 (32K) | 4.20 (Standard Mode) |
| SWE-Bench Pro - Public | Coding and software engineering | 54.40 (Thinking Level · Extra High + Tools) | 39.45 (Extended Thinking + Tools) | 49.60 (Thinking Level · High + Tools) |
| MCP-Atlas | AI Agent - Tool Usage | 56.70 (Thinking Level · Extra High + Tools) | -- | 62.00 (Standard Mode + Tools) |
| Terminal Bench 2.0 | AI Agent - Tool Usage | 60.00 (Thinking Level · Extra High + Tools) | -- | 47.60 (Thinking Enabled + Tools) |
| Claw Bench | OpenClaw comprehensive agent capability evaluation | 75.30 (Thinking Enabled + Tools) | 89.40 (Thinking Enabled + Tools) | 85.70 (Thinking Enabled + Tools) |

Standard API Pricing: GPT-5.4 mini vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 mini | OpenAI | $0.75 / 1M tokens | $4.5 / 1M tokens | |
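To make the per-million rates concrete, here is a small sketch that estimates the cost of a workload from the listed standard prices. It assumes only the base rates shown above and ignores anything the page does not list, such as extended-context thresholds or discounts; the function name is hypothetical.

```python
# Standard rates listed above for GPT-5.4 mini, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.75
OUTPUT_USD_PER_MILLION = 4.5


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost at the base rates; no other pricing rules are modeled."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_MILLION
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example workload: 2M input tokens and 500K output tokens.
example_cost = estimate_cost_usd(2_000_000, 500_000)
```

For that example workload the input side comes to $1.50 and the output side to $2.25, so output tokens account for most of the cost despite being a quarter of the input volume.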

Version History

How each version of the GPT-5.4 mini series stacks up on benchmark tests


4 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

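| Benchmark | Category | GPT-5.4 mini (current) | GPT-5-mini |
|---|---|---|---|
| GPQA Diamond | Comprehensive evaluation | 88.00 (Thinking Level · Extra High) | 69.00 (Thinking Enabled) |
| HLE | Comprehensive evaluation | 41.50 (Thinking Level · Extra High + Tools) | 5.00 (Thinking Enabled) |
| LiveBench | Comprehensive evaluation | 67.54 (Deep Thinking Mode) | 61.01 (Standard Mode) |
| (not shown) | (not shown) | 2.10 (Thinking Level · High) | 6.30 (Thinking Level · High) |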

Single-Benchmark Version Trend

Chart: GPQA Diamond (comprehensive evaluation) scores across versions. Modes plotted: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.

The x-axis shows model and release date and the y-axis shows score. Solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the GPT-5.4 mini Series


| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 mini | OpenAI | $0.75 / 1M tokens | $4.5 / 1M tokens | |