GPT-5.4 mini Benchmark Details

GPT-5.4 mini's highlighted benchmark results are GPQA Diamond (score 88.00, rank 33 of 179), Tool Decathlon (score 42.90, rank 2 of 7), and HLE (score 41.50, rank 47 of 159). This page also compares it with two competitor models and one predecessor or same-series model, including performance and pricing views when available.

Benchmark Results

General Knowledge

8 evaluations

Benchmark	Mode	Score	Rank/total
GPQA Diamond	Extra-High	88.00	33 / 179
LiveBench	Standard Mode	36.95	112 / 115
LiveBench	Low	49.54	93 / 115
LiveBench	Medium	58.33	76 / 115
LiveBench	High	63.57	55 / 115
LiveBench	Deep Thinking Mode	67.54	48 / 115
HLE	Extra-High	28.20	83 / 159
HLE	Extra-High + Tools	41.50	47 / 159

Math and Reasoning

1 evaluation

Benchmark	Mode	Score	Rank/total
FrontierMath - Tier 4	High	2.10	56 / 80

Coding and Software Engineering

1 evaluation

Benchmark	Mode	Score	Rank/total
SWE-Bench Pro - Public	Extra-High + Tools	54.40	22 / 44

Agent Level Benchmark

1 evaluation

Benchmark	Mode	Score	Rank/total
τ²-Bench - Telecom	Extra-High + Tools	93.40	17 / 35

AI Agent - Tool Usage

4 evaluations

Benchmark	Mode	Score	Rank/total
OSWorld-Verified	Extra-High + Tools	72.10	11 / 18
Terminal Bench 2.0	Extra-High + Tools	60.00	19 / 46
MCP-Atlas	Extra-High + Tools	56.70	20 / 23
Tool Decathlon	Extra-High + Tools	42.90	2 / 7

Claw-style Agent Evaluation

1 evaluation

Benchmark	Mode	Score	Rank/total
Claw Bench	Thinking Enabled + Tools	75.30	25 / 29

Compare with other models

Competitor Comparison

Benchmark scores for GPT-5.4 mini compared against top models in its class

Models: GPT-5.4 mini, Haiku 4.5, Gemini 3.0 Flash

The chart shows each model’s highest score per benchmark within the current filter. Out-of-100 benchmarks use raw heights; out-of-range benchmarks are scaled within that benchmark while labels keep the original scores.

The table covers 8 benchmarks with comparable scores. Each model is shown at its best score, with the mode given next to the score.

Benchmark	GPT-5.4 mini (current)	Haiku 4.5	Gemini 3.0 Flash
GPQA Diamond (General Knowledge)	88.00 (Thinking Level · Extra High)	73.30 (Extended Thinking)	90.40 (Thinking Enabled)
HLE (General Knowledge)	41.50 (Thinking Level · Extra High + Tools)	9.70 (Extended Thinking)	43.50 (Thinking Enabled + Tools)
LiveBench (General Knowledge)	67.54 (Deep Thinking Mode)	61.32 (64K)	72.40 (Thinking Level · High)
FrontierMath - Tier 4 (Math and Reasoning)	2.10 (Thinking Level · High)	2.10 (32K)	4.20 (Standard Mode)
SWE-Bench Pro - Public (Coding and Software Engineering)	54.40 (Thinking Level · Extra High + Tools)	39.45 (Extended Thinking + Tools)	49.60 (Thinking Level · High + Tools)
MCP-Atlas (AI Agent - Tool Usage)	56.70 (Thinking Level · Extra High + Tools)	--	62.00 (Standard Mode + Tools)
Terminal Bench 2.0 (AI Agent - Tool Usage)	60.00 (Thinking Level · Extra High + Tools)	--	47.60 (Thinking Enabled + Tools)
Claw Bench (OpenClaw Agent Capability Evaluation)	75.30 (Thinking Enabled + Tools)	89.40 (Thinking Enabled + Tools)	85.70 (Thinking Enabled + Tools)
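
The "best score per model" rule used in the comparison can be illustrated with a minimal Python sketch. It uses the five LiveBench results for GPT-5.4 mini listed under General Knowledge; the function name `best_mode` is hypothetical and is not part of any DataLearnerAI tooling.

```python
# LiveBench scores for GPT-5.4 mini by mode, copied from the results table.
livebench_scores = {
    "Standard Mode": 36.95,
    "Low": 49.54,
    "Medium": 58.33,
    "High": 63.57,
    "Deep Thinking Mode": 67.54,
}


def best_mode(scores):
    """Return the (mode, score) pair with the highest score."""
    mode = max(scores, key=scores.get)
    return mode, scores[mode]


best = best_mode(livebench_scores)
print(best)
```

The selected pair matches the LiveBench entry shown for GPT-5.4 mini in the comparison table. Because each model is shown at its own best mode, scores in one row are not necessarily measured under the same settings.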

Standard API Pricing: GPT-5.4 mini vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

Model	Supplier	Standard input	Standard output	Base price applies to
GPT-5.4 mini	OpenAI	$0.75 / 1M tokens	$4.5 / 1M tokens	—
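
As a rough illustration of how the per-million-token rates above translate into a request cost, here is a minimal Python sketch. The token counts are made-up example values, the function name `estimate_cost_usd` is hypothetical, and the sketch assumes the standard rates apply to every token (no extended-context or other pricing tiers).

```python
# Standard rates for GPT-5.4 mini from the pricing table, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.75
OUTPUT_USD_PER_MILLION = 4.50


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate cost in USD, assuming the standard rates apply to all tokens."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example workload (illustrative numbers only): 200k input, 50k output tokens.
cost = estimate_cost_usd(200_000, 50_000)
print(f"${cost:.3f}")
```

At these rates an output token costs six times as much as an input token, so output-heavy workloads dominate the bill.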

Version History

How each version of the GPT-5.4 mini series stacks up on benchmark tests

Models: GPT-5.4 mini, GPT-5-mini

The table covers 4 benchmarks with comparable scores. Each model is shown at its best score, with the mode given next to the score.

Benchmark	GPT-5.4 mini (current)	GPT-5-mini
GPQA Diamond (General Knowledge)	88.00 (Thinking Level · Extra High)	69.00 (Thinking Enabled)
HLE (General Knowledge)	41.50 (Thinking Level · Extra High + Tools)	5.00 (Thinking Enabled)
LiveBench (General Knowledge)	67.54 (Deep Thinking Mode)	61.01 (Standard Mode)
FrontierMath - Tier 4 (Math and Reasoning)	2.10 (Thinking Level · High)	6.30 (Thinking Level · High)
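
The version-to-version change on each benchmark can be computed directly from the best scores in the table. The Python sketch below is illustrative only; the helper name `score_deltas` is hypothetical, and since the two models are often listed under different modes, the differences should be read as "best listed vs. best listed" rather than a like-for-like comparison.

```python
# Best listed scores per benchmark, copied from the version history table,
# as (GPT-5.4 mini, GPT-5-mini) pairs.
scores = {
    "GPQA Diamond": (88.00, 69.00),
    "HLE": (41.50, 5.00),
    "LiveBench": (67.54, 61.01),
    "FrontierMath - Tier 4": (2.10, 6.30),
}


def score_deltas(table):
    """Return new-minus-old score differences, rounded to two decimals."""
    return {name: round(new - old, 2) for name, (new, old) in table.items()}


deltas = score_deltas(scores)
# Benchmarks where the newer model's best listed score is lower.
regressions = [name for name, delta in deltas.items() if delta < 0]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

On these figures, FrontierMath - Tier 4 is the only benchmark where GPT-5.4 mini's listed score is below that of GPT-5-mini.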

Single-Benchmark Version Trend

[Chart: GPQA Diamond (General Knowledge) score by version. Series: Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools.]

The X-axis shows model and release date and the Y-axis shows score; solid lines connect the same mode across versions, while dotted guides align modes within the same generation.

Standard API Pricing Across the GPT-5.4 mini Series

Model	Supplier	Standard input	Standard output	Base price applies to
GPT-5.4 mini	OpenAI	$0.75 / 1M tokens	$4.5 / 1M tokens	—