GPT-5.4 nano Benchmark Details

GPT-5.4 nano's benchmark results are currently led by LiveBench (rank 38 / 115, score 70.13), Claw Bench (rank 10 / 29, score 89.70), and GPQA Diamond (rank 63 / 179, score 82.80).
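
The page does not state how the three leading benchmarks are selected. One reading that is consistent with the numbers is that they have the lowest rank-to-total ratio, counting each benchmark once by its best mode. The sketch below checks that reading. The `Result` type and the `top_by_relative_rank` helper are hypothetical names introduced for illustration, and the rows are copied from the tables on this page; rows whose benchmark label is missing in the source are left out.

```python
from typing import NamedTuple


class Result(NamedTuple):
    benchmark: str
    mode: str
    score: float
    rank: int
    total: int

    @property
    def relative_rank(self) -> float:
        # Fraction of the leaderboard at or above this entry (lower is better).
        return self.rank / self.total


# Labeled rows from this page. Unlabeled rows are omitted on purpose.
RESULTS = [
    Result("GPQA Diamond", "Extra-High", 82.80, 63, 179),
    Result("LiveBench", "Standard Mode", 32.39, 115, 115),
    Result("LiveBench", "Medium", 58.46, 75, 115),
    Result("LiveBench", "Deep Thinking Mode", 70.13, 38, 115),
    Result("HLE", "Extra-High", 24.30, 92, 159),
    Result("HLE", "Extra-High (tools)", 37.70, 57, 159),
    Result("MMMU", "Extra-High", 66.10, 26, 28),
    Result("MMMU", "Extra-High (tools)", 69.50, 24, 28),
    Result("SWE-Bench Pro - Public", "Extra-High (tools)", 52.40, 27, 44),
    Result("τ²-Bench - Telecom", "Extra-High (tools)", 92.50, 19, 35),
    Result("Terminal Bench 2.0", "Extra-High (tools)", 46.30, 40, 46),
    Result("OSWorld-Verified", "Extra-High (tools)", 39, 17, 18),
    Result("Tool Decathlon", "Extra-High (tools)", 35.50, 6, 7),
    Result("Claw Bench", "Thinking Enabled (tools)", 89.70, 10, 29),
]


def top_by_relative_rank(results, n=3):
    """Keep each benchmark's best-ranked mode, then return the n best overall."""
    best = {}
    for r in results:
        current = best.get(r.benchmark)
        if current is None or r.relative_rank < current.relative_rank:
            best[r.benchmark] = r
    return sorted(best.values(), key=lambda r: r.relative_rank)[:n]


for r in top_by_relative_rank(RESULTS):
    print(f"{r.benchmark} ({r.mode}): {r.rank} / {r.total} = {r.relative_rank:.3f}")
```

Under this assumption the top three come out as LiveBench (about 0.330), Claw Bench (about 0.345), and GPQA Diamond (about 0.352), in the same order as the summary above. HLE with tools follows closely at about 0.358, so the ordering is sensitive to small rank changes.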

Benchmark Results

General Knowledge

8 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GPQA Diamond | Extra-High | 82.80 | 63 / 179 |
| LiveBench | Standard Mode | 32.39 | 115 / 115 |
| | | 48.67 | 96 / 115 |
| LiveBench | Medium | 58.46 | 75 / 115 |
| | | 62.75 | 57 / 115 |
| LiveBench | Deep Thinking Mode | 70.13 | 38 / 115 |
| HLE | Extra-High | 24.30 | 92 / 159 |
| HLE | Extra-High (tools) | 37.70 | 57 / 159 |

Multimodal Understanding

2 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| MMMU | Extra-High | 66.10 | 26 / 28 |
| MMMU | Extra-High (tools) | 69.50 | 24 / 28 |

Math and Reasoning

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| | | 6.30 | 35 / 80 |

Coding and Software Engineering

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| SWE-Bench Pro - Public | Extra-High (tools) | 52.40 | 27 / 44 |

Agent Level Benchmark

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| τ²-Bench - Telecom | Extra-High (tools) | 92.50 | 19 / 35 |

AI Agent - Tool Usage

3 evaluations

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Terminal Bench 2.0 | Extra-High (tools) | 46.30 | 40 / 46 |
| OSWorld-Verified | Extra-High (tools) | 39 | 17 / 18 |
| Tool Decathlon | Extra-High (tools) | 35.50 | 6 / 7 |

Claw-style Agent Evaluation

1 evaluation

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| Claw Bench | Thinking Enabled (tools) | 89.70 | 10 / 29 |