GPT-5.4 vs Gemini 3.1 Pro Preview

Across 15 shared benchmarks, GPT-5.4 leads overall: it wins 8, Gemini 3.1 Pro Preview wins 5, and 2 are ties, with an average score difference of +1.84 points in GPT-5.4's favor.

GPT-5.4

OpenAI · 2026-03-05 · Multimodal model

Gemini 3.1 Pro Preview

Google DeepMind · 2026-02-20 · Multimodal model

GPT-5.4: 8 wins (53%) · Ties: 2 (13%) · Gemini 3.1 Pro Preview: 5 wins (33%)

Benchmark scores

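Benchmarks are grouped by capability and sorted by the largest gap within each group (15 shared benchmarks). Each score cell lists the score, the model's leaderboard rank out of all ranked models, and the evaluation setting. Diff is GPT-5.4 minus Gemini 3.1 Pro Preview.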
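The ordering rule above can be sketched as follows, using the General Knowledge scores. This is a minimal illustration, assuming that "largest gap" means the absolute value of the Diff column and that equal gaps keep their listed order; the `Row` type and `sort_by_gap` helper are hypothetical names, not part of the page.

```python
from typing import NamedTuple


class Row(NamedTuple):
    benchmark: str
    gpt: float     # GPT-5.4 score
    gemini: float  # Gemini 3.1 Pro Preview score

    @property
    def diff(self) -> float:
        # Diff column convention: GPT-5.4 minus Gemini 3.1 Pro Preview
        return round(self.gpt - self.gemini, 2)


def sort_by_gap(rows: list[Row]) -> list[Row]:
    """Order rows by absolute gap, largest first; sorted() is stable, so ties keep input order."""
    return sorted(rows, key=lambda r: abs(r.diff), reverse=True)


# General Knowledge rows, deliberately listed out of order
general_knowledge = [
    Row("LiveBench", 80.28, 79.93),
    Row("ARC-AGI-3", 0.0, 0.0),
    Row("GPQA Diamond", 92.80, 94.30),
    Row("HLE", 52.10, 51.40),
    Row("ARC-AGI-2", 77.10, 77.10),
]

for r in sort_by_gap(general_knowledge):
    print(f"{r.benchmark}\t{r.diff:+.2f}")

# Group label: number of benchmarks where GPT-5.4 scores strictly higher
wins = sum(r.diff > 0 for r in general_knowledge)
print(f"GPT-5.4 wins {wins}/{len(general_knowledge)}")  # → GPT-5.4 wins 2/5
```

The resulting order (GPQA Diamond, HLE, LiveBench, ARC-AGI-3, ARC-AGI-2) matches the table below.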

General Knowledge

Leader: GPT-5.4 (wins 2/5)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
GPQA Diamond	92.80 · rank 11/187 · Thinking Extra High (No Tools)	94.30 · rank 3/187 · Thinking High (No Tools)	-1.50
HLE	52.10 · rank 21/172 · Thinking Extra High (With Tools)	51.40 · rank 22/172 · Thinking High (With Tools)	+0.70
LiveBench	80.28 · rank 2/115 · Deep Thinking (No Tools)	79.93 · rank 3/115 · Thinking High (No Tools)	+0.35
ARC-AGI-3	0.00 · rank 7/9 · Thinking High (No Tools)	0.00 · rank 6/9 · Thinking High (No Tools)	0.00 (tie)
ARC-AGI-2	77.10 · rank 9/62 · Normal (No Tools)	77.10 · rank 9/62 · Thinking High (No Tools)	0.00 (tie)

AI Agent - Tool Usage

Leader: Gemini 3.1 Pro Preview (wins 2/3)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
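MCP-Atlas	70.60 · rank 14/27 · Thinking Extra High (With Tools)	78.20 · rank 9/27 · Thinking High (With Tools)	-7.60
Terminal Bench 2.0	75.10 · rank 4/47 · Thinking Extra High (With Tools)	68.50 · rank 8/47 · Thinking High (With Tools)	+6.60
OSWorld-Verified	75.00 · rank 12/24 · Thinking Extra High (With Tools)	76.20 · rank 11/24 · Thinking (With Tools)	-1.20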

Coding and Software Engineering

Leader: GPT-5.4 (wins 2/2)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
DeepSWE	52.00 · rank 12/19 · Thinking Extra High (With Tools)	12.00 · rank 19/19 · Thinking High (With Tools)	+40.00
SWE-Bench Pro - Public	57.70 · rank 17/54 · Thinking Extra High (No Tools)	54.20 · rank 32/54 · Thinking High (With Tools)	+3.50

Math and Reasoning

Leader: GPT-5.4 (wins 2/2)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
FrontierMath	47.60 · rank 5/60 · Thinking Extra High (No Tools)	36.90 · rank 11/60 · Thinking High (No Tools)	+10.70
FrontierMath - Tier 4	27.10 · rank 11/80 · Thinking Extra High (No Tools)	16.70 · rank 20/80 · Normal (No Tools)	+10.40

Agent Level Benchmark

Leader: Gemini 3.1 Pro Preview (wins 1/1)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
τ²-Bench - Telecom	64.30 · rank 30/35 · Normal (With Tools)	99.30 · rank 1/35 · Thinking High (With Tools)	-35.00

AI Agent - Information Search

Leader: Gemini 3.1 Pro Preview (wins 1/1)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
BrowseComp	82.70 · rank 15/53 · Thinking Extra High (With Tools)	85.90 · rank 5/53 · Thinking High (With Tools + Internet)	-3.20

Claw-style Agent Evaluation

Leader: GPT-5.4 (wins 1/1)

Benchmark	GPT-5.4	Gemini 3.1 Pro Preview	Diff
Pinch Bench	90.50 · rank 1/37 · Thinking (With Tools)	86.70 · rank 10/37 · Thinking (With Tools)	+3.80

Specs

Field	GPT-5.4	Gemini 3.1 Pro Preview
Publisher	OpenAI	Google DeepMind
Release date	2026-03-05	2026-02-20
Model type	Multimodal model	Multimodal model
Architecture	Dense	Dense
Parameters	Not available	Not available
Context length	1M	1M
Max output	125K	64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item	GPT-5.4	Gemini 3.1 Pro Preview
Text input	$2.50 / 1M tokens	$2 / 1M tokens
Text output	$15 / 1M tokens	$12 / 1M tokens
Cached input	$0.25 / 1M tokens	Not public
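To turn the per-token list prices above into a per-workload figure, a rough estimate can be computed as below. This is a sketch, assuming only the flat text input and output prices from the table apply; caching discounts and any tiered pricing are ignored, and the `request_cost` helper and the 1M-input / 200K-output workload are hypothetical.

```python
# Per-1M-token list prices (USD) taken from the pricing table above.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Gemini 3.1 Pro Preview": {"input": 2.00, "output": 12.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost from flat per-token prices (no caching, no tiers)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 1M input tokens and 200K output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 1_000_000, 200_000):.2f}")
# → GPT-5.4: $5.50
# → Gemini 3.1 Pro Preview: $4.40
```

Because both listed GPT-5.4 prices are exactly 1.25 times Gemini 3.1 Pro Preview's ($2.50 vs $2, $15 vs $12), GPT-5.4 costs 25% more for any mix of input and output tokens under these assumptions.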

Summary

GPT-5.4 leads in: General Knowledge (2/5), Coding and Software Engineering (2/2), Math and Reasoning (2/2), Claw-style Agent Evaluation (1/1)
Gemini 3.1 Pro Preview leads in: AI Agent - Tool Usage (2/3), Agent Level Benchmark (1/1), AI Agent - Information Search (1/1)

Averaged across the 15 shared benchmarks (the mean of the Diff column), GPT-5.4 scores 1.84 points higher.

Largest single-benchmark gap: DeepSWE, where GPT-5.4 scores 52 and Gemini 3.1 Pro Preview scores 12 (+40).
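The win/tie tally and the average difference can be reproduced from the Diff column with a few lines of Python. This is a sketch, assuming a win means a strictly positive or negative Diff and a tie means a Diff of exactly zero. The final "trimmed" average, which drops the two largest gaps (DeepSWE and τ²-Bench - Telecom), is a hypothetical sensitivity check and is not something the page itself reports.

```python
from statistics import mean

# Diff column (GPT-5.4 minus Gemini 3.1 Pro Preview) for the 15 shared benchmarks.
DIFFS = {
    "GPQA Diamond": -1.50, "HLE": 0.70, "LiveBench": 0.35,
    "ARC-AGI-3": 0.0, "ARC-AGI-2": 0.0,
    "MCP-Atlas": -7.60, "Terminal Bench 2.0": 6.60, "OSWorld-Verified": -1.20,
    "DeepSWE": 40.0, "SWE-Bench Pro - Public": 3.50,
    "FrontierMath": 10.70, "FrontierMath - Tier 4": 10.40,
    "τ²-Bench - Telecom": -35.0, "BrowseComp": -3.20, "Pinch Bench": 3.80,
}

values = list(DIFFS.values())
gpt_wins = sum(d > 0 for d in values)      # GPT-5.4 strictly higher
gemini_wins = sum(d < 0 for d in values)   # Gemini 3.1 Pro Preview strictly higher
ties = sum(d == 0 for d in values)
avg = mean(values)

print(gpt_wins, gemini_wins, ties, f"{avg:+.2f}")  # → 8 5 2 +1.84

# Hypothetical sensitivity check: drop the two largest gaps, which nearly cancel.
outliers = {"DeepSWE", "τ²-Bench - Telecom"}
trimmed = [d for name, d in DIFFS.items() if name not in outliers]
print(f"{mean(trimmed):+.2f}")  # → +1.73
```

The two outliers contribute +40 and -35, so removing them barely moves the mean. Note that a plain average of raw differences treats every benchmark's scale as comparable, which is a simplification.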

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.
