GPT-5.4 mini vs Gemini 3.0 Flash

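Across 6 shared benchmarks, Gemini 3.0 Flash leads on wins: GPT-5.4 mini wins 2, Gemini 3.0 Flash wins 4, with 0 ties. The average score difference is +0.05 in GPT-5.4 mini's favor, because its 12.40-point lead on Terminal Bench 2.0 offsets Gemini 3.0 Flash's four wins.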

GPT-5.4 mini
OpenAI · 2026-03-17 · Reasoning model

Gemini 3.0 Flash
Google DeepMind · 2025-12-17 · Chat model

Wins: GPT-5.4 mini 2 (33%) · Gemini 3.0 Flash 4 (67%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

General Knowledge

Gemini 3.0 Flash 2/2
Benchmark | GPT-5.4 mini | Gemini 3.0 Flash | Diff
GPQA Diamond | 88 · 32 / 178 · Thinking Extra High (No Tools) | 90.40 · 17 / 178 | -2.40
HLE | 41.50 · 46 / 157 · Thinking Extra High (With Tools) | 43.50 · 38 / 157 | -2

AI Agent - Tool Usage

GPT-5.4 mini 1/1
Benchmark | GPT-5.4 mini | Gemini 3.0 Flash | Diff
Terminal Bench 2.0 | 60 · 19 / 46 · Thinking Extra High (With Tools) | 47.60 · 37 / 46 | +12.40

Claw-style Agent Evaluation

Gemini 3.0 Flash 1/1
Benchmark | GPT-5.4 mini | Gemini 3.0 Flash | Diff
Claw Bench | 75.30 · 25 / 29 · Thinking (With Tools) | 85.70 · 15 / 29 · Thinking (With Tools) | -10.40

Coding and Software Engineer

GPT-5.4 mini 1/1
Benchmark | GPT-5.4 mini | Gemini 3.0 Flash | Diff
SWE-Bench Pro - Public | 54.40 · 21 / 43 · Thinking Extra High (With Tools) | 49.60 · 32 / 43 · Thinking High (With Tools) | +4.80

Math and Reasoning

Gemini 3.0 Flash 1/1
Benchmark | GPT-5.4 mini | Gemini 3.0 Flash | Diff
FrontierMath - Tier 4 | 2.10 · 56 / 80 · Thinking High (No Tools) | 4.20 · 40 / 80 · Normal (No Tools) | -2.10

Specs

Field | GPT-5.4 mini | Gemini 3.0 Flash
Publisher | OpenAI | Google DeepMind
Release date | 2026-03-17 | 2025-12-17
Model type | Reasoning model | Chat model
Architecture | Dense | Dense
Parameters | Not available | Not available
Context length | 400K | 2000K
Max output | 128K | 64K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

Item | GPT-5.4 mini | Gemini 3.0 Flash
Text input | $0.75 / 1M tokens | Not public
Text output | $4.5 / 1M tokens | Not public
Cache read | $4.5 / 1M tokens | Not public
Cache write | $0.075 / 1M tokens | Not public

One or both models have incomplete public pricing.
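
As a rough illustration of how the per-million-token prices above translate into a per-request cost, the following sketch uses only the listed GPT-5.4 mini text input and text output prices. The function name `estimate_cost_usd` and the example token counts are hypothetical, and cache pricing is deliberately left out. No estimate is possible for Gemini 3.0 Flash because its prices are not public here.

```python
# Listed GPT-5.4 mini prices from the table above, in USD per 1M tokens.
PRICE_PER_MILLION_USD = {"input": 0.75, "output": 4.5}


def estimate_cost_usd(input_tokens, output_tokens, prices=PRICE_PER_MILLION_USD):
    """Estimate the cost of one request, ignoring cache reads and writes."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 10,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 10_000):.3f}")
```

Under these assumptions, output tokens cost six times as much as input tokens, so long generations dominate the bill even when prompts are large.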

Summary

  • GPT-5.4 mini leads in: AI Agent - Tool Usage (1/1), Coding and Software Engineer (1/1)
  • Gemini 3.0 Flash leads in: General Knowledge (2/2), Claw-style Agent Evaluation (1/1), Math and Reasoning (1/1)

On average across the 6 shared benchmarks, GPT-5.4 mini scores 0.05 higher.

Largest single-benchmark gap: Terminal Bench 2.0 — GPT-5.4 mini 60 vs Gemini 3.0 Flash 47.60 (+12.40).
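
The summary figures can be recomputed from the six score pairs in the benchmark tables. The sketch below is an illustration and not the page's actual generator: the names `SCORES` and `summarize` are hypothetical, and it assumes Diff means the GPT-5.4 mini score minus the Gemini 3.0 Flash score, which matches the signs shown in the tables.

```python
# (benchmark, GPT-5.4 mini score, Gemini 3.0 Flash score), transcribed from this page.
SCORES = [
    ("GPQA Diamond", 88.0, 90.40),
    ("HLE", 41.50, 43.50),
    ("Terminal Bench 2.0", 60.0, 47.60),
    ("Claw Bench", 75.30, 85.70),
    ("SWE-Bench Pro - Public", 54.40, 49.60),
    ("FrontierMath - Tier 4", 2.10, 4.20),
]


def summarize(scores):
    """Count wins per model, average the score differences, and find the largest gap."""
    # Assumed convention: diff = GPT-5.4 mini score - Gemini 3.0 Flash score.
    diffs = {name: round(gpt - gemini, 2) for name, gpt, gemini in scores}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "gpt_wins": sum(1 for d in diffs.values() if d > 0),
        "gemini_wins": sum(1 for d in diffs.values() if d < 0),
        "ties": sum(1 for d in diffs.values() if d == 0),
        "average_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

The average is close to zero even though Gemini 3.0 Flash wins four of six benchmarks, because a simple mean of differences lets the single large Terminal Bench 2.0 gap cancel several smaller losses.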

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.