Gemini 3.5 FlashvsGPT-5.5

Across 9 shared benchmarks, GPT-5.5 leads overall: Gemini 3.5 Flash wins 2, GPT-5.5 wins 7, with 0 ties and an average score difference of -6.18.

Google Deep Mind
Gemini 3.5 Flash

Google Deep Mind · 2026-06-20 · Multimodal model

OpenAI
GPT-5.5

OpenAI · 2026-04-23 · Reasoning model

Gemini 3.5 Flash2 wins(22%)(78%)7 winsGPT-5.5

Benchmark scores

Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.

AI Agent - Tool Usage

GPT-5.5 2/3
BenchmarkGemini 3.5 FlashGPT-5.5Diff
MCP-Atlas83.601 / 23Thinking High (With Tools)75.309 / 23极高强度思考(工具)+8.30
TerminalBench 2.176.208 / 16Thinking High (With Tools)83.404 / 16Thinking High (With Tools)-7.20
OSWorld-Verified78.406 / 19Thinking High (With Tools)78.705 / 19Thinking High (With Tools)-0.30

General Knowledge

GPT-5.5 3/3
BenchmarkGemini 3.5 FlashGPT-5.5Diff
ARC-AGI-272.1011 / 59Thinking High (With Tools)851 / 59Thinking High (No Tools)-12.90
HLE40.2055 / 161Thinking High (With Tools)52.2015 / 161Thinking High (With Tools)-12
LiveBench75.0217 / 115Thinking High (No Tools)80.711 / 115Deep Thinking (No Tools)-5.69

Coding and Software Engineer

GPT-5.5 2/2
BenchmarkGemini 3.5 FlashGPT-5.5Diff
DeepSWE376 / 9Thinking Medium (With Tools)672 / 9极高强度思考(工具)-30
SWE-Bench Pro - Public55.1021 / 44Thinking High (With Tools)58.608 / 44Thinking High (With Tools)-3.50

Math and Reasoning

Gemini 3.5 Flash 1/1
BenchmarkGemini 3.5 FlashGPT-5.5Diff
Simple Bench76.704 / 63Normal (No Tools)697 / 63Normal (No Tools)+7.70

Specs

FieldGemini 3.5 FlashGPT-5.5
PublisherGoogle Deep MindOpenAI
Release date2026-06-202026-04-23
Model typeMultimodal modelReasoning model
ArchitectureDenseDense
ParametersNot availableNot available
Context length1M1000K
Max output64K128K

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

ItemGemini 3.5 FlashGPT-5.5
Text input$1.5 / 1M tokens$0.5 / 1M tokens
Text output$9 / 1M tokens$30 / 1M tokens
Cache readNot public$0.5 / 1M tokens
Cache writeNot public$6.25 / 1M tokens

Summary

  • Gemini 3.5 Flashleads in:Math and Reasoning (1/1)
  • GPT-5.5leads in:AI Agent - Tool Usage (2/3), General Knowledge (3/3), Coding and Software Engineer (2/2)

On average across the 9 shared benchmarks, GPT-5.5 scores 6.18 higher.

Largest single-benchmark gap: DeepSWE — Gemini 3.5 Flash 37 vs GPT-5.5 67 (-30).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.