Gemini 3.5 Flash vs GPT-5.5
Across 9 shared benchmarks, GPT-5.5 leads overall: Gemini 3.5 Flash wins 2, GPT-5.5 wins 7, with 0 ties. The average score difference (Gemini 3.5 Flash minus GPT-5.5) is -6.18.
Gemini 3.5 Flash
Google DeepMind · 2026-06-20 · Multimodal model
GPT-5.5
OpenAI · 2026-04-23 · Reasoning model
Gemini 3.5 Flash: 2 wins (22%) · GPT-5.5: 7 wins (78%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 9 shared benchmarks.
AI Agent - Tool Usage
GPT-5.5 leads 2/3
| Benchmark | Gemini 3.5 Flash | GPT-5.5 | Diff |
|---|---|---|---|
| MCP-Atlas | 83.60 · 1 / 23 · Thinking High (With Tools) | 75.30 · 9 / 23 · Thinking Extra High (With Tools) | +8.30 |
| TerminalBench 2.1 | 76.20 · 8 / 16 · Thinking High (With Tools) | 83.40 · 4 / 16 · Thinking High (With Tools) | -7.20 |
| OSWorld-Verified | 78.40 · 6 / 19 · Thinking High (With Tools) | 78.70 · 5 / 19 · Thinking High (With Tools) | -0.30 |
General Knowledge
GPT-5.5 leads 3/3
| Benchmark | Gemini 3.5 Flash | GPT-5.5 | Diff |
|---|---|---|---|
| ARC-AGI-2 | 72.10 · 11 / 59 · Thinking High (With Tools) | 85 · 1 / 59 · Thinking High (No Tools) | -12.90 |
| HLE | 40.20 · 55 / 161 · Thinking High (With Tools) | 52.20 · 15 / 161 · Thinking High (With Tools) | -12 |
| LiveBench | 75.02 · 17 / 115 · Thinking High (No Tools) | 80.71 · 1 / 115 · Deep Thinking (No Tools) | -5.69 |
Coding and Software Engineering
GPT-5.5 leads 2/2
| Benchmark | Gemini 3.5 Flash | GPT-5.5 | Diff |
|---|---|---|---|
| DeepSWE | 37 · 6 / 9 · Thinking Medium (With Tools) | 67 · 2 / 9 · Thinking Extra High (With Tools) | -30 |
| SWE-Bench Pro - Public | 55.10 · 21 / 44 · Thinking High (With Tools) | 58.60 · 8 / 44 · Thinking High (With Tools) | -3.50 |
Math and Reasoning
Gemini 3.5 Flash leads 1/1
| Benchmark | Gemini 3.5 Flash | GPT-5.5 | Diff |
|---|---|---|---|
| Simple Bench | 76.70 · 4 / 63 · Normal (No Tools) | 69 · 7 / 63 · Normal (No Tools) | +7.70 |
Specs
| Field | Gemini 3.5 Flash | GPT-5.5 |
|---|---|---|
| Publisher | Google DeepMind | OpenAI |
| Release date | 2026-06-20 | 2026-04-23 |
| Model type | Multimodal model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1M | 1M |
| Max output | 64K | 128K |
API pricing
Prices use DataLearner records when available; missing fields are not inferred.
| Item | Gemini 3.5 Flash | GPT-5.5 |
|---|---|---|
| Text input | $1.5 / 1M tokens | $0.5 / 1M tokens |
| Text output | $9 / 1M tokens | $30 / 1M tokens |
| Cache read | Not public | $0.5 / 1M tokens |
| Cache write | Not public | $6.25 / 1M tokens |
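To compare the listed prices on a concrete workload, a per-request cost can be estimated as token count times price per 1M tokens. The sketch below is only an illustration: the function name `estimate_cost` and the workload of 10,000 input and 2,000 output tokens are assumptions made for the example, and cache read/write pricing is ignored because those fields are not public for Gemini 3.5 Flash.

```python
# Text prices in USD per 1M tokens, copied from the API pricing table.
PRICES_PER_1M = {
    "Gemini 3.5 Flash": {"input": 1.5, "output": 9.0},
    "GPT-5.5": {"input": 0.5, "output": 30.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, ignoring any caching discounts."""
    price = PRICES_PER_1M[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_1M:
        print(f"{model}: ${estimate_cost(model, 10_000, 2_000):.4f}")
    # Gemini 3.5 Flash: $0.0330
    # GPT-5.5: $0.0650
```

Under these assumed token counts, output tokens account for most of the GPT-5.5 estimate, so the comparison shifts with the input-to-output ratio of the actual workload.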
Summary
- Gemini 3.5 Flash leads in: Math and Reasoning (1/1)
- GPT-5.5 leads in: AI Agent - Tool Usage (2/3), General Knowledge (3/3), Coding and Software Engineering (2/2)
On average across the 9 shared benchmarks, GPT-5.5 scores 6.18 points higher.
Largest single-benchmark gap: DeepSWE, where Gemini 3.5 Flash scores 37 vs GPT-5.5 at 67 (-30).
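The headline figures (win counts, average difference, largest gap) follow directly from the nine score pairs. The sketch below recomputes them as a sanity check. The helper name `summarize` and the dictionary layout are assumptions for illustration, not how the page itself is generated. The scores are the values read from the benchmark tables, and each difference is taken as Gemini 3.5 Flash minus GPT-5.5.

```python
# (Gemini 3.5 Flash score, GPT-5.5 score) per shared benchmark, from the tables.
SCORES = {
    "MCP-Atlas": (83.60, 75.30),
    "TerminalBench 2.1": (76.20, 83.40),
    "OSWorld-Verified": (78.40, 78.70),
    "ARC-AGI-2": (72.10, 85.00),
    "HLE": (40.20, 52.20),
    "LiveBench": (75.02, 80.71),
    "DeepSWE": (37.00, 67.00),
    "SWE-Bench Pro - Public": (55.10, 58.60),
    "Simple Bench": (76.70, 69.00),
}


def summarize(scores: dict[str, tuple[float, float]]) -> dict:
    """Recompute wins, ties, mean difference and the largest gap."""
    # Round each difference to 2 decimals to avoid floating-point noise.
    diffs = {name: round(g - o, 2) for name, (g, o) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "gemini_wins": sum(d > 0 for d in diffs.values()),
        "gpt_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

Run as a script, this reproduces 2 wins for Gemini 3.5 Flash, 7 for GPT-5.5, 0 ties, a mean difference of -6.18, and DeepSWE (-30) as the largest gap.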
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.