Gemini 3.5 Flash vs Opus 4.7

Gemini 3.5 Flash and Opus 4.7 split the 8 shared benchmarks evenly: Gemini 3.5 Flash leads on 4 and Opus 4.7 leads on 4, with no individual benchmark tied. The average score difference (Gemini 3.5 Flash minus Opus 4.7) is -0.36.

Google DeepMind
Gemini 3.5 Flash

Google DeepMind · 2026-06-20 · Multimodal model

Anthropic
Opus 4.7

Anthropic · 2026-04-16 · Reasoning model

Gemini 3.5 Flash: 4 wins (50%) · Opus 4.7: 4 wins (50%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 8 shared benchmarks.

AI Agent - Tool Usage

Gemini 3.5 Flash 3/3
| Benchmark | Gemini 3.5 Flash | Opus 4.7 | Diff |
| --- | --- | --- | --- |
| TerminalBench 2.1 | 76.20 · 8 / 16 · Thinking High (With Tools) | 69.70 · 11 / 16 · Thinking High (With Tools) | +6.50 |
| MCP-Atlas | 83.60 · 1 / 23 · Thinking High (With Tools) | 79.10 · 5 / 23 · Deep Thinking (With Tools) | +4.50 |
| OSWorld-Verified | 78.40 · 6 / 19 · Thinking High (With Tools) | 78 · 7 / 19 · Extended (with tools) | +0.40 |

General Knowledge

Opus 4.7 3/3
| Benchmark | Gemini 3.5 Flash | Opus 4.7 | Diff |
| --- | --- | --- | --- |
| HLE | 40.20 · 55 / 161 · Thinking High (With Tools) | 54.70 · 9 / 161 · Extended (with tools) | -14.50 |
| ARC-AGI-2 | 72.10 · 11 / 59 · Thinking High (With Tools) | 75.80 · 9 / 59 · Max (No Tools) | -3.70 |
| LiveBench | 75.02 · 17 / 115 · Thinking High (No Tools) | 76.91 · 7 / 115 · Deep Thinking (No Tools) | -1.89 |

Coding and Software Engineer

Opus 4.7 1/1
| Benchmark | Gemini 3.5 Flash | Opus 4.7 | Diff |
| --- | --- | --- | --- |
| SWE-Bench Pro - Public | 55.10 · 21 / 44 · Thinking High (With Tools) | 64.30 · 4 / 44 · Extended (with tools) | -9.20 |

Math and Reasoning

Gemini 3.5 Flash 1/1
| Benchmark | Gemini 3.5 Flash | Opus 4.7 | Diff |
| --- | --- | --- | --- |
| Simple Bench | 76.70 · 4 / 63 · Normal (No Tools) | 61.70 · 13 / 63 · Normal (No Tools) | +15 |
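
The grouping and within-group ordering described at the top of this section can be reproduced from the scores alone. The following is a minimal sketch, not the page's actual generator: the `ROWS` layout and the function names are assumptions made for illustration, and the difference is taken as Gemini 3.5 Flash minus Opus 4.7, matching the sign of the Diff column.

```python
from collections import defaultdict

# (capability, benchmark, Gemini 3.5 Flash score, Opus 4.7 score),
# copied from the benchmark tables on this page.
ROWS = [
    ("AI Agent - Tool Usage", "TerminalBench 2.1", 76.20, 69.70),
    ("AI Agent - Tool Usage", "MCP-Atlas", 83.60, 79.10),
    ("AI Agent - Tool Usage", "OSWorld-Verified", 78.40, 78.00),
    ("General Knowledge", "HLE", 40.20, 54.70),
    ("General Knowledge", "ARC-AGI-2", 72.10, 75.80),
    ("General Knowledge", "LiveBench", 75.02, 76.91),
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 55.10, 64.30),
    ("Math and Reasoning", "Simple Bench", 76.70, 61.70),
]


def group_by_capability(rows):
    """Group rows by capability; sort each group by largest absolute gap."""
    groups = defaultdict(list)
    for capability, name, gemini, opus in rows:
        # Positive diff means Gemini 3.5 Flash scored higher.
        groups[capability].append((name, round(gemini - opus, 2)))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


def leader_counts(groups):
    """Return {capability: (Gemini wins, Opus wins)}; exact ties count for neither."""
    return {
        capability: (
            sum(1 for _, diff in items if diff > 0),
            sum(1 for _, diff in items if diff < 0),
        )
        for capability, items in groups.items()
    }


if __name__ == "__main__":
    groups = group_by_capability(ROWS)
    for capability, (gemini_wins, opus_wins) in leader_counts(groups).items():
        print(f"{capability}: Gemini {gemini_wins}, Opus {opus_wins}")
        for name, diff in groups[capability]:
            print(f"  {name}: {diff:+.2f}")
```

Sorting on the absolute value of the difference keeps the largest gap first regardless of which model leads.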

Specs

| Field | Gemini 3.5 Flash | Opus 4.7 |
| --- | --- | --- |
| Publisher | Google DeepMind | Anthropic |
| Release date | 2026-06-20 | 2026-04-16 |
| Model type | Multimodal model | Reasoning model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 1M | 1000K |
| Max output | 64K | 128K |

API pricing

Prices use DataLearner records when available; missing fields are not inferred.

| Item | Gemini 3.5 Flash | Opus 4.7 |
| --- | --- | --- |
| Text input | $1.5 / 1M tokens | $5 / 1M tokens |
| Text output | $9 / 1M tokens | $25 / 1M tokens |
| Cache read | Not public | $0.5 / 1M tokens |
| Cache write | Not public | $6.25 / 1M tokens |
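
To see what the per-token prices mean for a concrete request, the sketch below multiplies token counts by the listed text input and output rates. The workload (200K input tokens, 20K output tokens) is hypothetical, the `PRICES` dictionary and `request_cost` function are illustrative names, and caching is ignored because Gemini 3.5 Flash cache prices are not public.

```python
# USD per 1M tokens, copied from the API pricing table on this page.
PRICES = {
    "Gemini 3.5 Flash": {"input": 1.5, "output": 9.0},
    "Opus 4.7": {"input": 5.0, "output": 25.0},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request, ignoring cache reads and writes."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    for model in PRICES:
        cost = request_cost(model, input_tokens=200_000, output_tokens=20_000)
        print(f"{model}: ${cost:.2f}")
```

Under these assumptions the request costs about $0.48 on Gemini 3.5 Flash and about $1.50 on Opus 4.7.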

Summary

  • Gemini 3.5 Flash leads in: AI Agent - Tool Usage (3/3), Math and Reasoning (1/1)
  • Opus 4.7 leads in: General Knowledge (3/3), Coding and Software Engineer (1/1)

On average across the 8 shared benchmarks, Opus 4.7 scores 0.36 higher.

Largest single-benchmark gap: Simple Bench — Gemini 3.5 Flash 76.70 vs Opus 4.7 61.70 (+15).
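
The summary figures above follow directly from the eight score pairs. Here is a short sketch of that arithmetic, assuming the difference is Gemini 3.5 Flash minus Opus 4.7 (the convention implied by the signs in the Diff column); the `SCORES` list and variable names are illustrative.

```python
# (benchmark, Gemini 3.5 Flash score, Opus 4.7 score), from the tables on this page.
SCORES = [
    ("TerminalBench 2.1", 76.20, 69.70),
    ("MCP-Atlas", 83.60, 79.10),
    ("OSWorld-Verified", 78.40, 78.00),
    ("HLE", 40.20, 54.70),
    ("ARC-AGI-2", 72.10, 75.80),
    ("LiveBench", 75.02, 76.91),
    ("SWE-Bench Pro - Public", 55.10, 64.30),
    ("Simple Bench", 76.70, 61.70),
]

# Positive values mean Gemini 3.5 Flash scored higher.
diffs = {name: round(gemini - opus, 2) for name, gemini, opus in SCORES}

gemini_wins = sum(1 for d in diffs.values() if d > 0)
opus_wins = sum(1 for d in diffs.values() if d < 0)
ties = sum(1 for d in diffs.values() if d == 0)

# Mean of the signed differences; a negative mean favours Opus 4.7.
mean_diff = sum(diffs.values()) / len(diffs)

# Benchmark with the largest absolute gap, in either direction.
largest = max(diffs, key=lambda name: abs(diffs[name]))

if __name__ == "__main__":
    print(f"wins: Gemini {gemini_wins}, Opus {opus_wins}, ties {ties}")
    print(f"mean diff: {mean_diff:.2f}")
    print(f"largest gap: {largest} ({diffs[largest]:+.2f})")
```

The signed differences sum to -2.89, so the mean over 8 benchmarks is about -0.36, which is why the page reports Opus 4.7 as 0.36 higher on average despite the 4 to 4 split.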

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.