Gemini 3.0 Flash vs Haiku 4.5

Across 10 shared benchmarks, Gemini 3.0 Flash leads overall: it is ahead on 9, Haiku 4.5 is ahead on 1, and none are tied. The average score difference is +23.91 in favor of Gemini 3.0 Flash.

Gemini 3.0 Flash

Google DeepMind · 2025-12-17 · Chat LLM

Haiku 4.5

Anthropic · 2025-10-15 · Multimodal LLM

Benchmarks led: Gemini 3.0 Flash 9 (90%) · Haiku 4.5 1 (10%)

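Benchmark Scores

Grouped by capability category and ordered within each group by the size of the score difference; 10 benchmarks in total.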
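The grouping and ordering described above can be sketched in a few lines of Python. The category names and score differences are copied from this page; the function name `group_and_sort` is hypothetical. Sorting by absolute difference, largest first, is an assumption inferred from the row order shown here (for example, Claw Bench at -3.70 is listed before Pinch Bench at +3.20), not a documented rule.

```python
from collections import defaultdict

# (category, benchmark, score difference in points), copied from this page.
# The rows are deliberately shuffled so the sort has something to do.
ROWS = [
    ("Math and Reasoning", "FrontierMath - Tier 4", 2.10),
    ("General Knowledge", "GPQA Diamond", 29.90),
    ("Claw-style Agent Evaluation", "Pinch Bench", 3.20),
    ("General Knowledge", "HLE", 39.20),
    ("Coding and Software Engineer", "SWE-bench Verified", 8.10),
    ("Claw-style Agent Evaluation", "Claw Bench", -3.70),
    ("Math and Reasoning", "AIME2025", 60.70),
    ("General Knowledge", "ARC-AGI-2", 32.30),
    ("Coding and Software Engineer", "SWE-Bench Pro - Public", 10.15),
    ("Agent Level Benchmark", "τ²-Bench", 57.20),
]


def group_and_sort(rows):
    """Group rows by category, then order each group by |difference|, largest first."""
    groups = defaultdict(list)
    for category, benchmark, diff in rows:
        groups[category].append((benchmark, diff))
    for items in groups.values():
        # Assumption: the page sorts on the magnitude of the gap, ignoring its sign.
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


grouped = group_and_sort(ROWS)
for category, items in grouped.items():
    print(category, [name for name, _ in items])
```

Under that assumption, each group comes out in the same order as the tables on this page.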

General Knowledge

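Gemini 3.0 Flash leads 3/3

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Difference |
| --- | --- | --- | --- |
| HLE | 43.50 · 38 / 157 | 4.30 · 155 / 157 · Normal (No Tools) | +39.20 |
| ARC-AGI-2 | 33.60 · 27 / 59 | 1.30 · 52 / 59 · Normal (No Tools) | +32.30 |
| GPQA Diamond | 90.40 · 17 / 178 | 60.50 · 138 / 178 · Normal (No Tools) | +29.90 |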

Claw-style Agent Evaluation

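Closely contested 2/2

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Difference |
| --- | --- | --- | --- |
| Claw Bench | 85.70 · 15 / 29 · Thinking (With Tools) | 89.40 · 11 / 29 · Thinking (With Tools) | -3.70 |
| Pinch Bench | 85.20 · 16 / 37 · Thinking (With Tools) | 82 · 21 / 37 · Thinking (With Tools) | +3.20 |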

Coding and Software Engineer

Gemini 3.0 Flash leads 2/2

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Difference |
| --- | --- | --- | --- |
| SWE-Bench Pro - Public | 49.60 · 32 / 43 · Thinking High (With Tools) | 39.45 · 40 / 43 · Extended (with tools) | +10.15 |
| SWE-bench Verified | 68.70 · 62 / 108 | 60.60 · 76 / 108 · Normal (With Tools) | +8.10 |

Math and Reasoning

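Gemini 3.0 Flash leads 2/2

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Difference |
| --- | --- | --- | --- |
| AIME2025 | 99.70 · 8 / 106 | 39 · 94 / 106 · Normal (No Tools) | +60.70 |
| FrontierMath - Tier 4 | 4.20 · 40 / 80 · Normal (No Tools) | 2.10 · 56 / 80 · Thinking (No Tools, 32K Budget) | +2.10 |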

Agent Level Benchmark

Gemini 3.0 Flash leads 1/1

| Benchmark | Gemini 3.0 Flash | Haiku 4.5 | Difference |
| --- | --- | --- | --- |
| τ²-Bench | 90.20 · 3 / 40 | 33 · 40 / 40 · Normal (With Tools) | +57.20 |

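Specification Comparison

| Field | Gemini 3.0 Flash | Haiku 4.5 |
| --- | --- | --- |
| Publisher | Google DeepMind | Anthropic |
| Release date | 2025-12-17 | 2025-10-15 |
| Model type | Chat LLM | Multimodal LLM |
| Architecture | Dense | Dense |
| Parameter count | No data | No data |
| Context length | 2000K | 200K |
| Max output | 64K | 64K |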

Summary

  • Gemini 3.0 Flash leads in these categories: General Knowledge (3/3), Coding and Software Engineer (2/2), Math and Reasoning (2/2), Agent Level Benchmark (1/1)
  • Closely contested category: Claw-style Agent Evaluation

Across the 10 shared benchmarks, Gemini 3.0 Flash scores 23.91 points higher on average.

Benchmark with the largest single gap: AIME2025, where Gemini 3.0 Flash scores 99.70 and Haiku 4.5 scores 39 (difference +60.70).
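The headline figures (lead counts, average difference, largest gap) can be re-derived from the ten score pairs on this page. The following is a minimal sketch, not the page's actual generator; the function name `summarize` is hypothetical. It uses `Decimal` so the arithmetic is exact. The exact mean comes out to 23.915, which the page reports as 23.91; how the page rounds that value is not stated.

```python
from decimal import Decimal

# (benchmark, Gemini 3.0 Flash score, Haiku 4.5 score), copied from this page.
SCORES = [
    ("HLE", "43.50", "4.30"),
    ("ARC-AGI-2", "33.60", "1.30"),
    ("GPQA Diamond", "90.40", "60.50"),
    ("Claw Bench", "85.70", "89.40"),
    ("Pinch Bench", "85.20", "82"),
    ("SWE-Bench Pro - Public", "49.60", "39.45"),
    ("SWE-bench Verified", "68.70", "60.60"),
    ("AIME2025", "99.70", "39"),
    ("FrontierMath - Tier 4", "4.20", "2.10"),
    ("τ²-Bench", "90.20", "33"),
]


def summarize(scores):
    """Return (gemini_leads, haiku_leads, ties, mean_diff, widest_name, widest_diff)."""
    # Difference is always Gemini 3.0 Flash minus Haiku 4.5.
    diffs = {name: Decimal(a) - Decimal(b) for name, a, b in scores}
    gemini_leads = sum(1 for d in diffs.values() if d > 0)
    haiku_leads = sum(1 for d in diffs.values() if d < 0)
    ties = sum(1 for d in diffs.values() if d == 0)
    mean_diff = sum(diffs.values()) / len(diffs)
    # The largest gap is judged by magnitude, regardless of which model leads.
    widest = max(diffs, key=lambda name: abs(diffs[name]))
    return gemini_leads, haiku_leads, ties, mean_diff, widest, diffs[widest]


summary = summarize(SCORES)
print(summary)
```

The counts (9, 1, 0) and the AIME2025 gap of 60.70 match the figures stated above.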

The text of this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.