Gemini 3.1 Pro Preview vs Claude Opus 4.6

Across 19 shared benchmarks, Gemini 3.1 Pro Preview leads overall: it is ahead on 12, Claude Opus 4.6 is ahead on 6, and 1 is tied. The average score gap is +2.43 (Gemini 3.1 Pro Preview minus Claude Opus 4.6).

Gemini 3.1 Pro Preview: Google DeepMind · 2026-02-20 · Multimodal large model

Claude Opus 4.6: Anthropic · 2026-02-05 · Reasoning large model

Gemini 3.1 Pro Preview: 12 wins (63%) · Ties: 1 · Claude Opus 4.6: 6 wins (32%)

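Benchmark scores

Grouped by capability category; within each group, benchmarks are ordered by the magnitude of the score gap, largest first. 19 benchmarks in total. Each cell lists the score, the model's rank on that benchmark, and the evaluation mode. The score gap is Gemini 3.1 Pro Preview minus Claude Opus 4.6.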

General Knowledge: Gemini 3.1 Pro Preview leads 4/6

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
ARC-AGI-2	77.10 · rank 9 / 62 · Thinking High (No Tools)	66.30 · rank 17 / 62 · Extended (no tools)	+10.80
LiveBench	79.93 · rank 3 / 115 · Thinking High (No Tools)	76.33 · rank 8 / 115 · Thinking High (No Tools)	+3.60
GPQA Diamond	94.30 · rank 3 / 187 · Thinking High (No Tools)	91.31 · rank 15 / 187 · Extended (no tools)	+2.99
HLE	51.40 · rank 22 / 172 · Thinking High (With Tools)	53 · rank 18 / 172 · Extended (with tools, internet)	-1.60
MMLU	92.60 · rank 3 / 66 · Thinking High (No Tools)	91.05 · rank 7 / 66 · Extended (no tools)	+1.55
ARC-AGI-3	0 · rank 6 / 9 · Thinking High (No Tools)	0 · rank 4 / 9 · Max (no tools)	Tie

AI Agent - Tool Usage: Gemini 3.1 Pro Preview leads 3/3

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
OSWorld-Verified	76.20 · rank 11 / 24 · Thinking (With Tools)	72.70 · rank 15 / 24 · Extended (with tools)	+3.50
Terminal Bench 2.0	68.50 · rank 8 / 47 · Thinking High (With Tools)	65.40 · rank 11 / 47 · Extended (with tools)	+3.10
MCP-Atlas	78.20 · rank 9 / 27 · Thinking High (With Tools)	76.80 · rank 10 / 27 · Deep Thinking (With Tools)	+1.40

Agent Level Benchmark: contested 2/2 (each model leads one benchmark)

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
τ²-Bench	90.80 · rank 2 / 43 · Thinking High (With Tools)	91.89 · rank 1 / 43 · Extended (with tools)	-1.09
τ²-Bench - Telecom	99.30 · rank 1 / 35 · Thinking High (With Tools)	99.25 · rank 2 / 35 · Extended (with tools)	+0.05

Coding and Software Engineer: contested 2/2 (each model leads one benchmark)

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
LiveCodeBench	91.70 · rank 3 / 123 · Thinking High (With Tools)	76 · rank 38 / 123 · Extended (no tools)	+15.70
SWE-bench Verified	80.60 · rank 11 / 112 · Thinking High (With Tools)	80.84 · rank 10 / 112 · Extended (with tools)	-0.24

Math and Reasoning: Claude Opus 4.6 leads 2/2

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
FrontierMath - Tier 4	16.70 · rank 20 / 80 · Normal (No Tools)	22.90 · rank 12 / 80 · Max (no tools)	-6.20
FrontierMath	36.90 · rank 11 / 60 · Thinking High (No Tools)	40.70 · rank 7 / 60 · Max (no tools)	-3.80

AI Agent - Information Search: Gemini 3.1 Pro Preview leads 1/1

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
BrowseComp	85.90 · rank 5 / 53 · Thinking High (With Tools + Internet)	84 · rank 11 / 53 · Thinking (With Tools + Internet)	+1.90

Claw-style Agent Evaluation: Claude Opus 4.6 leads 1/1

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
Pinch Bench	86.70 · rank 10 / 37 · Thinking (With Tools)	87.40 · rank 7 / 37 · Thinking (With Tools)	-0.70

Multimodal Understanding: Gemini 3.1 Pro Preview leads 1/1

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
MMMU	80.50 · rank 12 / 29 · Thinking High (No Tools)	77.30 · rank 16 / 29 · Extended (with tools)	+3.20

Common-sense reasoning: Gemini 3.1 Pro Preview leads 1/1

Benchmark	Gemini 3.1 Pro Preview	Claude Opus 4.6	Score gap
Simple Bench	79.60 · rank 2 / 63 · Normal (No Tools)	67.60 · rank 8 / 63 · Normal (No Tools)	+12

Pricing uses the API records configured by DataLearner where available; missing items are not estimated.

Gemini 3.1 Pro Preview leads in these categories: General Knowledge (4/6), AI Agent - Tool Usage (3/3), AI Agent - Information Search (1/1), Multimodal Understanding (1/1), Common-sense reasoning (1/1)
Claude Opus 4.6 leads in these categories: Math and Reasoning (2/2), Claw-style Agent Evaluation (1/1)
Contested categories: Agent Level Benchmark, Coding and Software Engineer
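
The category verdicts above can be reproduced from the per-benchmark score gaps. The sketch below is only an illustration. The assignment of benchmarks to categories is inferred from the order of the groups on this page, and the `category_leader` rule (more wins leads, equal wins is contested) is a hypothetical reading of how the page decides, not a confirmed DataLearner method.

```python
# Score gaps (Gemini 3.1 Pro Preview minus Claude Opus 4.6), copied from the tables.
# Assumption: the benchmark-to-category mapping follows the group order on the page.
GAPS_BY_CATEGORY = {
    "General Knowledge": [10.80, 3.60, 2.99, -1.60, 1.55, 0.0],
    "AI Agent - Tool Usage": [3.50, 3.10, 1.40],
    "Agent Level Benchmark": [-1.09, 0.05],
    "Coding and Software Engineer": [15.70, -0.24],
    "Math and Reasoning": [-6.20, -3.80],
    "AI Agent - Information Search": [1.90],
    "Claw-style Agent Evaluation": [-0.70],
    "Multimodal Understanding": [3.20],
    "Common-sense reasoning": [12.0],
}


def category_leader(gaps):
    """Hypothetical rule: the model with more wins leads; equal wins is contested."""
    gemini_wins = sum(1 for g in gaps if g > 0)
    claude_wins = sum(1 for g in gaps if g < 0)
    if gemini_wins > claude_wins:
        return "Gemini 3.1 Pro Preview"
    if claude_wins > gemini_wins:
        return "Claude Opus 4.6"
    return "contested"


leaders = {name: category_leader(gaps) for name, gaps in GAPS_BY_CATEGORY.items()}

for name, leader in leaders.items():
    print(f"{name}: {leader}")
```

Under this rule the two contested categories are the ones where each model wins one benchmark, even though the LiveCodeBench gap is far larger than the SWE-bench Verified gap.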

Across the 19 shared benchmarks, Gemini 3.1 Pro Preview scores 2.43 points higher on average.

Benchmark with the largest single gap: LiveCodeBench, where Gemini 3.1 Pro Preview scores 91.70 and Claude Opus 4.6 scores 76 (gap +15.70).
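
The headline figures (12 wins, 6 losses, 1 tie, average gap +2.43, largest gap on LiveCodeBench) follow directly from the 19 score gaps in the tables. A minimal sketch of that arithmetic, assuming the average is a plain unweighted mean of the gaps with the tie counted as 0; the variable names are illustrative only:

```python
# Score gaps (Gemini 3.1 Pro Preview minus Claude Opus 4.6), copied from the tables.
GAPS = {
    "ARC-AGI-2": 10.80,
    "LiveBench": 3.60,
    "GPQA Diamond": 2.99,
    "HLE": -1.60,
    "MMLU": 1.55,
    "ARC-AGI-3": 0.0,
    "OSWorld-Verified": 3.50,
    "Terminal Bench 2.0": 3.10,
    "MCP-Atlas": 1.40,
    "τ²-Bench": -1.09,
    "τ²-Bench - Telecom": 0.05,
    "LiveCodeBench": 15.70,
    "SWE-bench Verified": -0.24,
    "FrontierMath - Tier 4": -6.20,
    "FrontierMath": -3.80,
    "BrowseComp": 1.90,
    "Pinch Bench": -0.70,
    "MMMU": 3.20,
    "Simple Bench": 12.0,
}

gemini_wins = sum(1 for g in GAPS.values() if g > 0)
claude_wins = sum(1 for g in GAPS.values() if g < 0)
ties = sum(1 for g in GAPS.values() if g == 0)

# Assumption: unweighted mean over all 19 benchmarks, ties included as 0.
mean_gap = round(sum(GAPS.values()) / len(GAPS), 2)

# Largest gap by absolute value, regardless of which model leads.
largest = max(GAPS, key=lambda name: abs(GAPS[name]))

print(f"Gemini wins: {gemini_wins}, Claude wins: {claude_wins}, ties: {ties}")
print(f"Mean gap: {mean_gap:+.2f}")
print(f"Largest gap: {largest} ({GAPS[largest]:+.2f})")
```

The mean treats every benchmark equally, so the two double-digit gaps (LiveCodeBench and Simple Bench) account for a large share of the +2.43 average.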

The body of this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.