See key specs and per-benchmark scores for each model and mode. This page currently compares evaluation data and core parameters for 2 models.

Claude Opus 4.8 (Anthropic)
Best overall (average across the 3 benchmarks): Claude Opus 4.8 · 71.90
Best single score: Claude Opus 4.8 · SWE-bench Verified 88.60
Modality coverage: Claude Opus 4.8 · 1 modality
Head to head: 3 benchmarks · 3 wins · 0 losses · +9.83 average diff
Compare benchmark results across thinking modes and tool usage.
Data are sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, and finally from third-party evaluators.
Complete scores for each model/mode across selected benchmarks.
Three benchmarks have comparable scores. Each cell shows the model's best score, followed by the mode in which it was achieved.
| Benchmark | Category | Claude Opus 4.8 | Gemini 3.1 Pro Preview |
|---|---|---|---|
| HLE | Comprehensive evaluation | 57.90 (Extended Thinking; Tools) | 51.40 (Thinking Level · High; Tools) |
| SWE-Bench Pro - Public | Programming & software engineering | 69.20 (Extended Thinking; Tools) | 54.20 (Thinking Level · High; Tools) |
| SWE-bench Verified | Programming & software engineering | 88.60 (Extended Thinking; Tools) | 80.60 (Thinking Level · High; Tools) |
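The head-to-head figures in the summary (3 wins, 0 losses, +9.83 average diff) and the 71.90 "best overall" value are consistent with simple arithmetic over the three scores in this table. The page does not state its exact formula, so the sketch below assumes an unweighted mean and counts a win as a strictly higher score; the names `scores`, `head_to_head`, and `mean_score` are hypothetical.

```python
# Best score per benchmark, copied from the table:
# (Claude Opus 4.8, Gemini 3.1 Pro Preview)
scores = {
    "HLE": (57.90, 51.40),
    "SWE-Bench Pro - Public": (69.20, 54.20),
    "SWE-bench Verified": (88.60, 80.60),
}


def head_to_head(scores):
    """Return (wins, losses, mean difference) of the first model vs the second.

    Assumption: a win is a strictly higher score; ties count as neither.
    """
    diffs = [a - b for a, b in scores.values()]
    wins = sum(d > 0 for d in diffs)
    losses = sum(d < 0 for d in diffs)
    return wins, losses, sum(diffs) / len(diffs)


def mean_score(scores, idx):
    """Unweighted mean of one model's scores (idx 0 = Claude, 1 = Gemini)."""
    return sum(pair[idx] for pair in scores.values()) / len(scores)


wins, losses, avg_diff = head_to_head(scores)
print(f"{len(scores)} benchmarks, {wins} wins, {losses} losses, {avg_diff:+.2f} average diff")
# prints "3 benchmarks, 3 wins, 0 losses, +9.83 average diff"
print(f"Claude Opus 4.8 mean: {mean_score(scores, 0):.2f}")
# prints "Claude Opus 4.8 mean: 71.90"
```

Under these assumptions the per-benchmark gaps are 6.50, 15.00, and 8.00 points, which average to the +9.83 shown in the summary.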
Licensing, MoE architecture, and multi-modality support.
| Section | Feature | Claude Opus 4.8 (Anthropic) | Gemini 3.1 Pro Preview (Google DeepMind) |
|---|---|---|---|
| Core specs | Release date | 2026-05-28 | 2026-02-20 |
| Core specs | Context length | 1M tokens | 1M tokens |
| Core specs | Max output | 128,000 tokens | 32,768 tokens |
| Core specs | MoE | No | No |
| License | Code open source | Not provided | Not provided |
| License | Weights open source | Not provided | Not provided |
| License | Commercial use | Not open source | Not open source |
| Modality support | Text input/output | / | / |
| Resources | Paper / report | Introducing Claude Opus 4.8 | Gemini 3.1 Pro: A smarter model for your most complex tasks |
| Resources | DataLearner blog | Anthropic releases Claude Opus 4.8: pricing unchanged, modest gains in coding and agentic capabilities | Not provided |
