Claude Opus 4.7 Benchmark Details
Claude Opus 4.7's strongest benchmark results are SWE-bench Verified (ranked 2 of 96, score 87.60), GPQA Diamond (ranked 4 of 166, score 94.20), and HLE (ranked 5 of 131, score 54.70). This page also compares it with 2 competitor models and 3 predecessor or same-series models, including performance and pricing views where available. 1 source link is attached for reference.
Benchmark Results
General Evaluation: 3 evaluations
Coding & Software Engineering: 2 evaluations
AI Agent - Tool Use: 2 evaluations

Competitor Comparison
Benchmark scores for Claude Opus 4.7 compared against top models in its class
Benchmark Score Comparison
8 benchmarks with comparable scores. Each cell shows the best visible mode for that benchmark.
| Benchmark | Claude Opus 4.7 (this model) | GPT-5.4 | Gemini 3.1 Pro Preview |
|---|---|---|---|
| GPQA Diamond (General Evaluation) | 94.20 (Extended Thinking) | 92.80 (Thinking Level: Extra High) | 94.30 (Thinking Level: High) |
| HLE (General Evaluation) | 54.70 (Extended Thinking, Tools) | 52.10 (Thinking Level: Extra High, Tools) | 51.40 (Thinking Level: High, Tools) |
| MMLU (General Evaluation) | 91.50 (Normal) | -- | 92.60 (Thinking Level: High) |
| SWE-Bench Pro - Public (Coding & Software Engineering) | 64.30 (Extended Thinking, Tools) | 57.70 (Thinking Level: Extra High) | 54.20 (Thinking Level: High, Tools) |
| SWE-bench Verified (Coding & Software Engineering) | 87.60 (Extended Thinking, Tools) | -- | 80.60 (Thinking Level: High, Tools) |
| BrowseComp (AI Agent - Information Gathering) | 79.30 (Extended Thinking, Tools) | 82.70 (Thinking Level: Extra High, Tools) | 85.90 (Thinking Level: High, Tools, Internet) |
| OSWorld-Verified (AI Agent - Tool Use) | 78.00 (Extended Thinking, Tools) | 75.00 (Thinking Level: Extra High, Tools) | -- |
| Terminal Bench 2.0 (AI Agent - Tool Use) | 69.40 (Extended Thinking, Tools) | 75.10 (Thinking Level: Extra High, Tools) | 68.50 (Thinking Level: High, Tools) |
Standard API Pricing: Claude Opus 4.7 vs. Peer Models
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| GPT-5.4 | OpenAI | $2.50 / 1M tokens | $15 / 1M tokens | <= 272K tokens |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K tokens |
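As a worked example of how these per-1M-token base rates translate into a per-request cost, here is a minimal sketch (the function name and token counts are illustrative; prices are taken from the table above and apply only within the listed context thresholds):

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request at standard (base) per-1M-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-5.4 at $2.50 input / $15 output per 1M tokens (base rate, <= 272K context):
# a request with 100K input tokens and 10K output tokens
cost = api_cost_usd(100_000, 10_000, 2.50, 15.0)
print(f"${cost:.2f}")  # $0.40
```

Note that the base rate only applies within the context threshold in the last column; requests beyond it would be billed at the extended-context rate, which this sketch does not model.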
Version History
How each version of the Claude Opus 4.7 series stacks up on benchmark tests
Benchmark Score Comparison
7 benchmarks with comparable scores. Each cell shows the best visible mode for that benchmark.
| Benchmark | Claude Opus 4.7 (this model) | Claude Opus 4.6 | Claude Opus 4.5 | Claude Opus 4.1 |
|---|---|---|---|---|
| GPQA Diamond (General Evaluation) | 94.20 (Extended Thinking) | 91.31 (Extended Thinking) | 87.00 (Thinking) | 81.00 (Thinking) |
| HLE (General Evaluation) | 54.70 (Extended Thinking, Tools) | 53.00 (Extended Thinking, Tools, Internet) | 43.20 (Thinking, Tools) | -- |
| MMLU (General Evaluation) | 91.50 (Normal) | 91.05 (Extended Thinking) | -- | -- |
| SWE-bench Verified (Coding & Software Engineering) | 87.60 (Extended Thinking, Tools) | 80.84 (Extended Thinking, Tools) | 80.90 (Thinking) | 79.40 (Parallel Thinking, Tools) |
| BrowseComp (AI Agent - Information Gathering) | 79.30 (Extended Thinking, Tools) | 84.00 (Thinking, Tools, Internet) | -- | -- |
| OSWorld-Verified (AI Agent - Tool Use) | 78.00 (Extended Thinking, Tools) | 72.70 (Extended Thinking, Tools) | -- | -- |
| Terminal Bench 2.0 (AI Agent - Tool Use) | 69.40 (Extended Thinking, Tools) | 65.40 (Extended Thinking, Tools) | 59.30 (Thinking, Tools) | -- |
Standard API Pricing Across the Claude Opus 4.7 Series
Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.
Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens
When a context threshold exists, the charted base price only applies within these limits:
| Model | Supplier | Standard input | Standard output | Base price applies to |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K tokens |
| Claude Opus 4.5 | — | $5 / 1M tokens | $25 / 1M tokens | — |
| Claude Opus 4.1 | — | $15 / 1M tokens | $75 / 1M tokens | — |
Series Overview
See how each version of the Claude Opus 4.7 series performs across major benchmarks, with scores broken down by reasoning mode.
| Benchmark | Claude Opus 4.1 (8/6/2025) | Claude Opus 4.5 (11/25/2025) | Claude Opus 4.6 (2/5/2026) | Claude Opus 4.7 (4/16/2026) |
|---|---|---|---|---|