
Claude Opus 4.7 Benchmark Details

Claude Opus 4.7's strongest benchmark results are led by SWE-bench Verified (rank 2 of 96, score 87.60), GPQA Diamond (rank 4 of 166, score 94.20), and HLE (rank 5 of 131, score 54.70). This page also compares it against 2 competitor models and 3 predecessor or same-series models, with performance and pricing views where available. 1 source link is attached for reference.

Benchmark Results


General Evaluation (3 evaluations)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GPQA Diamond | Extended | 94.20 | 4 / 166 |
| HLE | Extended | 46.90 | 20 / 131 |
| HLE | Extended + Tools | 54.70 | 5 / 131 |

Programming & Software Engineering (2 evaluations)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| SWE-bench Verified | Extended + Tools | 87.60 | 2 / 96 |
| SWE-Bench Pro - Public | Extended + Tools | 64.30 | 2 / 26 |

AI Agent - Information Gathering (1 evaluation)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| BrowseComp | Extended + Tools | 79.30 | 6 / 36 |

AI Agent - Tool Use (2 evaluations)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| OSWorld-Verified | Extended + Tools | 78.00 | 2 / 12 |
| Terminal Bench 2.0 | Extended + Tools | 69.40 | 4 / 33 |
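The rank columns above can also be read as percentiles. A minimal sketch of that arithmetic, using the ranks from the tables above (the `top_percent` helper is illustrative, not part of any DataLearner API):

```python
# Convert a leaderboard rank ("rank / total") into a "top X%" figure.
def top_percent(rank: int, total: int) -> float:
    """Share of the tracked field at or above this rank, as a rounded percentage."""
    return round(100 * rank / total, 1)

# (benchmark, rank, total) taken from the result tables above.
results = [
    ("SWE-bench Verified", 2, 96),
    ("GPQA Diamond", 4, 166),
    ("HLE, Extended + Tools", 5, 131),
]
for name, rank, total in results:
    print(f"{name}: rank {rank} of {total} -> top {top_percent(rank, total)}%")
```

So rank 2 of 96 on SWE-bench Verified puts the model in roughly the top 2% of entries tracked for that benchmark.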

Competitor Comparison

Benchmark scores for Claude Opus 4.7 compared against top models in its class

Models compared: Claude Opus 4.7 (current model), GPT-5.4, Gemini 3.1 Pro Preview.

Modes shown per model:

  • Claude Opus 4.7: Extended, Extended + Tools
  • GPT-5.4: Extra-High, Extra-High + Tools
  • Gemini 3.1 Pro Preview: High, High + Tools

Benchmark Score Comparison

8 benchmarks with comparable scores. Each cell shows the best visible mode for that benchmark.

| Benchmark | Category | Claude Opus 4.7 (this model) | GPT-5.4 | Gemini 3.1 Pro Preview |
| --- | --- | --- | --- | --- |
| GPQA Diamond | General Evaluation | 94.20 (Extended Thinking) | 92.80 (Extra-High) | 94.30 (High) |
| HLE | General Evaluation | 54.70 (Extended Thinking + Tools) | 52.10 (Extra-High + Tools) | 51.40 (High + Tools) |
| MMLU | General Evaluation | 91.50 (Normal) | -- | 92.60 (High) |
| SWE-Bench Pro - Public | Programming & Software Engineering | 64.30 (Extended Thinking + Tools) | 57.70 (Extra-High) | 54.20 (High + Tools) |
| SWE-bench Verified | Programming & Software Engineering | 87.60 (Extended Thinking + Tools) | -- | 80.60 (High + Tools) |
| BrowseComp | AI Agent - Information Gathering | 79.30 (Extended Thinking + Tools) | 82.70 (Extra-High + Tools) | 85.90 (High + Tools + Internet) |
| OSWorld-Verified | AI Agent - Tool Use | 78.00 (Extended Thinking + Tools) | 75.00 (Extra-High + Tools) | -- |
| Terminal Bench 2.0 | AI Agent - Tool Use | 69.40 (Extended Thinking + Tools) | 75.10 (Extra-High + Tools) | 68.50 (High + Tools) |

Standard API Pricing: Claude Opus 4.7 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

  • GPT-5.4: base price applies to <= 272K tokens
  • Gemini 3.1 Pro Preview: base price applies to <= 200K tokens

| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
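Per-1M-token list prices make per-request costs easy to estimate. A minimal sketch, assuming a flat base rate: the page lists only base rates, so this holds only while the request stays under the model's context threshold (extended-context surcharges are not shown here).

```python
# Cost of one request at flat per-1M-token rates (USD).
# Valid only while the input stays within the base-price context threshold.
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# GPT-5.4 base rates from the table above: $2.5 input, $15 output per 1M tokens.
cost = request_cost(100_000, 4_000, 2.5, 15.0)
print(f"100K in / 4K out on GPT-5.4: ${cost:.2f}")  # $0.31
```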

Version History

How each version of the Claude Opus 4.7 series stacks up on benchmark tests

Models compared: Claude Opus 4.7 (current model), Claude Opus 4.6, Claude Opus 4.5, Claude Opus 4.1.

Modes shown per model:

  • Claude Opus 4.7: Extended, Extended + Tools
  • Claude Opus 4.6: Extended, Extended + Tools
  • Claude Opus 4.5: Thinking, Thinking + Tools
  • Claude Opus 4.1: Thinking

Benchmark Score Comparison

7 benchmarks with comparable scores. Each cell shows the best visible mode for that benchmark.

| Benchmark | Category | Claude Opus 4.7 (this model) | Claude Opus 4.6 | Claude Opus 4.5 | Claude Opus 4.1 |
| --- | --- | --- | --- | --- | --- |
| GPQA Diamond | General Evaluation | 94.20 (Extended Thinking) | 91.31 (Extended Thinking) | 87.00 (Thinking) | 81.00 (Thinking) |
| HLE | General Evaluation | 54.70 (Extended Thinking + Tools) | 53.00 (Extended Thinking + Tools + Internet) | 43.20 (Thinking + Tools) | -- |
| MMLU | General Evaluation | 91.50 (Normal) | 91.05 (Extended Thinking) | -- | -- |
| SWE-bench Verified | Programming & Software Engineering | 87.60 (Extended Thinking + Tools) | 80.84 (Extended Thinking + Tools) | 80.90 (Thinking) | 79.40 (Parallel · Thinking + Tools) |
| BrowseComp | AI Agent - Information Gathering | 79.30 (Extended Thinking + Tools) | 84.00 (Thinking + Tools + Internet) | -- | -- |
| OSWorld-Verified | AI Agent - Tool Use | 78.00 (Extended Thinking + Tools) | 72.70 (Extended Thinking + Tools) | -- | -- |
| Terminal Bench 2.0 | AI Agent - Tool Use | 69.40 (Extended Thinking + Tools) | 65.40 (Extended Thinking + Tools) | 59.30 (Thinking + Tools) | -- |
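The version table makes generation-over-generation movement easy to compute. A minimal sketch using the SWE-bench Verified scores above (each score is the best visible mode for its version, so modes differ and the deltas are indicative rather than strictly comparable):

```python
# SWE-bench Verified scores per version, oldest first (from the table above).
scores = [
    ("Claude Opus 4.1", 79.40),
    ("Claude Opus 4.5", 80.90),
    ("Claude Opus 4.6", 80.84),
    ("Claude Opus 4.7", 87.60),
]
# Pair each version with its successor and print the score change.
for (prev, a), (curr, b) in zip(scores, scores[1:]):
    print(f"{prev} -> {curr}: {b - a:+.2f} points")
```

The gain is concentrated in the 4.6 to 4.7 step (+6.76 points), while 4.5 to 4.6 was essentially flat.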

Standard API Pricing Across the Claude Opus 4.7 Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

  • Claude Opus 4.6: base price applies to <= 200K tokens

| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K |
| Claude Opus 4.5 | — | $5 / 1M tokens | $25 / 1M tokens | — |
| Claude Opus 4.1 | — | $15 / 1M tokens | $75 / 1M tokens | — |

Series Overview

See how each version of the Claude Opus 4.7 series performs across major benchmarks. Click any row to break down scores by reasoning mode.

[Interactive chart: per-category benchmark scores across the series; by default it shows benchmarks with data coverage above 60% (4 of 7). Categories covered: General Evaluation, Programming & Software Engineering, AI Agent - Tool Use.]

Release dates: Claude Opus 4.1 (Aug 6, 2025), Claude Opus 4.5 (Nov 25, 2025), Claude Opus 4.6 (Feb 5, 2026), Claude Opus 4.7 (Apr 16, 2026).

Single-Benchmark Mode Relation

Viewing: GPQA Diamond · General Evaluation

[Interactive chart: modes Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. X-axis shows model and release date, Y-axis shows score; dotted lines connect modes within the same generation.]

Sources

anthropic.com