
Claude Opus 4.7 Benchmark Details

Claude Opus 4.7's strongest benchmark results are led by SWE-bench Verified (rank 2 of 96, score 87.60), GPQA Diamond (rank 4 of 166, score 94.20), and HLE (rank 5 of 131, score 54.70). This page also compares it against 2 competitor models and 3 predecessor or same-series models, with performance and pricing views where available. 1 source link is attached for reference.

Benchmark Results


General Evaluation (3 evaluations)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| GPQA Diamond | Extended | 94.20 | 4 / 166 |
| HLE | Extended | 46.90 | 20 / 131 |
| HLE | Extended + Tools | 54.70 | 5 / 131 |

Programming & Software Engineering (2 evaluations)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| SWE-bench Verified | Extended + Tools | 87.60 | 2 / 96 |
| SWE-Bench Pro - Public | Extended + Tools | 64.30 | 2 / 26 |

AI Agent - Information Gathering (1 evaluation)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| BrowseComp | Extended + Tools | 79.30 | 6 / 36 |

AI Agent - Tool Use (2 evaluations)

| Benchmark | Mode | Score | Rank / total |
| --- | --- | --- | --- |
| OSWorld-Verified | Extended + Tools | 78.00 | 2 / 12 |
| Terminal Bench 2.0 | Extended + Tools | 69.40 | 4 / 33 |
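The rank columns above can also be read as percentiles. A minimal sketch of that arithmetic, using the ranks from the tables above (the `top_percent` helper is illustrative, not part of any DataLearner API):

```python
# Convert a leaderboard rank ("rank / total") into a "top X%" figure.
def top_percent(rank: int, total: int) -> float:
    """Share of the tracked field at or above this rank, as a rounded percentage."""
    return round(100 * rank / total, 1)

# (benchmark, rank, total) taken from the result tables above.
results = [
    ("SWE-bench Verified", 2, 96),
    ("GPQA Diamond", 4, 166),
    ("HLE, Extended + Tools", 5, 131),
]
for name, rank, total in results:
    print(f"{name}: rank {rank} of {total} -> top {top_percent(rank, total)}%")
```

So rank 2 of 96 on SWE-bench Verified puts the model in roughly the top 2% of entries tracked for that benchmark.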

Competitor Comparison

Benchmark scores for Claude Opus 4.7 compared against top models in its class

Models compared: Claude Opus 4.7 (current model), GPT-5.4, Gemini 3.1 Pro Preview.

Modes shown per model:

  • Claude Opus 4.7: Extended, Extended + Tools
  • GPT-5.4: Extra-High, Extra-High + Tools
  • Gemini 3.1 Pro Preview: High, High + Tools

Benchmark Score Comparison

8 benchmarks with comparable scores. Each cell shows the best visible mode for that benchmark.

| Benchmark | Category | Claude Opus 4.7 (this model) | GPT-5.4 | Gemini 3.1 Pro Preview |
| --- | --- | --- | --- | --- |
| GPQA Diamond | General Evaluation | 94.20 (Extended Thinking) | 92.80 (Extra-High) | 94.30 (High) |
| HLE | General Evaluation | 54.70 (Extended Thinking + Tools) | 52.10 (Extra-High + Tools) | 51.40 (High + Tools) |
| MMLU | General Evaluation | 91.50 (Normal) | -- | 92.60 (High) |
| SWE-Bench Pro - Public | Programming & Software Engineering | 64.30 (Extended Thinking + Tools) | 57.70 (Extra-High) | 54.20 (High + Tools) |
| SWE-bench Verified | Programming & Software Engineering | 87.60 (Extended Thinking + Tools) | -- | 80.60 (High + Tools) |
| BrowseComp | AI Agent - Information Gathering | 79.30 (Extended Thinking + Tools) | 82.70 (Extra-High + Tools) | 85.90 (High + Tools + Internet) |
| OSWorld-Verified | AI Agent - Tool Use | 78.00 (Extended Thinking + Tools) | 75.00 (Extra-High + Tools) | -- |
| Terminal Bench 2.0 | AI Agent - Tool Use | 69.40 (Extended Thinking + Tools) | 75.10 (Extra-High + Tools) | 68.50 (High + Tools) |

Standard API Pricing: Claude Opus 4.7 vs. Peer Models

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

  • GPT-5.4: base price applies to <= 272K tokens
  • Gemini 3.1 Pro Preview: base price applies to <= 200K tokens

| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| GPT-5.4 | OpenAI | $2.5 / 1M tokens | $15 / 1M tokens | <= 272K |
| Gemini 3.1 Pro Preview | Google DeepMind | $2 / 1M tokens | $12 / 1M tokens | <= 200K |
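Per-1M-token list prices make per-request costs easy to estimate. A minimal sketch, assuming a flat base rate: the page lists only base rates, so this holds only while the request stays under the model's context threshold (extended-context surcharges are not shown here).

```python
# Cost of one request at flat per-1M-token rates (USD).
# Valid only while the input stays within the base-price context threshold.
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# GPT-5.4 base rates from the table above: $2.5 input, $15 output per 1M tokens.
cost = request_cost(100_000, 4_000, 2.5, 15.0)
print(f"100K in / 4K out on GPT-5.4: ${cost:.2f}")  # $0.31
```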

Version History

How each version of the Claude Opus 4.7 series stacks up on benchmark tests

Models compared: Claude Opus 4.7 (current model), Claude Opus 4.6, Claude Opus 4.5, Claude Opus 4.1.

Modes shown per model:

  • Claude Opus 4.7: Extended, Extended + Tools
  • Claude Opus 4.6: Extended, Extended + Tools
  • Claude Opus 4.5: Thinking, Thinking + Tools
  • Claude Opus 4.1: Thinking

Benchmark Score Comparison

7 benchmarks with comparable scores. Each cell shows the best visible mode for that benchmark.

| Benchmark | Category | Claude Opus 4.7 (this model) | Claude Opus 4.6 | Claude Opus 4.5 | Claude Opus 4.1 |
| --- | --- | --- | --- | --- | --- |
| GPQA Diamond | General Evaluation | 94.20 (Extended Thinking) | 91.31 (Extended Thinking) | 87.00 (Thinking) | 81.00 (Thinking) |
| HLE | General Evaluation | 54.70 (Extended Thinking + Tools) | 53.00 (Extended Thinking + Tools + Internet) | 43.20 (Thinking + Tools) | -- |
| MMLU | General Evaluation | 91.50 (Normal) | 91.05 (Extended Thinking) | -- | -- |
| SWE-bench Verified | Programming & Software Engineering | 87.60 (Extended Thinking + Tools) | 80.84 (Extended Thinking + Tools) | 80.90 (Thinking) | 79.40 (Parallel · Thinking + Tools) |
| BrowseComp | AI Agent - Information Gathering | 79.30 (Extended Thinking + Tools) | 84.00 (Thinking + Tools + Internet) | -- | -- |
| OSWorld-Verified | AI Agent - Tool Use | 78.00 (Extended Thinking + Tools) | 72.70 (Extended Thinking + Tools) | -- | -- |
| Terminal Bench 2.0 | AI Agent - Tool Use | 69.40 (Extended Thinking + Tools) | 65.40 (Extended Thinking + Tools) | 59.30 (Thinking + Tools) | -- |
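The version table makes generation-over-generation movement easy to compute. A minimal sketch using the SWE-bench Verified scores above (each score is the best visible mode for its version, so modes differ and the deltas are indicative rather than strictly comparable):

```python
# SWE-bench Verified scores per version, oldest first (from the table above).
scores = [
    ("Claude Opus 4.1", 79.40),
    ("Claude Opus 4.5", 80.90),
    ("Claude Opus 4.6", 80.84),
    ("Claude Opus 4.7", 87.60),
]
# Pair each version with its successor and print the score change.
for (prev, a), (curr, b) in zip(scores, scores[1:]):
    print(f"{prev} -> {curr}: {b - a:+.2f} points")
```

The gain is concentrated in the 4.6 to 4.7 step (+6.76 points), while 4.5 to 4.6 was essentially flat.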

Standard API Pricing Across the Claude Opus 4.7 Series

Shows standard text input and output pricing side by side for each model. If extended-context pricing exists, the chart keeps the base rate and explains the threshold below.

Source: DataLearnerAI. Standard text prices shown here use the default supplier. · USD / 1M tokens

When a context threshold exists, the charted base price only applies within these limits:

  • Claude Opus 4.6: base price applies to <= 200K tokens

| Model | Supplier | Standard input | Standard output | Base price applies to |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | <= 200K |
| Claude Opus 4.5 | — | $5 / 1M tokens | $25 / 1M tokens | — |
| Claude Opus 4.1 | — | $15 / 1M tokens | $75 / 1M tokens | — |

Series Overview

See how each version of the Claude Opus 4.7 series performs across major benchmarks. Click any row to break down scores by reasoning mode.

[Interactive chart: per-category benchmark scores across the series; by default it shows benchmarks with data coverage above 60% (4 of 7). Categories covered: General Evaluation, Programming & Software Engineering, AI Agent - Tool Use.]

Release dates: Claude Opus 4.1 (Aug 6, 2025), Claude Opus 4.5 (Nov 25, 2025), Claude Opus 4.6 (Feb 5, 2026), Claude Opus 4.7 (Apr 16, 2026).

Single-Benchmark Mode Relation

Viewing: GPQA Diamond · General Evaluation

[Interactive chart: modes Normal, Normal + Tools, Thinking, Thinking + Tools, Deep, Deep + Tools. X-axis shows model and release date, Y-axis shows score; dotted lines connect modes within the same generation.]

Sources

anthropic.com