Claude3-Opus vs GPT-4 Benchmark Comparison

See key specs and per-benchmark scores for each model/mode. This page currently compares benchmark data and core specs for 2 models.

Claude3-Opus

Anthropic

Release: 2024-03-04
Context length: 200K
Parameters: Not provided

Model profile

GPT-4

OpenAI

Capability profile

Each axis is a category average, normalized to a 100-point radar.

View: Non-parallel mode average·3 dimensions

Claude3-Opus

Relative edge: Programming and software engineering +17.9 / Relative gap: none clear

GPT-4

Relative edge: none clear / Relative gap: Programming and software engineering -17.9

Method: for each model and benchmark, the chart first averages all scores in the current mode scope instead of taking the best score, then averages those benchmark scores within each category. Only benchmarks with at least two selected models scored are included; missing values are not counted as zero.
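The method above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the record layout and the function name `category_averages` are assumptions, the category labels are English renderings of the page's labels, and the scores are the Standard Mode values from the benchmark table on this page. The normalization to a 100-point radar is not specified on the page, so it is left out.

```python
from collections import defaultdict
from statistics import mean

# Assumed record layout: (model, benchmark, category, mode, score).
# Scores are the Standard Mode values listed in the benchmark table.
RECORDS = [
    ("Claude3-Opus", "MMLU", "General evaluation", "Standard Mode", 86.80),
    ("GPT-4", "MMLU", "General evaluation", "Standard Mode", 86.40),
    ("Claude3-Opus", "HumanEval", "Programming and software engineering", "Standard Mode", 84.90),
    ("GPT-4", "HumanEval", "Programming and software engineering", "Standard Mode", 67.00),
    ("Claude3-Opus", "DROP", "Reading comprehension", "Standard Mode", 83.10),
    ("GPT-4", "DROP", "Reading comprehension", "Standard Mode", 80.90),
]


def category_averages(records, min_models=2):
    """Return {(model, category): average score} following the described method."""
    # Step 1: average all mode scores per (model, benchmark) instead of taking the best.
    per_mode = defaultdict(list)
    category_of = {}
    for model, bench, category, _mode, score in records:
        per_mode[(model, bench)].append(score)
        category_of[bench] = category
    bench_score = {key: mean(values) for key, values in per_mode.items()}

    # Step 2: keep only benchmarks scored by at least `min_models` models.
    models_per_bench = defaultdict(set)
    for model, bench in bench_score:
        models_per_bench[bench].add(model)
    kept = {b for b, models in models_per_bench.items() if len(models) >= min_models}

    # Step 3: average benchmark scores within each category.
    # A missing score is simply absent from the list, so it never counts as zero.
    grouped = defaultdict(list)
    for (model, bench), score in bench_score.items():
        if bench in kept:
            grouped[(model, category_of[bench])].append(score)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    for (model, category), value in sorted(category_averages(RECORDS).items()):
        print(f"{model:13s} {category:38s} {value:.2f}")
```

With one benchmark per category and one mode per model, each category average equals the single benchmark score, which is why the programming axis differs by 84.90 - 67.00 = 17.9 points.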

Best overall

Claude3-Opus · 84.93

Best single

Claude3-Opus · MMLU 86.80

Modality coverage

Claude3-Opus · 0 modalities
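The "Best overall" and "Best single" figures above are consistent with a plain mean and a plain maximum over the three comparable benchmarks. The sketch below reproduces them under that assumption; the function names `best_overall` and `best_single` are hypothetical, and the scores are copied from the benchmark table on this page.

```python
from statistics import mean

# Scores copied from the benchmark score table on this page (Standard Mode).
SCORES = {
    "Claude3-Opus": {"MMLU": 86.80, "HumanEval": 84.90, "DROP": 83.10},
    "GPT-4": {"MMLU": 86.40, "HumanEval": 67.00, "DROP": 80.90},
}


def best_overall(scores):
    """Return (model, mean score rounded to 2 decimals) for the highest mean."""
    averages = {model: mean(per_bench.values()) for model, per_bench in scores.items()}
    winner = max(averages, key=averages.get)
    return winner, round(averages[winner], 2)


def best_single(scores):
    """Return (model, benchmark, score) for the highest individual score."""
    return max(
        ((model, bench, score)
         for model, per_bench in scores.items()
         for bench, score in per_bench.items()),
        key=lambda item: item[2],
    )


if __name__ == "__main__":
    print(best_overall(SCORES))
    print(best_single(SCORES))
```

Under this reading, Claude3-Opus averages (86.80 + 84.90 + 83.10) / 3, which rounds to 84.93, and its MMLU score of 86.80 is the highest single value.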

Head to head

Claude3-Opus

GPT-4

Ahead / Tied / Behind

Benchmarks

Wins

Losses

Average diff: +6.83
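The average difference can be reproduced from the per-benchmark scores in the table on this page. The sketch below assumes the figure is the mean of (Claude3-Opus score minus GPT-4 score) over the comparable benchmarks, which matches the displayed +6.83; the function name `head_to_head` and the returned field names are hypothetical.

```python
from statistics import mean

# (Claude3-Opus, GPT-4) scores from the benchmark score table on this page.
PAIRED_SCORES = {
    "MMLU": (86.80, 86.40),
    "HumanEval": (84.90, 67.00),
    "DROP": (83.10, 80.90),
}


def head_to_head(paired_scores):
    """Summarize wins, ties, losses and the mean score difference."""
    diffs = [first - second for first, second in paired_scores.values()]
    return {
        "benchmarks": len(diffs),
        "wins": sum(d > 0 for d in diffs),    # first model ahead
        "ties": sum(d == 0 for d in diffs),   # identical scores
        "losses": sum(d < 0 for d in diffs),  # first model behind
        "average_diff": round(mean(diffs), 2),
    }


if __name__ == "__main__":
    print(head_to_head(PAIRED_SCORES))
```

The three differences are roughly +0.4, +17.9, and +2.2 points, and their mean rounds to +6.83.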

Performance benchmarks

Compare benchmark results across thinking modes and tool usage.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Filter: Best Available · 2 modes · 3 benchmarks


Benchmark score table

Complete scores for each model/mode across selected benchmarks.

3 benchmarks with comparable scores. Each model shows its best score; mode label is displayed below.

Benchmark	Category	Claude3-Opus	GPT-4
MMLU	General evaluation	86.80 (Standard Mode)	86.40 (Standard Mode)
HumanEval	Programming and software engineering	84.90 (Standard Mode)	67.00 (Standard Mode)
DROP	Reading comprehension	83.10 (Standard Mode)	80.90 (Standard Mode)

API price comparison

Side-by-side input/output token pricing

Detailed feature breakdown

Licensing, MoE architecture, and multi-modality support.

Features & specs	Claude3-Opus (Anthropic)	GPT-4 (OpenAI)
Core specs
Release	2024-03-04	2023-03-14
Context length	200K	128K
Parameters	Not provided	1750
MoE	No	No
License
Code Open Source	Not provided	Not provided
Weights Open Source	Not provided	Not provided
Commercial use	Not open source	Not open source
Resources
Paper / report	Introducing the next generation of Claude	GPT-4 Technical Report
DataLearner blog	Anthropic releases its third-generation large language model Claude 3, which outperforms GPT-4 in evaluations, has multimodal capabilities, and performs well in hands-on testing!	Not provided