Claude3-Opus vs GPT-4

Across 3 shared benchmarks, Claude3-Opus leads overall: Claude3-Opus wins 3, GPT-4 wins 0, with 0 ties and an average score difference of +6.83.

Claude3-Opus: Anthropic · 2024-03-04 · Multimodal model

GPT-4: OpenAI · 2023-03-14 · Foundation model

Claude3-Opus: 3 wins (100%) · GPT-4: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 3 shared benchmarks.

Coding and Software Engineer: Claude3-Opus 1/1

Benchmark	Claude3-Opus	GPT-4	Diff
HumanEval	84.90 (21 / 39)	67 (27 / 39), Normal (No Tools)	+17.90

General Knowledge: Claude3-Opus 1/1

Benchmark	Claude3-Opus	GPT-4	Diff
MMLU	86.80 (27 / 65)	86.40 (31 / 65), Normal (No Tools)	+0.40

Claude3-Opus leads in: Coding and Software Engineer (1/1), General Knowledge (1/1), Other Benchmarks (1/1)

On average across the 3 shared benchmarks, Claude3-Opus scores 6.83 higher.

Largest single-benchmark gap: HumanEval, with Claude3-Opus at 84.90 vs GPT-4 at 67 (+17.90).
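
The summary figures (wins, ties, average difference, largest gap) are simple aggregates over per-benchmark score pairs. The sketch below shows one way they could be computed. It is a minimal illustration, not the page's actual generator: the helper name `summarize` and the tuple layout are assumptions. It also uses only the two benchmark rows that the page lists, so its average will not match the 3-benchmark figure of +6.83.

```python
from statistics import mean

# Scores copied from the two rows listed on the page.
# The third shared benchmark is not listed, so it is left out here.
ROWS = [
    ("HumanEval", 84.90, 67.00),
    ("MMLU", 86.80, 86.40),
]


def summarize(rows):
    """Aggregate (name, score_a, score_b) rows into head-to-head stats.

    Hypothetical helper for illustration only.
    """
    # Per-benchmark difference (model A minus model B), rounded to 2 decimals.
    diffs = [(name, round(a - b, 2)) for name, a, b in rows]
    return {
        "wins_a": sum(1 for _, d in diffs if d > 0),
        "wins_b": sum(1 for _, d in diffs if d < 0),
        "ties": sum(1 for _, d in diffs if d == 0),
        "avg_diff": round(mean(d for _, d in diffs), 2),
        # Row with the largest absolute difference.
        "largest_gap": max(diffs, key=lambda item: abs(item[1])),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

Over these two rows the sketch reports two wins for Claude3-Opus, no ties, and HumanEval as the largest gap. Adding the third benchmark's scores to `ROWS` would be needed to reproduce the page's overall average.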

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.