Claude3-Opus vs GPT-4

Across the 3 shared benchmarks, Claude3-Opus leads overall, winning all 3 (GPT-4 wins 0; 0 ties), with an average score difference of +6.83 points in its favor.

Claude3-Opus

Anthropic · 2024-03-04 · Multimodal model

GPT-4

OpenAI · 2023-03-14 · Foundation model

Claude3-Opus: 3 wins (100%) · GPT-4: 0 wins (0%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 3 shared benchmarks.
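As a rough sketch of the grouping and ordering described above, the Python below groups the three shared benchmarks by capability and orders each group by the size of the score gap. The row layout and the helper names `group_and_sort` and `wins_per_group` are hypothetical, and sorting by absolute gap (rather than signed gap) is an assumption about how "largest gap" is measured.

```python
from collections import defaultdict

# Shared benchmark rows as listed on this page:
# (capability group, benchmark, Claude3-Opus score, GPT-4 score).
# This tuple layout is a hypothetical illustration, not the site's actual schema.
ROWS = [
    ("Coding and Software Engineering", "HumanEval", 84.90, 67.00),
    ("General Knowledge", "MMLU", 86.80, 86.40),
    ("Other Benchmarks", "DROP", 83.10, 80.90),
]


def group_and_sort(rows):
    """Group rows by capability; within each group, sort by absolute gap, largest first.

    Sorting by absolute gap is an assumption about how "largest gap" is defined.
    """
    groups = defaultdict(list)
    for group, bench, a, b in rows:
        diff = round(a - b, 2)  # positive means Claude3-Opus scored higher
        groups[group].append((bench, a, b, diff))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[3]), reverse=True)
    return dict(groups)


def wins_per_group(grouped):
    """Count Claude3-Opus wins out of all benchmarks in each group, e.g. '1/1'."""
    return {
        group: f"{sum(1 for *_, d in items if d > 0)}/{len(items)}"
        for group, items in grouped.items()
    }


if __name__ == "__main__":
    grouped = group_and_sort(ROWS)
    for group, items in grouped.items():
        print(group)
        for bench, a, b, diff in items:
            print(f"  {bench}: {a:.2f} vs {b:.2f} ({diff:+.2f})")
    print(wins_per_group(grouped))
```

With only one benchmark per group on this page the sort is trivial; it only changes the order once a group holds several benchmarks.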

Coding and Software Engineering

Claude3-Opus wins 1 of 1
Benchmark | Claude3-Opus score (rank) | GPT-4 score (rank) | Setting | Diff
HumanEval | 84.90 (21/39) | 67 (27/39) | Normal (No Tools) | +17.90

General Knowledge

Claude3-Opus wins 1 of 1
Benchmark | Claude3-Opus score (rank) | GPT-4 score (rank) | Setting | Diff
MMLU | 86.80 (27/65) | 86.40 (31/65) | Normal (No Tools) | +0.40

Other Benchmarks

Claude3-Opus wins 1 of 1
Benchmark | Claude3-Opus score (rank) | GPT-4 score (rank) | Setting | Diff
DROP | 83.10 (6/9) | 80.90 (7/9) | Normal (No Tools) | +2.20

Specs

Field | Claude3-Opus | GPT-4
Publisher | Anthropic | OpenAI
Release date | 2024-03-04 | 2023-03-14
Model type | Multimodal model | Foundation model
Architecture | Not disclosed | Not disclosed
Parameters | Not disclosed | Not disclosed
Context length (tokens) | 200K | 8K (32K variant)
Max output | Not available | Not available

Summary

  • Claude3-Opus leads in: Coding and Software Engineering (1/1), General Knowledge (1/1), Other Benchmarks (1/1)

On average across the 3 shared benchmarks, Claude3-Opus scores 6.83 higher.
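The 6.83 figure matches an unweighted mean of the three per-benchmark differences (Claude3-Opus minus GPT-4) from the tables above, where s_i denotes a model's score on benchmark i:

```latex
\bar{\Delta} = \frac{1}{3}\sum_{i=1}^{3}\left(s_i^{\text{Claude3-Opus}} - s_i^{\text{GPT-4}}\right)
= \frac{17.90 + 0.40 + 2.20}{3} = \frac{20.50}{3} \approx 6.83
```

Because the mean is unweighted, the single HumanEval gap (+17.90) accounts for most of the average; the MMLU and DROP gaps are much smaller.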

Largest single-benchmark gap: HumanEval, where Claude3-Opus scores 84.90 vs GPT-4's 67 (+17.90).

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.