Claude3-Opus vs GPT-4
Across the 3 shared benchmarks, Claude3-Opus leads overall: it wins 3, GPT-4 wins 0, with 0 ties and an average score difference of +6.83 points in Claude3-Opus's favor.
Claude3-Opus
Anthropic · 2024-03-04 · Multimodal model
GPT-4
OpenAI · 2023-03-14 · Foundation model
Claude3-Opus: 3 wins (100%) · GPT-4: 0 wins (0%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 3 shared benchmarks.
Coding and Software Engineering
Claude3-Opus leads 1/1.

| Benchmark | Claude3-Opus | GPT-4 | Diff |
|---|---|---|---|
| HumanEval | 84.90 (rank 21/39) | 67.00 (rank 27/39, Normal (No Tools)) | +17.90 |
General Knowledge
Claude3-Opus leads 1/1.

| Benchmark | Claude3-Opus | GPT-4 | Diff |
|---|---|---|---|
| MMLU | 86.80 (rank 27/65) | 86.40 (rank 31/65, Normal (No Tools)) | +0.40 |
Specs
| Field | Claude3-Opus | GPT-4 |
|---|---|---|
| Publisher | Anthropic | OpenAI |
| Release date | 2024-03-04 | 2023-03-14 |
| Model type | Multimodal model | Foundation model |
| Architecture | Dense | Dense |
| Parameters | Not disclosed | 1750B (unconfirmed) |
| Context length | 200K | 128K |
| Max output | Not available | Not available |
Summary
- Claude3-Opus leads in: Coding and Software Engineering (1/1), General Knowledge (1/1), Other Benchmarks (1/1)
- On average across the 3 shared benchmarks, Claude3-Opus scores 6.83 points higher.
- Largest single-benchmark gap: HumanEval, where Claude3-Opus scores 84.90 vs GPT-4's 67.00 (+17.90).
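For readers who want to reproduce the summary figures, the sketch below shows how the win count, average score difference, and largest gap follow from the per-benchmark diffs. This is a minimal illustration, not the page's actual generator; the third shared benchmark sits in the Other Benchmarks category and is not listed in the tables above, so its diff (+2.19) is inferred from the reported +6.83 average rather than taken from the source records.

```python
# Sketch: recomputing the summary statistics from per-benchmark score
# differences (Claude3-Opus minus GPT-4). Not the page's actual generator.
diffs = {
    "HumanEval": 84.90 - 67.00,  # +17.90, from the Coding table above
    "MMLU": 86.80 - 86.40,       # +0.40, from the General Knowledge table
    # The third shared benchmark is not shown above; +2.19 is inferred
    # from the reported +6.83 average (3 * 6.83 - 17.90 - 0.40).
    "Other (unlisted)": 2.19,
}

wins = sum(d > 0 for d in diffs.values())          # 3 wins for Claude3-Opus
avg_diff = sum(diffs.values()) / len(diffs)        # ~6.83
largest = max(diffs, key=lambda k: abs(diffs[k]))  # "HumanEval"

print(f"wins={wins}, avg_diff={avg_diff:+.2f}, largest_gap={largest}")
```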
Page generated from structured model, pricing, and benchmark records; no real-time LLM is used to write the prose.