GPT-4o (2024-11-20) vs Claude 3 Opus
Across 4 shared benchmarks, GPT-4o (2024-11-20) leads overall, winning 3 to Claude 3 Opus's 1, with no ties and an average score difference of +5.51 in its favor.
GPT-4o (2024-11-20)
OpenAI · 2024-11-20 · AI model
Claude 3 Opus
Anthropic · 2024-03-04 · Multimodal model
GPT-4o (2024-11-20): 3 wins (75%) · Claude 3 Opus: 1 win (25%)
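For readers who want the tally above in concrete terms, here is a minimal Python sketch of the head-to-head logic. The record layout is an assumption made for illustration, and only the two score pairs printed on this page are included, so its output is a 1-1 subset of the full 4-benchmark tally.

```python
# Minimal sketch of the head-to-head tally; the record layout is an
# assumption, and only the two score pairs shown on this page are
# included (the real tally runs over all 4 shared benchmarks).
shared = {
    "MMLU Pro": (77.90, 68.45),  # (GPT-4o score, Claude 3 Opus score)
    "MMLU": (85.70, 86.80),
}

wins = sum(a > b for a, b in shared.values())
losses = sum(a < b for a, b in shared.values())
ties = len(shared) - wins - losses
avg_diff = sum(a - b for a, b in shared.values()) / len(shared)
print(f"wins={wins} losses={losses} ties={ties} avg_diff={avg_diff:+.2f}")
# -> wins=1 losses=1 ties=0 avg_diff=+4.17
```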
Benchmark scores
Grouped by capability, sorted by largest gap within each. 4 shared benchmarks. Each score is shown with that model's leaderboard rank in parentheses.
General Knowledge
Even (1-1)

| Benchmark | GPT-4o (2024-11-20) | Claude 3 Opus | Diff |
|---|---|---|---|
| MMLU Pro | 77.90 (#70/124) | 68.45 (#93/124) | +9.45 |
| MMLU | 85.70 (#37/65) | 86.80 (#27/65) | -1.10 |
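The Diff column is simply the GPT-4o (2024-11-20) score minus the Claude 3 Opus score, signed and shown to two decimals. A small sketch of that convention (the helper name is made up for illustration):

```python
# Diff cell = GPT-4o score minus Claude 3 Opus score, with explicit sign.
def diff_cell(score_a: float, score_b: float) -> str:
    return f"{score_a - score_b:+.2f}"

assert diff_cell(77.90, 68.45) == "+9.45"  # MMLU Pro row
assert diff_cell(85.70, 86.80) == "-1.10"  # MMLU row
```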
Coding and Software Engineering
GPT-4o (2024-11-20) 1/1

| Benchmark | GPT-4o (2024-11-20) | Claude 3 Opus | Diff |
|---|---|---|---|
| HumanEval | Not available | Not available | Not available |
Specs
| Field | GPT-4o (2024-11-20) | Claude 3 Opus |
|---|---|---|
| Publisher | OpenAI | Anthropic |
| Release date | 2024-11-20 | 2024-03-04 |
| Model type | AI model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 128K | 200K |
| Max output | Not available | Not available |
Summary
- GPT-4o (2024-11-20) leads in: Coding and Software Engineering (1/1), Math and Reasoning (1/1)
- Tied in: General Knowledge
On average across the 4 shared benchmarks, GPT-4o (2024-11-20) scores 5.51 points higher.
Largest single-benchmark gap: MMLU Pro — GPT-4o (2024-11-20) 77.90 vs Claude 3 Opus 68.45 (+9.45).
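The largest-gap callout picks the shared benchmark with the maximum absolute score difference. A sketch under the same assumed record shape as above:

```python
# Select the shared benchmark with the largest absolute score gap.
records = [("MMLU Pro", 77.90, 68.45), ("MMLU", 85.70, 86.80)]
name, a, b = max(records, key=lambda r: abs(r[1] - r[2]))
print(f"Largest gap: {name}, {a:.2f} vs {b:.2f} ({a - b:+.2f})")
# -> Largest gap: MMLU Pro, 77.90 vs 68.45 (+9.45)
```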
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.