GPT-4o(2024-11-20) vs Claude3-Opus

Across 4 shared benchmarks, GPT-4o(2024-11-20) leads overall with 3 wins to Claude3-Opus's 1, no ties, and an average score difference of +5.51 points in its favor.

GPT-4o(2024-11-20)

OpenAI · 2024-11-20 · Chat model

Claude3-Opus

Anthropic · 2024-03-04 · Multimodal model

GPT-4o(2024-11-20): 3 wins (75%) · Claude3-Opus: 1 win (25%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 4 shared benchmarks.

General Knowledge

Even (1 win each across 2 benchmarks)
| Benchmark | GPT-4o(2024-11-20) | Claude3-Opus | Diff |
|---|---|---|---|
| MMLU Pro | 77.90 (72 / 126) | 68.45 (95 / 126) | +9.45 |
| MMLU | 85.70 (37 / 65) | 86.80 (27 / 65) | -1.10 |

Coding and Software Engineering

GPT-4o(2024-11-20) 1/1
| Benchmark | GPT-4o(2024-11-20) | Claude3-Opus | Diff |
|---|---|---|---|
| HumanEval | 90.20 (7 / 39) | 84.90 (21 / 39) | +5.30 |

Math and Reasoning

GPT-4o(2024-11-20) 1/1
| Benchmark | GPT-4o(2024-11-20) | Claude3-Opus | Diff |
|---|---|---|---|
| MATH | 68.50 (24 / 42) | 60.10 (31 / 42) | +8.40 |
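
The grouping used in this section (benchmarks bucketed by capability, then ordered by the size of the gap) can be reproduced with a short script. This is only a sketch: the row layout and the function names `group_by_capability` and `tally` are assumptions made for illustration, not the site's actual pipeline, and the scores are copied from the benchmark rows in this section.

```python
from collections import defaultdict

# (capability, benchmark, GPT-4o(2024-11-20) score, Claude3-Opus score)
# Values copied from the benchmark rows on this page.
ROWS = [
    ("General Knowledge", "MMLU Pro", 77.90, 68.45),
    ("General Knowledge", "MMLU", 85.70, 86.80),
    ("Coding and Software Engineering", "HumanEval", 90.20, 84.90),
    ("Math and Reasoning", "MATH", 68.50, 60.10),
]


def group_by_capability(rows):
    """Bucket rows by capability and sort each bucket by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, gpt_score, claude_score in rows:
        # Positive diff means GPT-4o(2024-11-20) scored higher.
        groups[capability].append((benchmark, round(gpt_score - claude_score, 2)))
    for entries in groups.values():
        entries.sort(key=lambda entry: abs(entry[1]), reverse=True)
    return dict(groups)


def tally(entries):
    """Return (GPT-4o wins, Claude3-Opus wins) for one capability bucket."""
    gpt_wins = sum(1 for _, diff in entries if diff > 0)
    claude_wins = sum(1 for _, diff in entries if diff < 0)
    return gpt_wins, claude_wins


if __name__ == "__main__":
    for capability, entries in group_by_capability(ROWS).items():
        print(capability, tally(entries), entries)
```

Sorting on the absolute difference rather than the signed one is what places MMLU Pro ahead of MMLU within General Knowledge, even though the MMLU gap favors the other model.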

Specs

| Field | GPT-4o(2024-11-20) | Claude3-Opus |
|---|---|---|
| Publisher | OpenAI | Anthropic |
| Release date | 2024-11-20 | 2024-03-04 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 128K | 200K |
| Max output | Not available | Not available |

Summary

  • GPT-4o(2024-11-20) leads in: Coding and Software Engineering (1/1), Math and Reasoning (1/1)
  • Tied in: General Knowledge (1 win each)

On average across the 4 shared benchmarks, GPT-4o(2024-11-20) scores 5.51 points higher than Claude3-Opus.

Largest single-benchmark gap: MMLU Pro, where GPT-4o(2024-11-20) scores 77.90 vs Claude3-Opus 68.45 (+9.45).
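
The headline figures in this summary (win counts, average difference, largest gap) follow directly from the four per-benchmark scores. The sketch below recomputes them as a sanity check; the `SCORES` layout and the `summarize` helper are hypothetical names chosen for this example, and the average is assumed to be a plain unweighted mean of the per-benchmark differences.

```python
from statistics import mean

# benchmark -> (GPT-4o(2024-11-20) score, Claude3-Opus score), copied from this page
SCORES = {
    "MMLU Pro": (77.90, 68.45),
    "MMLU": (85.70, 86.80),
    "HumanEval": (90.20, 84.90),
    "MATH": (68.50, 60.10),
}


def summarize(scores):
    """Recompute wins, ties, mean difference and the largest single-benchmark gap."""
    # Positive diff means the first model (GPT-4o) scored higher.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "gpt_wins": sum(1 for d in diffs.values() if d > 0),
        "claude_wins": sum(1 for d in diffs.values() if d < 0),
        "ties": sum(1 for d in diffs.values() if d == 0),
        "avg_diff": round(mean(list(diffs.values())), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

With the scores listed here, the differences are +9.45, -1.10, +5.30 and +8.40, which sum to 22.05 and average to 5.51 after rounding to two decimals.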

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.