GPT-4o(2024-11-20) vs GPT-4o

GPT-4o(2024-11-20) and GPT-4o are tied on wins across 7 shared benchmarks: each leads on 2, with 3 ties. The average score difference (GPT-4o(2024-11-20) minus GPT-4o) is -1.37, in GPT-4o's favor.

GPT-4o(2024-11-20)

OpenAI · 2024-11-20 · Chat model

GPT-4o

OpenAI · 2024-05-13 · Multimodal model

GPT-4o(2024-11-20): 2 wins (29%) · Ties: 3 · GPT-4o: 2 wins (29%)
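The win and tie counts can be re-derived from the scores listed in the benchmark tables on this page. The following is a minimal sketch, not the site's actual method: the names `SCORES`, `tally`, and `share` are hypothetical, and rounding percentages to the nearest whole number is an assumption.

```python
# Scores copied from this page's benchmark tables.
# Each entry: (benchmark, GPT-4o(2024-11-20) score, GPT-4o score).
SCORES = [
    ("HumanEval", 90.20, 90.00),
    ("SWE-bench Verified", 31.00, 31.00),
    ("MMLU", 85.70, 88.70),
    ("MMLU Pro", 77.90, 77.90),
    ("MATH", 68.50, 75.90),
    ("FrontierMath", 0.30, 0.30),
    ("SimpleQA", 38.80, 38.20),
]


def tally(scores):
    """Return (wins for the first model, ties, wins for the second model)."""
    wins_a = sum(1 for _, a, b in scores if a > b)
    wins_b = sum(1 for _, a, b in scores if b > a)
    ties = len(scores) - wins_a - wins_b
    return wins_a, ties, wins_b


def share(count, total):
    """Percentage of `total`, rounded to a whole number (assumed rounding rule)."""
    return round(100 * count / total)


wins_a, ties, wins_b = tally(SCORES)
print(wins_a, ties, wins_b)
print(share(wins_a, len(SCORES)), share(ties, len(SCORES)), share(wins_b, len(SCORES)))
```

Under these assumptions, each model's 2 wins out of 7 correspond to the 29% shown above.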

Benchmark scores

Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.

Coding and Software Engineer

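Leader: GPT-4o(2024-11-20), ahead on 1 of 2 benchmarks

| Benchmark | GPT-4o(2024-11-20) score (rank) | GPT-4o score (rank) | Diff |
| --- | --- | --- | --- |
| HumanEval | 90.20 (7 / 39) | 90 (8 / 39) | +0.20 |
| SWE-bench Verified | 31 (103 / 108), Normal (No Tools) | 31 (103 / 108) | 0 (tie) |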

General Knowledge

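Leader: GPT-4o, ahead on 1 of 2 benchmarks

| Benchmark | GPT-4o(2024-11-20) score (rank) | GPT-4o score (rank) | Diff |
| --- | --- | --- | --- |
| MMLU | 85.70 (37 / 65) | 88.70 (15 / 65) | -3 |
| MMLU Pro | 77.90 (72 / 126) | 77.90 (72 / 126) | 0 (tie) |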

Math and Reasoning

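Leader: GPT-4o, ahead on 1 of 2 benchmarks

| Benchmark | GPT-4o(2024-11-20) score (rank) | GPT-4o score (rank) | Diff |
| --- | --- | --- | --- |
| MATH | 68.50 (24 / 42) | 75.90 (16 / 42) | -7.40 |
| FrontierMath | 0.30 (57 / 60) | 0.30 (57 / 60) | 0 (tie) |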

Common Sense

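Leader: GPT-4o(2024-11-20), ahead on 1 of 1 benchmark

| Benchmark | GPT-4o(2024-11-20) score (rank) | GPT-4o score (rank) | Diff |
| --- | --- | --- | --- |
| SimpleQA | 38.80 (19 / 45) | 38.20 (20 / 45) | +0.60 |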

Specs

| Field | GPT-4o(2024-11-20) | GPT-4o |
| --- | --- | --- |
| Publisher | OpenAI | OpenAI |
| Release date | 2024-11-20 | 2024-05-13 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 128K | 128K |
| Max output | Not available | 16K |

Summary

  • GPT-4o(2024-11-20) leads in: Coding and Software Engineer (1/2), Common Sense (1/1)
  • GPT-4o leads in: General Knowledge (1/2), Math and Reasoning (1/2)

On average across the 7 shared benchmarks, GPT-4o scores 1.37 higher.

Largest single-benchmark gap: MATH — GPT-4o(2024-11-20) 68.50 vs GPT-4o 75.90 (-7.40).
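The average difference and the largest gap follow directly from the per-benchmark differences (GPT-4o(2024-11-20) minus GPT-4o), with ties counted as 0. The sketch below is illustrative only: `DIFFS`, `mean_diff`, and `largest_gap` are hypothetical names, and the assumption is that the page's average is a plain unweighted mean over all 7 benchmarks.

```python
# Per-benchmark differences taken from this page
# (GPT-4o(2024-11-20) minus GPT-4o); ties are entered as 0.0.
DIFFS = {
    "HumanEval": 0.20,
    "SWE-bench Verified": 0.0,
    "MMLU": -3.0,
    "MMLU Pro": 0.0,
    "MATH": -7.40,
    "FrontierMath": 0.0,
    "SimpleQA": 0.60,
}


def mean_diff(diffs):
    """Unweighted mean of the differences (assumed averaging rule)."""
    return sum(diffs.values()) / len(diffs)


def largest_gap(diffs):
    """Benchmark with the largest absolute difference, and that difference."""
    name = max(diffs, key=lambda k: abs(diffs[k]))
    return name, diffs[name]


print(round(mean_diff(DIFFS), 2))
print(largest_gap(DIFFS))
```

A negative mean indicates that GPT-4o scores higher on average, which matches the 1.37-point figure stated above.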

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.