GPT-4o(2024-11-20) vs GPT-4o
GPT-4o(2024-11-20) and GPT-4o are tied on wins across 7 shared benchmarks: GPT-4o(2024-11-20) leads on 2, GPT-4o leads on 2, and 3 are ties. The average score difference (GPT-4o(2024-11-20) minus GPT-4o) is -1.37.
GPT-4o(2024-11-20)
OpenAI · 2024-11-20 · Chat model
GPT-4o
OpenAI · 2024-05-13 · Multimodal model
GPT-4o(2024-11-20): 2 wins (29%) · Ties: 3 · GPT-4o: 2 wins (29%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 7 shared benchmarks.
Coding and Software Engineer
GPT-4o(2024-11-20) leads 1/2
| Benchmark | GPT-4o(2024-11-20) | GPT-4o | Diff |
|---|---|---|---|
| HumanEval | 90.20 (rank 7/39) | 90 (rank 8/39) | +0.20 |
| SWE-bench Verified | 31 (rank 103/108; Normal (No Tools)) | 31 (rank 103/108) | 0 (tie) |
General Knowledge
GPT-4o leads 1/2
| Benchmark | GPT-4o(2024-11-20) | GPT-4o | Diff |
|---|---|---|---|
| MMLU | 85.70 (rank 37/65) | 88.70 (rank 15/65) | -3.00 |
| MMLU Pro | 77.90 (rank 72/126) | 77.90 (rank 72/126) | 0 (tie) |
Math and Reasoning
GPT-4o leads 1/2
| Benchmark | GPT-4o(2024-11-20) | GPT-4o | Diff |
|---|---|---|---|
| MATH | 68.50 (rank 24/42) | 75.90 (rank 16/42) | -7.40 |
| FrontierMath | 0.30 (rank 57/60) | 0.30 (rank 57/60) | 0 (tie) |
Common Sense
GPT-4o(2024-11-20) leads 1/1
| Benchmark | GPT-4o(2024-11-20) | GPT-4o | Diff |
|---|---|---|---|
| SimpleQA | 38.80 (rank 19/45) | 38.20 (rank 20/45) | +0.60 |
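The grouping by capability and the largest-gap-first ordering used in the tables above can be reproduced with a short script. This is a minimal sketch, not the site's actual pipeline: the `ROWS` tuple layout and the `group_and_sort` helper are hypothetical names, and the scores are transcribed by hand from the tables.

```python
from collections import defaultdict

# (capability, benchmark, score_new, score_old), transcribed by hand from the tables.
# score_new = GPT-4o(2024-11-20), score_old = GPT-4o. The tuple layout is an assumption.
ROWS = [
    ("Coding and Software Engineer", "SWE-bench Verified", 31.0, 31.0),
    ("Coding and Software Engineer", "HumanEval", 90.2, 90.0),
    ("General Knowledge", "MMLU Pro", 77.9, 77.9),
    ("General Knowledge", "MMLU", 85.7, 88.7),
    ("Math and Reasoning", "FrontierMath", 0.3, 0.3),
    ("Math and Reasoning", "MATH", 68.5, 75.9),
    ("Common Sense", "SimpleQA", 38.8, 38.2),
]


def group_and_sort(rows):
    """Group rows by capability; order each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, benchmark, new, old in rows:
        # Diff follows the page's sign convention: newer snapshot minus GPT-4o.
        groups[capability].append((benchmark, round(new - old, 2)))
    return {
        capability: sorted(items, key=lambda item: abs(item[1]), reverse=True)
        for capability, items in groups.items()
    }


if __name__ == "__main__":
    for capability, items in group_and_sort(ROWS).items():
        print(capability)
        for benchmark, diff in items:
            print(f"  {benchmark}: {diff:+.2f}")
```

Sorting on the absolute difference puts the most decisive benchmark first in each group regardless of which model leads.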
Specs
| Field | GPT-4o(2024-11-20) | GPT-4o |
|---|---|---|
| Publisher | OpenAI | OpenAI |
| Release date | 2024-11-20 | 2024-05-13 |
| Model type | Chat model | Multimodal model |
| Architecture | Dense | Dense |
| Parameters | Not available | Not available |
| Context length | 128K | 128K |
| Max output | Not available | 16K |
Summary
- GPT-4o(2024-11-20) leads in: Coding and Software Engineer (1/2), Common Sense (1/1)
- GPT-4o leads in: General Knowledge (1/2), Math and Reasoning (1/2)
On average across the 7 shared benchmarks, GPT-4o scores 1.37 points higher than GPT-4o(2024-11-20).
Largest single-benchmark gap: MATH, where GPT-4o(2024-11-20) scores 68.50 vs 75.90 for GPT-4o (-7.40).
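The summary figures can be checked by hand from the seven score pairs. The sketch below recomputes the win and tie counts, the average difference, and the largest gap. The `SCORES` mapping and the `summarize` function are illustrative names, not part of the page's generator, and the scores are transcribed from the benchmark tables.

```python
from statistics import mean

# benchmark -> (GPT-4o(2024-11-20) score, GPT-4o score), transcribed from the page.
SCORES = {
    "HumanEval": (90.20, 90.00),
    "SWE-bench Verified": (31.00, 31.00),
    "MMLU": (85.70, 88.70),
    "MMLU Pro": (77.90, 77.90),
    "MATH": (68.50, 75.90),
    "FrontierMath": (0.30, 0.30),
    "SimpleQA": (38.80, 38.20),
}


def summarize(scores):
    """Return win/tie counts, mean difference, and the largest single gap."""
    # Round each difference to two decimals to avoid floating-point noise.
    diffs = {name: round(new - old, 2) for name, (new, old) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_new": sum(d > 0 for d in diffs.values()),
        "wins_old": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(mean(diffs.values()), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

The differences sum to -9.60, and dividing by 7 gives the -1.37 average reported above. The sign is negative because each difference is taken as GPT-4o(2024-11-20) minus GPT-4o.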
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.