DeepSeek-V3 vs GPT-4o (2024-11-20)
DeepSeek-V3 and GPT-4o (2024-11-20) are evenly matched across 6 shared benchmarks: DeepSeek-V3 leads on 3 and GPT-4o (2024-11-20) leads on 3, with no exact ties; on average, DeepSeek-V3 scores 1.07 points higher.
DeepSeek-V3
DeepSeek-AI · 2024-12-26 · AI model
GPT-4o (2024-11-20)
OpenAI · 2024-11-20 · AI model
DeepSeek-V3: 3 wins (50%) · GPT-4o (2024-11-20): 3 wins (50%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.
General Knowledge
Even 2/2

| Benchmark | DeepSeek-V3 | GPT-4o (2024-11-20) | Diff |
|---|---|---|---|
| MMLU | 88.50 (#17 of 65) | 85.70 (#37 of 65) | +2.80 |
| MMLU Pro | 75.90 (#78 of 124) | 77.90 (#70 of 124) | -2.00 |
Math and Reasoning
DeepSeek-V3 2/2

| Benchmark | DeepSeek-V3 | GPT-4o (2024-11-20) | Diff |
|---|---|---|---|
| MATH | 87.80 | 68.50 | +19.30 |
Specs
| Field | DeepSeek-V3 | GPT-4o (2024-11-20) |
|---|---|---|
| Publisher | DeepSeek-AI | OpenAI |
| Release date | 2024-12-26 | 2024-11-20 |
| Model type | AI model | AI model |
| Architecture | MoE (Mixture-of-Experts) | Dense |
| Parameters | 671B total (37B activated) | Not available |
| Context length | 128K | 128K |
| Max output | Not available | Not available |
Summary
- DeepSeek-V3 leads in: Math and Reasoning (2/2)
- GPT-4o (2024-11-20) leads in: Coding and Software Engineering (1/1), Common Sense (1/1)
- Tied in: General Knowledge
On average across the 6 shared benchmarks, DeepSeek-V3 scores 1.07 points higher.
Largest single-benchmark gap: MATH, where DeepSeek-V3 scores 87.80 vs GPT-4o (2024-11-20) at 68.50 (+19.30).
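The summary figures above (win counts, average gap, largest gap) follow mechanically from the benchmark records. A minimal Python sketch of that aggregation, using only the three fully scored rows visible on this page (the page's actual pipeline and record format are not public, so the data structure here is illustrative; the average differs from +1.07 because only 3 of the 6 shared benchmarks are shown):

```python
# Per-benchmark scores as (DeepSeek-V3, GPT-4o 2024-11-20), taken from the
# tables above. The real page aggregates all 6 shared benchmarks.
scores = {
    "MMLU": (88.50, 85.70),
    "MMLU Pro": (75.90, 77.90),
    "MATH": (87.80, 68.50),
}

# Signed per-benchmark difference (positive = DeepSeek-V3 ahead).
diffs = {name: a - b for name, (a, b) in scores.items()}

wins_a = sum(d > 0 for d in diffs.values())   # benchmarks DeepSeek-V3 leads
wins_b = sum(d < 0 for d in diffs.values())   # benchmarks GPT-4o leads
avg_diff = sum(diffs.values()) / len(diffs)   # mean gap over visible rows
largest = max(diffs, key=lambda k: abs(diffs[k]))

print(wins_a, wins_b)                          # 2 1
print(largest, round(diffs[largest], 2))       # MATH 19.3
print(round(avg_diff, 2))                      # 6.7 (not +1.07: 3 of 6 rows)
```

With the three missing benchmark rows included, the same computation would yield the page's 3-3 win split and +1.07 average.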
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.