DeepSeek-V3 vs GPT-4o(2024-11-20)
DeepSeek-V3 and GPT-4o(2024-11-20) split the 6 shared benchmarks evenly: DeepSeek-V3 leads on 3 and GPT-4o(2024-11-20) leads on 3, with no tied scores. Averaged over the 6 benchmarks, DeepSeek-V3 scores 1.07 points higher.
DeepSeek-V3
DeepSeek-AI · 2024-12-26 · Chat model
GPT-4o(2024-11-20)
OpenAI · 2024-11-20 · Chat model
DeepSeek-V3: 3 wins (50%) · GPT-4o(2024-11-20): 3 wins (50%)
Benchmark scores
Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.
General Knowledge
Even 2/2

| Benchmark | DeepSeek-V3 | GPT-4o(2024-11-20) | Diff |
|---|---|---|---|
| MMLU | 88.50 (17 / 65) | 85.70 (37 / 65) | +2.80 |
| MMLU Pro | 75.90 (80 / 126) | 77.90 (72 / 126) | -2.00 |
Math and Reasoning
DeepSeek-V3 2/2

| Benchmark | DeepSeek-V3 | GPT-4o(2024-11-20) | Diff |
|---|---|---|---|
| MATH | 87.80 (7 / 42) | 68.50 (24 / 42) | +19.30 |
| FrontierMath | 1.70 (49 / 60) | 0.30 (57 / 60) | +1.40 |
Coding and Software Engineer
GPT-4o(2024-11-20) 1/1

| Benchmark | DeepSeek-V3 | GPT-4o(2024-11-20) | Diff |
|---|---|---|---|
| HumanEval | 89.00 (9 / 39) | 90.20 (7 / 39) | -1.20 |
Common Sense
GPT-4o(2024-11-20) 1/1

| Benchmark | DeepSeek-V3 | GPT-4o(2024-11-20) | Diff |
|---|---|---|---|
| SimpleQA | 24.90 (29 / 45) | 38.80 (19 / 45) | -13.90 |
Specs
| Field | DeepSeek-V3 | GPT-4o(2024-11-20) |
|---|---|---|
| Publisher | DeepSeek-AI | OpenAI |
| Release date | 2024-12-26 | 2024-11-20 |
| Model type | Chat model | Chat model |
| Architecture | Mixture-of-Experts (MoE) | Not available |
| Parameters | 671B total (37B activated per token) | Not available |
| Context length | 128K | 128K |
| Max output | Not available | Not available |
Summary
- DeepSeek-V3 leads in: Math and Reasoning (2/2)
- GPT-4o(2024-11-20) leads in: Coding and Software Engineer (1/1), Common Sense (1/1)
- Tied in: General Knowledge
On average across the 6 shared benchmarks, DeepSeek-V3 scores 1.07 higher.
Largest single-benchmark gap: MATH — DeepSeek-V3 87.80 vs GPT-4o(2024-11-20) 68.50 (+19.30).
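The summary figures can be rechecked from the six score pairs in the tables. The following is a minimal sketch, not the code used to generate this page: the `scores` dictionary and the `summarize` helper are illustrative names, and each difference is taken as the DeepSeek-V3 score minus the GPT-4o(2024-11-20) score.

```python
from statistics import mean

# Scores copied from the benchmark tables: (DeepSeek-V3, GPT-4o(2024-11-20)).
scores = {
    "MMLU": (88.50, 85.70),
    "MMLU Pro": (75.90, 77.90),
    "MATH": (87.80, 68.50),
    "FrontierMath": (1.70, 0.30),
    "HumanEval": (89.00, 90.20),
    "SimpleQA": (24.90, 38.80),
}


def summarize(scores):
    """Return win counts, the mean difference, and the largest-gap benchmark."""
    # Difference is DeepSeek-V3 minus GPT-4o, rounded to hide float noise.
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    return {
        "deepseek_wins": sum(d > 0 for d in diffs.values()),
        "gpt4o_wins": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(mean(diffs.values()), 2),
        "largest_gap": max(diffs, key=lambda name: abs(diffs[name])),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the sketch gives 3 wins each, no ties, a mean difference of 1.07 in favor of DeepSeek-V3, and MATH as the largest gap, which matches the figures above.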
Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.