DeepSeek-V3 vs GPT-4o(2024-11-20)

DeepSeek-V3 and GPT-4o(2024-11-20) split their 6 shared benchmarks evenly: DeepSeek-V3 leads on 3 and GPT-4o(2024-11-20) leads on 3, with no individual benchmark tied. The average score difference is +1.07 in DeepSeek-V3's favor.

DeepSeek-V3
DeepSeek-AI · 2024-12-26 · Chat model

GPT-4o(2024-11-20)
OpenAI · 2024-11-20 · Chat model

DeepSeek-V3: 3 wins (50%) · GPT-4o(2024-11-20): 3 wins (50%)

Benchmark scores

Grouped by capability, sorted by largest gap within each. 6 shared benchmarks.

General Knowledge

Even (each model leads 1 of 2)
Benchmark   DeepSeek-V3 (rank)   GPT-4o(2024-11-20) (rank)   Diff
MMLU        88.50 (17 / 65)      85.70 (37 / 65)             +2.80
MMLU Pro    75.90 (80 / 126)     77.90 (72 / 126)            -2.00

Math and Reasoning

DeepSeek-V3 2/2
Benchmark      DeepSeek-V3 (rank)   GPT-4o(2024-11-20) (rank)   Diff
MATH           87.80 (7 / 42)       68.50 (24 / 42)             +19.30
FrontierMath   1.70 (49 / 60)       0.30 (57 / 60)              +1.40

Coding and Software Engineering

GPT-4o(2024-11-20) 1/1
Benchmark   DeepSeek-V3 (rank)   GPT-4o(2024-11-20) (rank)   Diff
HumanEval   89.00 (9 / 39)       90.20 (7 / 39)              -1.20

Common Sense

GPT-4o(2024-11-20) 1/1
Benchmark   DeepSeek-V3 (rank)   GPT-4o(2024-11-20) (rank)   Diff
SimpleQA    24.90 (29 / 45)      38.80 (19 / 45)             -13.90
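The grouping and ordering described above (by capability, then by largest gap within each group) can be sketched in a few lines of Python. This is only an illustration of how such a view might be produced, not the page's actual generator: the names `ROWS`, `group_and_sort`, and `leader` are hypothetical, and the scores are copied by hand from the tables.

```python
from collections import defaultdict

# Hypothetical records: (capability, benchmark, DeepSeek-V3, GPT-4o(2024-11-20)).
# Scores are copied from the tables above; the row order is deliberately unsorted.
ROWS = [
    ("General Knowledge", "MMLU Pro", 75.90, 77.90),
    ("General Knowledge", "MMLU", 88.50, 85.70),
    ("Math and Reasoning", "FrontierMath", 1.70, 0.30),
    ("Math and Reasoning", "MATH", 87.80, 68.50),
    ("Coding and Software Engineering", "HumanEval", 89.00, 90.20),
    ("Common Sense", "SimpleQA", 24.90, 38.80),
]


def group_and_sort(rows):
    """Group rows by capability, then sort each group by absolute gap, largest first."""
    groups = defaultdict(list)
    for capability, name, deepseek, gpt4o in rows:
        # Diff is DeepSeek-V3 minus GPT-4o, rounded to avoid float noise.
        groups[capability].append((name, round(deepseek - gpt4o, 2)))
    for items in groups.values():
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(groups)


def leader(items):
    """Return which model leads more benchmarks in a group, or 'Even'."""
    wins_deepseek = sum(1 for _, diff in items if diff > 0)
    wins_gpt4o = sum(1 for _, diff in items if diff < 0)
    if wins_deepseek == wins_gpt4o:
        return "Even"
    return "DeepSeek-V3" if wins_deepseek > wins_gpt4o else "GPT-4o(2024-11-20)"


grouped = group_and_sort(ROWS)
for capability, items in grouped.items():
    print(f"{capability}: {leader(items)} {items}")
```

Sorting on the absolute value of the diff is what places MMLU (+2.80) ahead of MMLU Pro (-2.00) even though the signs differ.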

Specs

Field           DeepSeek-V3                GPT-4o(2024-11-20)
Publisher       DeepSeek-AI                OpenAI
Release date    2024-12-26                 2024-11-20
Model type      Chat model                 Chat model
Architecture    Mixture-of-Experts (MoE)   Dense
Parameters      671B                       Not available
Context length  128K                       128K
Max output      Not available              Not available

Summary

  • DeepSeek-V3 leads in: Math and Reasoning (2/2)
  • GPT-4o(2024-11-20) leads in: Coding and Software Engineering (1/1), Common Sense (1/1)
  • Tied in: General Knowledge

On average across the 6 shared benchmarks, DeepSeek-V3 scores 1.07 points higher.

Largest single-benchmark gap: MATH — DeepSeek-V3 87.80 vs GPT-4o(2024-11-20) 68.50 (+19.30).
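The headline figures (3 wins each, 0 ties, an average difference of +1.07, and MATH as the largest gap) follow directly from the six per-benchmark diffs. A minimal Python sketch of that arithmetic is shown here, assuming the average is a plain unweighted mean of the diffs; the names `DIFFS` and `summarize` are illustrative and do not come from the page.

```python
# Diff = DeepSeek-V3 score minus GPT-4o(2024-11-20) score, copied from the benchmark tables.
DIFFS = {
    "MMLU": 2.80,
    "MMLU Pro": -2.00,
    "MATH": 19.30,
    "FrontierMath": 1.40,
    "HumanEval": -1.20,
    "SimpleQA": -13.90,
}


def summarize(diffs):
    """Compute win counts, ties, the unweighted mean diff, and the largest absolute gap."""
    wins_deepseek = sum(1 for d in diffs.values() if d > 0)
    wins_gpt4o = sum(1 for d in diffs.values() if d < 0)
    ties = len(diffs) - wins_deepseek - wins_gpt4o
    mean_diff = round(sum(diffs.values()) / len(diffs), 2)
    largest_gap = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "wins_deepseek": wins_deepseek,
        "wins_gpt4o": wins_gpt4o,
        "ties": ties,
        "mean_diff": mean_diff,
        "largest_gap": largest_gap,
    }


summary = summarize(DIFFS)
print(summary)
```

The diffs sum to 6.40, so the mean over 6 benchmarks is about 1.07. The large MATH lead (+19.30) is mostly offset by the SimpleQA deficit (-13.90), which is why the average is small despite wide individual gaps.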

Page generated from structured model, pricing and benchmark records. No real-time LLM is used to write the prose.