DeepSeek-V3 vs GPT-4o (2024-11-20)

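Across the 7 benchmarks both models share, DeepSeek-V3 leads overall: DeepSeek-V3 is ahead on 4, GPT-4o (2024-11-20) is ahead on 3, and none are tied. The average score difference is +5.23 in DeepSeek-V3's favor.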

DeepSeek-V3: DeepSeek-AI · 2024-12-26 · chat large language model

GPT-4o (2024-11-20): OpenAI · 2024-11-20 · chat large language model

DeepSeek-V3: 4 benchmarks (57%) · GPT-4o (2024-11-20): 3 benchmarks (43%)

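Benchmark scores

Grouped by capability category and sorted by the size of the score difference within each group; 7 benchmarks in total. Score differences are DeepSeek-V3 minus GPT-4o (2024-11-20).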

Close 2/2

Benchmark	DeepSeek-V3	GPT-4o (2024-11-20)	Difference
MMLU	88.50 (rank 17 / 66)	85.70 (rank 37 / 66)	+2.80
MMLU Pro	75.90 (rank 83 / 132)	77.90 (rank 75 / 132)	-2.00

DeepSeek-V3 leads 2/2

Benchmark	DeepSeek-V3	GPT-4o (2024-11-20)	Difference
MATH	87.80 (rank 7 / 42)	68.50 (rank 24 / 42)	+19.30
FrontierMath	1.70 (rank 49 / 60)	0.30 (rank 57 / 60)	+1.40

DeepSeek-V3 leads 1/1

Benchmark	DeepSeek-V3	GPT-4o (2024-11-20)	Difference
Aider-Polyglot	48.40 (rank 34 / 59; Normal (No Tools))	18.20 (rank 50 / 59; Normal (No Tools))	+30.20

GPT-4o (2024-11-20) leads 1/1

Benchmark	DeepSeek-V3	GPT-4o (2024-11-20)	Difference
HumanEval	89.00 (rank 9 / 39)	90.20 (rank 7 / 39)	-1.20

GPT-4o (2024-11-20) leads 1/1

Benchmark	DeepSeek-V3	GPT-4o (2024-11-20)	Difference
SimpleQA	24.90 (rank 31 / 47)	38.80 (rank 21 / 47)	-13.90

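Pricing is taken first from the API records configured in DataLearner; missing items are not estimated.

Public pricing is incomplete for some models; missing fields are shown as "no public price available".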

Across the 7 shared benchmarks, DeepSeek-V3 scores 5.23 points higher on average.

Benchmark with the largest single gap: Aider-Polyglot, where DeepSeek-V3 scores 48.40 and GPT-4o (2024-11-20) scores 18.20 (difference +30.20).
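The summary figures above can be re-derived from the per-benchmark scores. The following is a minimal sketch, assuming the average is an unweighted mean of the seven differences (DeepSeek-V3 minus GPT-4o) rounded to two decimals; the page does not state how it computes the figure, but this assumption reproduces it. The names `SCORES` and `summarize` are hypothetical and chosen only for illustration.

```python
# Scores copied from the tables on this page: (DeepSeek-V3, GPT-4o 2024-11-20).
SCORES = {
    "MMLU": (88.50, 85.70),
    "MMLU Pro": (75.90, 77.90),
    "MATH": (87.80, 68.50),
    "FrontierMath": (1.70, 0.30),
    "Aider-Polyglot": (48.40, 18.20),
    "HumanEval": (89.00, 90.20),
    "SimpleQA": (24.90, 38.80),
}


def summarize(scores):
    """Tally leads and compute the mean and largest score difference.

    Assumption: differences are first - second, rounded to 2 decimals,
    and the average is an unweighted mean over all benchmarks.
    """
    diffs = {name: round(a - b, 2) for name, (a, b) in scores.items()}
    largest = max(diffs, key=lambda name: abs(diffs[name]))
    return {
        "first_leads": sum(d > 0 for d in diffs.values()),
        "second_leads": sum(d < 0 for d in diffs.values()),
        "ties": sum(d == 0 for d in diffs.values()),
        "mean_diff": round(sum(diffs.values()) / len(diffs), 2),
        "largest_gap": (largest, diffs[largest]),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Because the mean is unweighted, the two large gaps (Aider-Polyglot and MATH) dominate it; the 4-to-3 tally of leads is the more conservative reading of the same data.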

The text on this page is generated from structured model, pricing, and benchmark data; it is not written by a live LLM.