DeepSeekMoE 16B Base

Name: DeepSeekMoE 16B Base
Author: DeepSeek-AI

Base model

Release date: 2024-01-11 (page updated: 2024-01-11 14:40:02)

Parameters

16.4B

Context length

4K tokens

Chinese support

Supported

DeepSeekMoE 16B Base is a base (foundation) model published by DeepSeek-AI and released on 2024-01-11. It has 16.4B parameters and a 4K-token context length, requires about 32.77GB of storage, and is released under the DEEPSEEK LICENSE AGREEMENT.
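The 32.77GB file size lines up with storing every parameter in a 16-bit format. Below is a minimal Python sketch of that arithmetic, assuming 2 bytes per parameter (BF16/FP16). The page does not state the checkpoint dtype, and `weight_storage_gb` is a hypothetical helper name.

```python
# Rough weight-storage estimate.
# ASSUMPTION: each parameter is stored in a 16-bit format (BF16/FP16 = 2 bytes);
# the model card does not state the checkpoint dtype.


def weight_storage_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    total_params = 16.4e9  # total (not activated) parameter count from the card
    print(f"16-bit weights: ~{weight_storage_gb(total_params):.1f} GB")
    print(f"32-bit weights: ~{weight_storage_gb(total_params, 4):.1f} GB")
```

16.4B × 2 bytes gives about 32.8GB. The slightly smaller listed figure is consistent with the exact parameter count being a little under the rounded 16.4B. For an MoE model, storage depends on the total parameter count, not the activated one, because every expert must be kept on disk and in memory.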

Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then from benchmark leaderboards, then from third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

4K tokens

Max output length

No data

Model type

Base model

Release date

2024-01-11

Model file size

32.77GB

MoE architecture

Total params / Active params

16.4B / 2.8B

Knowledge cutoff

No data

Open source & experience

Code license

MIT License

Weights license

DEEPSEEK LICENSE AGREEMENT (free commercial use permitted)

GitHub repo

https://github.com/deepseek-ai/DeepSeek-MoE

Hugging Face

https://huggingface.co/deepseek-ai/deepseek-moe-16b-base

Live demo

No live demo

Official resources

Paper

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

DataLearnerAI blog

No blog post yet

API details

API speed

No data

No public API pricing yet.

Benchmark Results

No benchmark data to show.

Publisher

DeepSeek-AI

Model Overview

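DeepSeekMoE is an open-source Mixture-of-Experts (MoE) large language model from DeepSeek, the LLM company backed by the quantitative fund High-Flyer. At the time of its release, it was the first open-source MoE large model known to come from China.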

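The model has 16.4B parameters, but only 2.8B of them are activated for each token during inference, so its inference cost is roughly that of a 3B-parameter dense model. Its performance, however, is comparable to that of 7B-parameter dense models, as the comparison table below shows.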
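The gap between 16.4B total and 2.8B activated parameters comes from sparse expert routing: a gating function scores the experts for each token, and only the highest-scoring few are run. The Python sketch below shows generic top-k gating (softmax over all expert scores, keep the top k as mixing weights) on small random matrices. The sizes, the helper names (`moe_forward`, `rand_matrix`), and this particular gating variant are illustrative assumptions. This is not DeepSeek's implementation, and it omits DeepSeekMoE-specific details such as shared experts.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def moe_forward(x: Vector, gate: Matrix, experts: List[Matrix], k: int) -> Tuple[Vector, List[int]]:
    """Toy top-k MoE layer: only k of len(experts) experts run for this token."""
    probs = softmax(matvec(gate, x))  # one gate probability per expert
    chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:k]
    out = [0.0] * len(x)
    for e in chosen:  # unselected experts are never evaluated
        y = matvec(experts[e], x)
        out = [o + probs[e] * yi for o, yi in zip(out, y)]
    return out, chosen


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy sizes; NOT DeepSeekMoE's actual configuration.
    d, n_experts, k = 8, 16, 2
    gate = rand_matrix(n_experts, d, rng)
    experts = [rand_matrix(d, d, rng) for _ in range(n_experts)]
    token = [rng.gauss(0.0, 1.0) for _ in range(d)]
    out, chosen = moe_forward(token, gate, experts, k)
    expert_params = d * d
    print("experts used:", chosen)
    print(f"active expert params: {k * expert_params} of {n_experts * expert_params}")
```

Only the k selected expert matrices are multiplied, so per-token compute scales with k rather than with the total number of experts. All experts still have to be stored.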

Metric	LLaMA2-7B	DeepSeek 7B Base	DeepSeek MoE 16B
Total parameters	7B	6.9B	16.4B
Activated parameters per token	7B	6.9B	2.8B
FLOPs per 4K tokens	187.9T	183.5T	74.4T
Training data	2T tokens	2T tokens	2T tokens
MMLU (text understanding)	45.8	48.2	45.0
CMMLU (Chinese text understanding)	14.6	47.2	42.5
GSM8K (math reasoning)	15.5	17.4	18.8
HumanEval (code)	14.6	26.2	26.8
MBPP (code)	21.8	39.5	39.2
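The efficiency claim can be read directly off the table. This small Python sketch computes the activated-parameter fraction and the FLOPs ratio using only the table's own figures. The names `TABLE`, `activated_fraction`, and `flops_ratio` are hypothetical helpers.

```python
# Figures copied from the comparison table above; "T" = 10**12 FLOPs.
TABLE = {
    # name: (total params, activated params per token, FLOPs per 4K tokens)
    "LLaMA2-7B": (7.0e9, 7.0e9, 187.9e12),
    "DeepSeek 7B Base": (6.9e9, 6.9e9, 183.5e12),
    "DeepSeekMoE 16B": (16.4e9, 2.8e9, 74.4e12),
}


def activated_fraction(name: str) -> float:
    """Share of the model's parameters that are used for each token."""
    total, active, _ = TABLE[name]
    return active / total


def flops_ratio(model: str, baseline: str) -> float:
    """FLOPs per 4K tokens of `model` relative to `baseline`."""
    return TABLE[model][2] / TABLE[baseline][2]


for name in TABLE:
    print(f"{name}: {activated_fraction(name):.1%} of parameters active per token")
for baseline in ("LLaMA2-7B", "DeepSeek 7B Base"):
    ratio = flops_ratio("DeepSeekMoE 16B", baseline)
    print(f"DeepSeekMoE 16B vs {baseline}: {ratio:.1%} of the FLOPs")
```

By these figures, DeepSeekMoE 16B activates about 17% of its parameters per token and uses roughly 40% of the per-4K-token FLOPs of either 7B dense model.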

For a detailed introduction, see: https://www.datalearner.com/blog/1051704952803167

The model is licensed for free commercial use.
