DeepSeek-V2-MoE-236B-Chat
DeepSeek-V2-MoE-236B-Chat is an AI model published by DeepSeek-AI, released on 2024-05-06, as a chat large language model, with 236B parameters and a 128K-token context length, requiring about 472GB of storage, under the DEEPSEEK LICENSE AGREEMENT.
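The ~472GB storage figure is consistent with storing 236B parameters at 16-bit precision (2 bytes per parameter); a quick sanity check:

```python
# Rough storage estimate: 236B parameters at bf16/fp16 (2 bytes each).
params = 236e9
bytes_per_param = 2  # bf16/fp16
total_gb = params * bytes_per_param / 1e9
print(round(total_gb))  # 472
```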
Data is sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
DeepSeek-V2 is an open-source large language model from DeepSeek-AI, the LLM company under the quantitative fund High-Flyer (幻方量化), and among the largest open-source LLMs at release, with 236 billion parameters. It uses a Mixture-of-Experts (MoE) architecture that activates about 21 billion of those parameters per token during inference.
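The sparse-activation idea behind MoE can be sketched with a toy top-k router; all sizes and names below are illustrative, not DeepSeek-V2's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy Mixture-of-Experts layer: route one token to its top-k experts.

    x:       (hidden,) input vector for a single token
    gate_w:  (hidden, n_experts) router weights
    experts: list of (hidden, hidden) expert weight matrices
    Only k experts run per token, so most parameters stay inactive.
    """
    logits = x @ gate_w
    topk = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()              # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
hidden, n_experts = 8, 16
x = rng.standard_normal(hidden)
gate_w = rng.standard_normal((hidden, n_experts))
experts = [rng.standard_normal((hidden, hidden)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With k=2 of 16 experts active, roughly 1/8 of the expert parameters participate in each forward pass, which is the same principle by which DeepSeek-V2 activates 21B of its 236B parameters.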
DeepSeek-V2-236B-Chat was trained on a dataset of 8.1 trillion tokens, and is the variant further aligned through supervised fine-tuning and reinforcement learning.
