Qwen1.5-MoE-A2.7B-Chat

Name: Qwen1.5-MoE-A2.7B-Chat
Author: 阿里巴巴

基础大模型

Release date: 2024-03-28更新于: 2024-04-07 22:06:16643

Live demoGitHub Hugging Face Compare

Parameters

143.0亿

Context length

Chinese support

Supported

Reasoning ability

Qwen1.5-MoE-A2.7B-Chat is an AI model published by 阿里巴巴, released on 2024-03-28, for 基础大模型, with 143.0B parameters, and 1K tokens context length, requiring about 3.46GB storage, under the Tongyi Qianwen RESEARCH LICENSE AGREEMENT license.

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators. Learn about our data methodology

Qwen1.5-MoE-A2.7B-Chat

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

1K tokens

Max output length

No data

Model type

基础大模型

Release date

2024-03-28

Model file size

3.46GB

MoE architecture

Total params / Active params

143.0B / N/A

Knowledge cutoff

No data

Qwen1.5-MoE-A2.7B-Chat

Open source & experience

Code license

Tongyi Qianwen RESEARCH LICENSE AGREEMENT

Weights license

Tongyi Qianwen RESEARCH LICENSE AGREEMENT- 免费商用授权

GitHub repo

https://github.com/QwenLM/Qwen1.5

Hugging Face

https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat

Live demo

No live demo

Qwen1.5-MoE-A2.7B-Chat

Official resources

Paper

Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters

DataLearnerAI blog

重磅！阿里巴巴开源自家首个MoE技术大模型：Qwen1.5-MoE-A2.7B，性能约等于70亿参数规模的大模型Mistral-7B

Qwen1.5-MoE-A2.7B-Chat

API details

API speed

No data

No public API pricing yet.

Qwen1.5-MoE-A2.7B-Chat

Benchmark Results

No benchmark data to show.

Qwen1.5-MoE-A2.7B-Chat

Publisher

阿里巴巴

View publisher details

Qwen1.5-MoE-A2.7B-Chat

Model Overview

Qwen1.5-MoE-A2.7B简介

最近2天，业界有3个重磅的MoE技术大模型开源，包括前天的DBRX以及今天的Jamba和阿里的Qwen1.5-MoE-A2.7B。

Qwen1.5-MoE-A2.7B是基于阿里此前开源的Qwen1.5-1.8B模型继续迭代升级的混合专家技术大模型。Qwen1.5-MoE-A2.7B模型总的参数数量是143亿，但每次推理只使用27亿参数。

阿里官方称他们使用的是特别设计的MoE架构。通常，如Mixtral方法中所见，每个transformer块内的MoE层采用八个专家，并使用前两名门控策略用于路由。这种配置虽然简单有效，但有很大的提升空间。因此，通过一系列广泛的实验，阿里对这个架构进行了几项修改：

更加细粒度专家
非从头训练的“升级再利用”的初始化
带有共享和路由专家的路由机制

以前的研究项目，如DeepSeek-MoE和DBRX，已经证明了使用细粒度专家的有效性。阿里将单个FFN分割成几个部分，每个部分作为一个独立的专家。这是一种更为细致的构建专家的方法。

所以，虽然Qwen1.5-MoE-A2.7B模型参数量不大，但是总共有64个专家，比传统的8个专家的MoE设置增加了8倍，每次推理激活其中4个专家。同时利用现有的Qwen-1.8B，将其转变为Qwen1.5-MoE-A2.7B。

一个值得注意的发现是，在初始化过程中引入随机性显著加快了收敛速度，并在整个预训练过程中取得了更好的整体性能。

Qwen1.5-MoE-A2.7B的效果

根据阿里官方提供的数据，Qwen1.5-MoE-A2.7B参数总数143亿，每次推理激活27亿，其效果约等于70亿参数规模的大模型。

从这个角度看，Qwen1.5-MoE-A2.7B显存（半精度）最低需要28GB，但是推理的时候因为只使用了27亿参数，所以推理速度会更快。也就是意味着，Qwen1.5-MoE-A2.7B模型用2倍于70亿参数模型的显存，推理速度则提升到原来的1.74倍。

下图是模型与其它模型的评测对比：

模型名称	参数数量	MMLU	GSM8K	HumanEval	Multilingual	MT-Bench
Mistral-7B	70亿	64.1	47.5	27.4	40.0	7.60
Gemma-7B	70亿	64.6	50.9	32.3	-	-
Qwen1.5-7B	70亿	61.0	62.5	36.0	45.2	7.60
DeepSeekMoE 16B	160亿，激活使用40亿	45.0	18.8	26.8	-	6.93
Qwen1.5-MoE-A2.7B	143亿，激活使用27亿	62.5	61.5	34.2	40.8	7.17

可以看到，Qwen1.5-MoE-A2.7B与70亿参数模型基本差不多。这种显存换速度的方法，看个人选择了。

另外一个值得注意的点是在Qwen1.5-MoE-A2.7B模型在NVIDIA A100-80G GPU可以达到每秒4000个tokens的生成速度！非常恐怖！（输入输出都是1K的tokens）

Qwen1.5-MoE-A2.7B开源和使用

Qwen1.5-MoE-A2.7B模型是允许免费商用的。不过由于最新的transformers代码没有合入这个模型，所以想要使用的话需要从GitHub下载源码进行编译安装后才能使用。

Qwen1.5-MoE-A2.7B模型开源地址参考：https://www.datalearner.com/ai-models/pretrained-models/Qwen1_5-MoE-A2_7B

DataLearner 官方微信

欢迎关注 DataLearner 官方微信，获得最新 AI 技术推送

模型名称

参数数量

MMLU

GSM8K

HumanEval

Multilingual

MT-Bench

Mistral-7B

70亿

64.1

47.5

27.4

40.0

7.60

Gemma-7B

70亿

64.6

50.9

32.3

Qwen1.5-7B

70亿

61.0

62.5

36.0

45.2

7.60

DeepSeekMoE 16B

160亿，激活使用40亿

45.0

18.8

26.8

6.93

Qwen1.5-MoE-A2.7B

143亿，激活使用27亿

62.5

61.5

34.2

40.8

7.17