Mistral-7B-v0.2

Name: Mistral-7B-v0.2
Author: MistralAI

Foundation model

Release date: 2024-03-24. Updated: 2024-03-24 12:42:05

Parameters

7.3 billion

Context length

32K

Chinese support

Not supported

Reasoning ability

Mistral-7B-v0.2 is a foundation model published by MistralAI and released on 2024-03-24. It has 7.3B parameters and a 32K-token context length, requires about 15GB of storage, and is released under the Apache 2.0 license.
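The storage figure follows from the parameter count. A minimal sketch of the arithmetic, assuming the weights are stored as 16-bit values (2 bytes per parameter, an assumption not stated on this page) and using decimal gigabytes; the helper name is hypothetical:

```python
def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate checkpoint size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# 7.3 billion parameters at an assumed 2 bytes each
size = weight_size_gb(7.3e9)
print(f"{size:.1f} GB")
```

Under that assumption the result is just under 15 GB, consistent with the listed file size.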

Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.

Model basics

Reasoning traces

Not supported

Thinking modes

Thinking modes not supported

Context length

32K tokens

Max output length

No data

Model type

Foundation model

Release date

2024-03-24

Model file size

15GB

MoE architecture

Total params / Active params

7.3B / N/A

Knowledge cutoff

No data

Open source & experience

Code license

Apache 2.0

Weights license

Apache 2.0 (free for commercial use)

GitHub repo

https://github.com/mistralai-sf24/hackathon

Hugging Face

Hugging Face link unavailable

Live demo

No live demo

Official resources

Paper

No paper available

DataLearnerAI blog

No blog post yet

API details

API speed

No data

No public API pricing yet.

Benchmark Results

No benchmark data to show.

Publisher

MistralAI

Model Overview

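Mistral-7B-v0.2 is the upgraded version of Mistral 7B v0.1, MistralAI's open-source large language model with 7.3 billion parameters. It was presumably trained from scratch; MistralAI published no official details and instead announced the model at the Mistral Hackathon. Mistral-7B-Instruct-v0.2, the instruction-tuned model built on Mistral-7B-v0.2, was released on December 11, 2023, whereas this base model was only open-sourced on March 24, 2024.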

It can currently be downloaded directly: https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar

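Improvements in Mistral-7B-v0.2

According to MistralAI, the main changes in Mistral-7B-v0.2 compared with Mistral-7B-v0.1 are:

Context length extended from 8K to 32K
Rope Theta set to 1e6
Sliding-window attention removed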

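The first change is easy to understand: Mistral-7B-v0.2, and the Mistral-7B-Instruct-v0.2 model built on it, should perform better in long-context scenarios.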

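The second parameter may be less familiar. Rope Theta is the base of the rotary position embedding (RoPE), the scheme that encodes each token's position by rotating pairs of query and key dimensions. Dimension pair i rotates at frequency theta^(-2i/d), where d is the head dimension, so a larger theta means slower rotation and longer wavelengths. Raising theta from 1e4 in v0.1 to 1e6 in v0.2 stretches those wavelengths, a change commonly made to support longer contexts, and it goes hand in hand with the move to 32K.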
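To make the effect of Rope Theta concrete, here is a minimal sketch that computes the wavelength of each rotary frequency pair for the two base values. The head dimension of 128 and the helper name are assumptions made for illustration, not values taken from this page:

```python
import math


def rope_wavelengths(theta: float, head_dim: int) -> list:
    """Wavelength, in token positions, of each rotary frequency pair.

    Pair i rotates by pos * theta**(-2i/head_dim) radians, so one full
    turn takes 2*pi * theta**(2i/head_dim) positions.
    """
    return [
        2 * math.pi * theta ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


HEAD_DIM = 128  # assumed head dimension, for illustration only

for theta in (1e4, 1e6):
    longest = rope_wavelengths(theta, HEAD_DIM)[-1]
    print(f"theta={theta:.0e}: longest wavelength ~ {longest:,.0f} positions")
```

The fastest pair (i = 0) always has wavelength 2*pi regardless of theta; the larger base only stretches the slower pairs, which are the ones that carry long-range position information.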

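The third change is the removal of the sliding window. Mistral-7B-v0.1 used sliding-window attention, in which each token attends only to a fixed window of the most recent tokens (4,096 in v0.1) rather than to the whole sequence. Without the window, every token can attend directly to all earlier tokens in the context, which generally helps on long inputs but makes attention over long sequences more expensive in compute and memory.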
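The difference between the two attention patterns can be illustrated with a toy mask. This is a sketch only, with hypothetical function names and a tiny sequence length and window chosen for readability; it is not MistralAI's implementation:

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives a plain causal mask; window=W additionally restricts
    each query to the W most recent positions (itself included).
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def show(mask: List[List[bool]]) -> str:
    """Render a mask as rows of 'x' (visible) and '.' (masked)."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


print(show(attention_mask(6, window=3)))  # sliding window, as in v0.1
print()
print(show(attention_mask(6)))            # full causal attention, as in v0.2
```

In the windowed mask each row has at most three visible positions, while in the full causal mask row q sees all q + 1 earlier positions, which is why cost grows faster with sequence length once the window is removed.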

MistralAI has also instruction-tuned this base model and open-sourced the result as Mistral-7B-Instruct-v0.2: https://www.datalearner.com/ai-models/pretrained-models/Mistral-7B-Instruct-v0_2

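How to fine-tune Mistral-7B-v0.2

Mistral-7B-v0.2 is a pretrained base model that has not been fine-tuned for chat or instruction following, which makes it suitable for further training on specific tasks. MistralAI has also open-sourced fine-tuning code: https://github.com/mistralai-sf24/hackathon

Fine-tuning Mistral-7B-v0.2 is supported in two ways: continued pretraining and instruction fine-tuning. The data format for each is described in turn.

The dataset format for continued pretraining of Mistral-7B-v0.2 is: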

{"text": "Text contained in document n°1"}
{"text": "Text contained in document n°2"}

That is, each preprocessed document simply goes into a JSON object under the key text.
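A minimal sketch of producing data in this shape, assuming the training code expects one JSON object per line (JSON Lines); the function name and the way the output is used are hypothetical:

```python
import json


def to_pretrain_jsonl(documents):
    """Serialize each document as one {"text": ...} JSON object per line."""
    return "".join(
        json.dumps({"text": doc}, ensure_ascii=False) + "\n"
        for doc in documents
    )


docs = ["Text contained in document n°1", "Text contained in document n°2"]
jsonl = to_pretrain_jsonl(docs)
print(jsonl, end="")
```

Because json.dumps escapes newlines inside a document, each record stays on a single line even when the document itself spans several lines.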

The dataset format for instruction fine-tuning of Mistral-7B-v0.2 is:

{"interactions": [{"is_user": true, "text": "User interaction n°1 contained in document n°1"}, {"is_user": false, "text": "Bot interaction n°1 contained in document n°1"}, {"is_user": true, "text": "User interaction n°2 contained in document n°1"}, {"is_user": false, "text": "Bot interaction n°2 contained in document n°1"}]}
{"interactions": [{"is_user": true, "text": "User interaction n°1 contained in document n°2"}, {"is_user": false, "text": "Bot interaction n°1 contained in document n°2"}, {"is_user": true, "text": "User interaction n°2 contained in document n°2"}, {"is_user": false, "text": "Bot interaction n°2 contained in document n°2"}, {"is_user": true, "text": "User interaction n°3 contained in document n°2"}, {"is_user": false, "text": "Bot interaction n°3 contained in document n°2"}]}
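The following sketch builds one record in this shape from (user, bot) text pairs and checks that it matches the pattern seen in the sample. The helper names are hypothetical, and the rule that turns must alternate starting with the user and ending with the bot is an assumption inferred from the sample, not a documented requirement:

```python
import json


def make_record(turns):
    """Build an {"interactions": [...]} record from (user_text, bot_text) pairs."""
    interactions = []
    for user_text, bot_text in turns:
        interactions.append({"is_user": True, "text": user_text})
        interactions.append({"is_user": False, "text": bot_text})
    return {"interactions": interactions}


def is_well_formed(record):
    """True if turns alternate user, bot, user, bot and end on a bot turn.

    Assumed rule, inferred from the sample data only.
    """
    turns = record.get("interactions", [])
    if not turns or len(turns) % 2 != 0:
        return False
    return all(
        isinstance(t.get("text"), str) and t.get("is_user") is (i % 2 == 0)
        for i, t in enumerate(turns)
    )


# Hypothetical example conversation
record = make_record([
    ("What changed in v0.2?", "The context length grew to 32K."),
    ("Anything else?", "The sliding window was removed."),
])
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Running a check like this over a dataset before training is a cheap way to catch records with a missing reply or two consecutive turns from the same speaker.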

Note that, according to MistralAI, LoRA fine-tuning of Mistral-7B-v0.2 is supported with the following command:

torchrun --nproc-per-node 1 --master_port $RANDOM -m train reference/7B_lora.yaml

For details, see the official GitHub repository: https://github.com/mistralai-sf24/hackathon
