Latest Open LLM Leaderboard rankings for mixture-of-experts (MoE) and merged models

Open LLM Leaderboard (China mirror): LLM evaluation score rankings

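To make lookups easier, DataLearnerAI has released DataLearnerAI-GPT, which can currently answer questions about any model's evaluation results using Open LLM Leaderboard data. It is available at: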

https://chat.openai.com/g/g-8eu9KgtUm-datalearnerai-gpt

For a detailed introduction to DataLearnerAI-GPT, see: https://www.datalearner.com/blog/1051699757266256

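With so many large language models (LLMs) and chatbots released every week, often with exaggerated performance claims, it can be hard to filter out the genuine progress made by the open-source community and to tell which model is the current state of the art.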

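To address this, Hugging Face (HF) launched this open leaderboard for tracking LLM evaluations. The 📐 🤗 Open LLM Leaderboard aims to track, rank, and evaluate open-source LLMs and chatbots by their scores on a set of evaluation tasks.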

Because access to HuggingFace can be slow or unstable, we provide a synchronized copy of the results here. The original page is at: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

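Evaluation tasks used by the Open LLM Leaderboard

AI2 Reasoning Challenge (25-shot)
A set of grade-school science questions.

HellaSwag (10-shot)
A commonsense inference test that is easy for humans (about 95% accuracy) but challenging for state-of-the-art models.

MMLU (5-shot)
Measures a text model's multitask accuracy across 57 tasks, including elementary mathematics, US history, computer science, law, and more.

TruthfulQA (0-shot)
Measures a model's propensity to reproduce falsehoods commonly found online. Note: in the evaluation harness, TruthfulQA is technically a 6-shot task, because each example is prepended with 6 question/answer pairs even in the 0-shot setting.

Winogrande (5-shot)
A large-scale, adversarial, and difficult Winograd benchmark for commonsense reasoning.

GSM8k (5-shot)
Diverse grade-school math word problems that test a model's ability to solve multi-step mathematical reasoning problems.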
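
The "(k-shot)" label after each task name is the number of solved examples placed in the prompt ahead of the test question. The sketch below only illustrates that idea and is not the leaderboard's evaluation code: the `build_few_shot_prompt` helper, the "Question:/Answer:" template, and the sample arithmetic questions are all hypothetical.

```python
def build_few_shot_prompt(examples, question, k):
    """Prepend the first k solved examples to an unanswered question.

    `examples` is a list of (question, answer) pairs. The template used
    here is an assumption made for illustration only.
    """
    if k > len(examples):
        raise ValueError("not enough examples for the requested k")
    # Each solved example becomes one "shot" in the prompt.
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The test question comes last, with the answer left blank for the model.
    shots.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(shots)


# Hypothetical solved examples, not taken from any benchmark.
examples = [
    ("What is 2 + 3?", "5"),
    ("What is 10 - 4?", "6"),
    ("What is 3 * 3?", "9"),
]

prompt = build_few_shot_prompt(examples, "What is 7 + 8?", k=2)
print(prompt)
```

With `k=0` the helper returns only the unanswered question, which corresponds to a 0-shot prompt under this template.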

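The model-type icons in the table below mean the following:

🟢: Pretrained models: new base models that were pretrained on a given dataset.

🔶: Domain-specific fine-tuned models: pretrained models that have been further fine-tuned on domain-specific datasets for better performance.

💬: Chat models: chat-style fine-tunes produced with IFT (instruction fine-tuning on task-instruction datasets), RLHF (reinforcement learning from human feedback), or DPO (direct preference optimization, which slightly changes the model's loss with an added policy).

🤝: Base merges and MoErges: models built by merging or fusing several models (MoErges) without additional fine-tuning. If you find a model with no icon, feel free to open an issue so that the model information can be added.

❓: Unknown.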

You can filter the ranking by the following model types:

All Models

Pretrained Models

Fine Tuned Models

Chat Models

Merged or MoE Models

Model name	Model type	Parameters (hundreds of millions)	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K	Architecture
QuartetAnemoi-70B-t0.0001 📑	🤝	689.8	76.86	73.38	88.9	75.42	69.53	85.32	68.61	LlamaForCausalLM
OmniBeagleSquaredMBX-v3-7B-v2 📑	🤝	72.4	75.98	74.06	88.93	64.53	72.93	85.56	69.9	MistralForCausalLM
NeuralOmniBeagleMBX-v3-7B 📑	🤝	72.4	75.93	73.38	88.91	64.99	73.1	84.21	70.96	MistralForCausalLM
LaserPipe-7B-SLERP 📑	🤝	72.4	74.22	71.08	87.89	64.86	65.38	83.35	72.78	MistralForCausalLM
LaserPipe-7B-SLERP 📑	🤝	72.4	74.08	70.82	87.88	64.77	65.34	83.27	72.4	MistralForCausalLM
Merge_Sakura_Solar 📑	🤝	107.3	74.03	70.73	88.51	66.03	72.21	82.72	63.99	LlamaForCausalLM
Laser-WestLake-2x7b 📑	🤝	128.8	74.0	72.27	88.44	64.71	69.25	85.79	63.53	MixtralForCausalLM
Lumosia-v2-MoE-4x10.7 📑	🤝	361	73.75	70.39	87.87	66.45	68.48	84.21	65.13	MixtralForCausalLM
Pearl-7B-0210-dare 📑	🤝	72.4	73.46	70.9	88.8	61.69	71.46	84.53	63.38	MistralForCausalLM
Pearl-7B-slerp 📑	🤝	72.4	72.75	68.0	87.16	64.04	62.35	81.29	73.62	MistralForCausalLM
piccolo-math-2x7b 📑	🤝	128.8	72.32	69.11	87.27	63.69	63.86	79.87	70.13	MixtralForCausalLM
supermario-slerp-v3 📑	🤝	72.4	72.22	69.28	86.71	65.11	61.77	80.51	69.98	MistralForCausalLM
BigWeave-v16-103b 📑	🤝	1032	72.02	65.87	87.61	73.22	63.81	80.43	61.18	LlamaForCausalLM
BigWeave-v15-103b 📑	🤝	1032	71.67	69.71	86.41	71.25	66.1	80.35	56.18	LlamaForCausalLM
supermario-slerp-v2 📑	🤝	72.4	71.45	69.71	86.54	64.82	63.06	80.74	63.84	MistralForCausalLM
Konstanta-Gamma-10.9B 📑	🤝	109.5	70.44	68.26	87.38	64.5	64.18	80.98	57.32	MistralForCausalLM
Mixtral-8x7B-v0.1-top3 📑	🤝	467	69.09	67.41	86.63	71.98	48.58	82.4	57.54	MixtralForCausalLM
WordWoven-13B 📑	🤝	128.8	68.25	66.13	85.81	64.06	54.45	78.93	60.12	MixtralForCausalLM
BigWeave-v6-90b 📑	🤝	878	67.47	65.36	87.21	68.04	57.96	81.69	44.58	LlamaForCausalLM
Pearl-3x7B 📑	🤝	185.2	67.23	65.53	85.54	64.27	52.17	78.69	57.16	MixtralForCausalLM
MoE-Merging 📑	🤝	241.5	66.84	65.44	84.58	61.31	57.83	77.66	54.21	MixtralForCausalLM
laser-dolphin-mixtral-4x7b-dpo 📑	🤝	241.5	66.71	64.93	85.81	63.04	63.77	77.82	44.88	MixtralForCausalLM
KoSOLAR-10.7B-v0.1 📑	🤝	108.6	66.04	62.03	84.54	65.56	45.03	83.58	55.5	Unknown
laser-polyglot-4x7b 📑	🤝	241.5	65.79	64.16	84.98	63.88	55.47	77.82	48.45	MixtralForCausalLM
Moe-4x7b-math-reason-code 📑	🤝	241.5	65.73	62.54	83.87	61.2	56.12	76.09	54.59	MixtralForCausalLM
Mixtral_7Bx2_MoE_13B 📑	🤝	128.8	65.14	64.85	83.92	62.27	57.55	77.9	44.35	Unknown
Etheria-55b-v0.1 📑	🤝	555.9	64.69	65.1	81.93	73.66	56.16	76.09	35.18	LlamaForCausalLM
Multilingual-mistral 📑	🤝	467	62.79	62.29	81.76	61.38	55.53	75.53	40.26	MixtralForCausalLM
Multirial 📑	🤝	467	62.37	63.23	79.57	61.01	54.7	75.3	40.41	MixtralForCausalLM
Llama2_init_Mistral 📑	🤝	72.4	60.98	60.07	83.3	64.09	42.15	78.37	37.91	LlamaForCausalLM
Influxient-4x13B 📑	🤝	385	60.57	61.26	83.42	57.25	54.1	74.35	33.06	MixtralForCausalLM
MoECPM-Untrained-4x2b 📑	🤝	77.9	53.51	46.76	72.58	53.21	38.41	65.51	44.58	MixtralForCausalLM
TinyLlama-3T-Cinder-v1.3 📑	🤝	11	37.23	33.96	58.14	25.41	38.13	63.93	3.79	LlamaForCausalLM
sheared-silicon10p 📑	🤝	27	35.82	36.18	51.12	25.56	44.85	57.22	0.0	LlamaForCausalLM
clown-SUV-4x70b 📑	🤝	2380.9	29.76	24.74	28.29	24.2	48.81	52.49	0.0	MixtralForCausalLM
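
As a quick sanity check on how to read the table, the sketch below parses two rows copied from it, recomputes the average, and converts the parameter count to billions. It rests on two assumptions: that the average column is the unweighted mean of the six benchmark columns (which matches the rows checked here to within rounding), and that the parameter column is in units of 100 million (亿), so dividing by 10 gives billions. The helper names `parse_row` and `mean_of_scores` are hypothetical.

```python
# Two rows copied from the table above (tab-separated, same column order).
# The 📑 marker after each model name is omitted here.
ROWS = (
    "QuartetAnemoi-70B-t0.0001\t🤝\t689.8\t76.86\t73.38\t88.9\t75.42\t69.53\t85.32\t68.61\tLlamaForCausalLM\n"
    "Laser-WestLake-2x7b\t🤝\t128.8\t74.0\t72.27\t88.44\t64.71\t69.25\t85.79\t63.53\tMixtralForCausalLM\n"
)


def parse_row(line):
    """Split one tab-separated table row into typed fields."""
    fields = line.split("\t")
    return {
        "name": fields[0],
        # Assumption: 1 unit = 1e8 parameters, so 10 units = 1e9 (one billion).
        "params_billions": float(fields[2]) / 10,
        "average": float(fields[3]),
        # ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
        "scores": [float(x) for x in fields[4:10]],
        "architecture": fields[10],
    }


def mean_of_scores(row):
    """Unweighted mean of the six benchmark scores."""
    return sum(row["scores"]) / len(row["scores"])


parsed = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for row in parsed:
    print(
        row["name"],
        f"{row['params_billions']:.2f}B",
        f"recomputed={mean_of_scores(row):.2f}",
        f"reported={row['average']}",
    )
```

Because the published scores are already rounded, a recomputed mean may differ from the reported average in the last digit, so any comparison should allow a small tolerance.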
