Open LLM Leaderboard: LLM Evaluation Score Rankings (China Mirror)

To make these results easier to query, DataLearnerAI has released DataLearnerAI-GPT, which can answer questions about any model's evaluation results based on the Open LLM Leaderboard data. It is available at:
https://chat.openai.com/g/g-8eu9KgtUm-datalearnerai-gpt
For a detailed introduction to DataLearnerAI-GPT, see: https://www.datalearner.com/blog/1051699757266256
With a flood of large language models (LLMs) and chatbots released every week, often accompanied by exaggerated claims about their performance, it can be hard to sift out the genuine progress made by the open-source community and to identify the current state of the art.
To that end, Hugging Face launched the 📐 🤗 Open LLM Leaderboard, which tracks, ranks, and evaluates open-source LLMs and chatbots on a set of benchmark tasks.
Because access to Hugging Face can be slow or unstable, we provide a synchronized copy of the results here. The original page is at: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
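The leaderboard's scores are produced with EleutherAI's lm-evaluation-harness. As a rough illustration of how one of the benchmark numbers in the table below could be reproduced locally, here is a minimal sketch; it assumes a recent lm-eval release (v0.4+, `pip install lm-eval`), and the model id is only an example taken from the table.

```python
# Minimal sketch: reproduce a single leaderboard score with EleutherAI's
# lm-evaluation-harness (the leaderboard's evaluation backend).
# Assumes lm-eval v0.4+; the model id is only an example.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                                     # HuggingFace Transformers backend
    model_args="pretrained=lmsys/vicuna-13b-v1.5",  # any model listed in the table
    tasks=["arc_challenge"],                        # ARC is run 25-shot on the leaderboard
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])          # per-task accuracy metrics
```

Note that exact numbers may differ slightly from the leaderboard, which pins a specific harness version and generation settings.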

Introduction to the Open LLM Leaderboard Evaluation Tasks

The leaderboard currently covers seven benchmarks: ARC (25-shot, grade-school science questions), HellaSwag (10-shot, commonsense sentence completion), MMLU (5-shot, knowledge across 57 subjects), TruthfulQA (0-shot, resistance to common falsehoods), Winogrande (5-shot, commonsense pronoun resolution), GSM8K (5-shot, grade-school math word problems), and DROP (3-shot, reading comprehension with discrete reasoning).

The model-type icons used in the table below are explained as follows:

🟢 Pretrained model: a new base model trained from scratch on a given corpus.

🔶 Fine-tuned model: a pretrained model further fine-tuned on additional data, for example to make it better suited to chat.

⭕ Instruction-tuned model: a model fine-tuned specifically on datasets of task instructions.

🟦 RL-tuned model: reinforcement-learning fine-tuning, which typically modifies the model's loss function by adding a policy term.

❓ Unknown type.

| Model | Type | Params (100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
|---|---|---|---|---|---|---|---|---|---|---|
| airoboros-l2-13b-2.1 | 🔶 | 128 | 51.78 | 59.47 | 82.47 | 54.83 | 44.65 | 75.06 | 3.56 | 42.44 |
| EverythingLM-13b-V3-peft | | 128 | 51.77 | 58.36 | 81.03 | 54.7 | 52.98 | 72.85 | 5.53 | 36.94 |
| UndiMix-v4-13B | 🔶 | 130 | 51.77 | 61.95 | 83.88 | 56.9 | 48.96 | 76.16 | 13.72 | 20.82 |
| 13B-BlueMethod | 🔶 | 128 | 51.76 | 59.64 | 82.07 | 50.34 | 47.74 | 77.11 | 7.81 | 37.62 |
| llama-2-13b-platypus-vicuna-wizard | | 128 | 51.76 | 61.26 | 82.31 | 55.21 | 41.91 | 75.77 | 0.91 | 44.96 |
| fin-llama-33b-merged | | 323 | 51.76 | 65.02 | 86.2 | 58.73 | 49.75 | 80.03 | 16.22 | 6.36 |
| Airoboros-L2-13B-2.1-GPTQ | 🔶 | 162 | 51.71 | 58.96 | 81.72 | 53.16 | 44.68 | 74.35 | 5.99 | 43.14 |
| llama-2-13b-FINETUNE5_4w-r4-q_k_v_o | 🔶 | 128 | 51.69 | 58.36 | 81.1 | 54.53 | 37.02 | 76.64 | 12.28 | 41.86 |
| ypotryll-22b-epoch2-qlora | | 220 | 51.68 | 59.22 | 80.66 | 54.52 | 40.42 | 76.32 | 5.38 | 45.24 |
| Nous-Hermes-13b | 🔶 | 128 | 51.68 | 56.57 | 82.11 | 50.44 | 51.5 | 75.3 | 8.34 | 37.5 |
| Pwen-7B-Chat-20_30 | 🔶 | 70 | 51.68 | 51.45 | 73.99 | 62.08 | 47.01 | 68.43 | 20.62 | 38.14 |
| speechless-llama2-13b | 🔶 | 130 | 51.67 | 62.03 | 81.85 | 58.52 | 55.7 | 76.56 | 13.95 | 13.12 |
| Mistral-v0.1-PeanutButter-v0.0.0-7B | | 72 | 51.62 | 62.2 | 84.1 | 64.14 | 46.94 | 78.69 | 18.5 | 6.76 |
| wizard-mistral-v0.1 | 🔶 | 72 | 51.58 | 61.77 | 83.51 | 63.99 | 47.46 | 78.3 | 19.03 | 7.01 |
| WizardLM-13B-V1.1 | 🔶 | 128 | 51.58 | 60.24 | 81.39 | 50.92 | 54.56 | 75.06 | 8.11 | 30.75 |
| airoboros-2.1-llama-2-13B-QLoRa | | 128 | 51.57 | 59.73 | 82.91 | 54.77 | 45.14 | 74.03 | 2.81 | 41.62 |
| Synthia-13B-v1.2 | 🔶 | 128 | 51.56 | 61.26 | 82.93 | 56.47 | 47.27 | 76.48 | 10.99 | 25.48 |
| GenAI-Nova-13B | | 130 | 51.53 | 62.29 | 83.27 | 59.47 | 51.79 | 77.35 | 7.73 | 18.82 |
| 2x-LoRA-Assemble-13B | 🔶 | 130 | 51.52 | 63.65 | 83.47 | 59.82 | 55.94 | 76.48 | 9.25 | 12.01 |
| MegaMix-A1-13B | 🔶 | 130 | 51.51 | 61.6 | 83.49 | 58.26 | 47.48 | 76.16 | 24.11 | 9.46 |
| vicuna-13b-v1.5 | 🔶 | 128 | 51.46 | 57.08 | 81.24 | 56.67 | 51.51 | 74.66 | 11.3 | 27.73 |
| chronoboros-33B | 🔶 | 323 | 51.45 | 63.91 | 85.0 | 59.44 | 49.83 | 80.35 | 15.01 | 6.62 |
| airochronos-33B | | 325 | 51.43 | 64.42 | 85.21 | 59.79 | 50.59 | 79.32 | 13.72 | 6.93 |
| Llama2-chat-AYT-13B | 🔶 | 130 | 51.41 | 63.31 | 83.53 | 59.67 | 55.8 | 76.09 | 8.87 | 12.62 |
| sheep-duck-llama-2-13b | 🔶 | 128 | 51.41 | 63.14 | 84.52 | 59.89 | 55.48 | 76.95 | 9.17 | 10.71 |
| Emerald-13B | 🔶 | 128 | 51.39 | 62.29 | 83.69 | 55.7 | 50.94 | 75.93 | 12.81 | 18.38 |
| llama-2-13B-LoRA-assemble | 🔶 | 128 | 51.36 | 63.57 | 83.51 | 59.82 | 55.96 | 76.16 | 8.42 | 12.09 |
| airochronos-33B | 🔶 | 325 | 51.36 | 64.25 | 85.2 | 59.83 | 50.56 | 79.08 | 13.57 | 7.01 |
| nash-vicuna-13b-v1dot5-ep2-w-rag-w-simple | 🔶 | 128 | 51.33 | 59.13 | 80.64 | 56.12 | 51.29 | 74.66 | 10.54 | 26.89 |
| openbuddy-llama2-13b-v11-bf16 | 🔶 | 129 | 51.32 | 52.99 | 75.38 | 51.36 | 47.94 | 71.03 | 18.88 | 41.63 |
| minotaur-13b | 🔶 | 128 | 51.31 | 56.4 | 79.13 | 49.61 | 49.62 | 76.56 | 12.51 | 35.33 |
| MLewd-L2-Chat-13B | 🔶 | 128 | 51.29 | 62.03 | 84.19 | 58.75 | 52.84 | 77.43 | 11.3 | 12.53 |
| MXLewd-L2-20B | 🔶 | 200 | 51.29 | 63.23 | 85.33 | 57.36 | 51.65 | 76.09 | 10.92 | 14.46 |
| QuantumLM-70B-hf | 🔶 | 690 | 51.29 | 59.47 | 83.02 | 62.25 | 53.39 | 78.77 | 14.78 | 7.32 |
| samantha-mistral-7b | | 71 | 51.28 | 63.4 | 84.1 | 61.36 | 46.08 | 76.8 | 16.0 | 11.22 |
| llama-2-13b-FINETUNE4_3.8w-r16-gate_up_down | 🔶 | 128 | 51.27 | 55.03 | 81.97 | 56.64 | 38.07 | 77.19 | 12.21 | 37.75 |
| llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o_gate_up_down | 🔶 | 128 | 51.25 | 56.31 | 81.43 | 55.3 | 39.11 | 76.8 | 10.46 | 39.35 |
| mpt-30b-instruct | 🔶 | 300 | 51.24 | 58.45 | 84.31 | 49.15 | 38.05 | 75.14 | 15.31 | 38.28 |
| llama-2-13b-FINETUNE3_3.3w-r8-q_k_v_o_gate_up_down | 🔶 | 128 | 51.23 | 57.94 | 81.19 | 53.43 | 40.48 | 76.72 | 10.84 | 37.99 |
| Mistral-v0.1-PeanutButter-v0.0.2-7B | | 72 | 51.22 | 61.77 | 84.11 | 64.38 | 45.92 | 78.37 | 17.44 | 6.53 |
| Amethyst-13B | 🔶 | 128 | 51.2 | 62.63 | 83.17 | 55.91 | 52.43 | 74.74 | 10.84 | 18.7 |
| Amethyst-13B-Mistral | 🔶 | 128 | 51.2 | 62.63 | 83.17 | 55.91 | 52.43 | 74.74 | 10.84 | 18.7 |
| llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o | 🔶 | 128 | 51.19 | 54.78 | 81.4 | 54.73 | 41.02 | 76.64 | 10.54 | 39.24 |
| dromedary-65b-lora-HF | 🔶 | 650 | 51.19 | 61.6 | 82.53 | 63.08 | 38.82 | 78.93 | 27.45 | 5.89 |
| airoboros-l2-13b-3.0 | 🔶 | 130 | 51.18 | 59.81 | 83.71 | 54.86 | 47.79 | 76.16 | 8.95 | 26.99 |
| BrainDerp2 | 🔶 | 130 | 51.18 | 60.92 | 81.94 | 58.9 | 57.19 | 75.93 | 9.02 | 14.34 |
| Airboros2.1-Platypus2-13B-QLora-0.80-epoch | | 130 | 51.17 | 58.96 | 82.46 | 54.62 | 47.71 | 75.14 | 0.0 | 39.32 |
| tulu-13B-fp16 | 🔶 | 128 | 51.17 | 53.92 | 80.66 | 53.19 | 43.84 | 75.61 | 14.25 | 36.72 |
| falcon-40b-openassistant-peft | 🔶 | 400 | 51.17 | 62.63 | 85.59 | 57.77 | 51.02 | 81.45 | 13.34 | 6.36 |
| BrainDerp | 🔶 | 130 | 51.16 | 60.75 | 82.1 | 58.81 | 56.9 | 75.85 | 8.26 | 15.48 |
| Luban-Marcoroni-13B | | 130 | 51.16 | 63.65 | 82.92 | 58.7 | 55.55 | 77.03 | 10.01 | 10.25 |
| MM-ReMM-L2-20B | 🔶 | 200 | 51.14 | 60.84 | 85.18 | 56.45 | 53.33 | 75.77 | 7.73 | 18.66 |
| 2x-LoRA-Assemble-Platypus2-13B | | 130 | 51.13 | 60.58 | 82.56 | 58.25 | 54.77 | 74.9 | 0.91 | 25.96 |
| Luban-Marcoroni-13B-v3 | | 130 | 51.13 | 63.74 | 82.88 | 58.64 | 55.56 | 76.87 | 9.93 | 10.25 |
| airoboros-l2-13b-gpt4-2.0 | 🔶 | 128 | 51.12 | 59.04 | 82.82 | 54.71 | 36.47 | 74.19 | 7.73 | 42.9 |
| Luban-Marcoroni-13B-v2 | | 130 | 51.11 | 63.48 | 82.89 | 58.72 | 55.56 | 76.95 | 9.93 | 10.25 |
| MythoMix-L2-13b | 🔶 | 128 | 51.1 | 61.09 | 83.86 | 55.42 | 52.08 | 75.45 | 9.93 | 19.86 |
| llama-2-13b-FINETUNE5_4w-r4-q_k_v_o_gate_up_down | 🔶 | 128 | 51.1 | 55.89 | 81.38 | 53.77 | 40.25 | 76.72 | 12.28 | 37.4 |
| BrainDerp3 | 🔶 | 130 | 51.1 | 60.92 | 82.1 | 58.91 | 57.18 | 75.61 | 8.04 | 14.92 |
| 30B-Lazarus-instruct-PL-lora_unload | 🔶 | 323 | 51.08 | 62.8 | 84.13 | 56.87 | 55.49 | 79.08 | 11.37 | 7.8 |
| speechless-codellama-34b-v1.0 | 🔶 | 335 | 51.07 | 52.47 | 74.13 | 53.47 | 47.14 | 73.24 | 14.71 | 42.34 |
| speechless-codellama-dolphin-orca-platypus-34b | 🔶 | 335 | 51.07 | 52.47 | 74.13 | 53.47 | 47.14 | 73.24 | 14.71 | 42.34 |
| samantha-mistral-instruct-7b | | 71 | 51.02 | 53.5 | 75.14 | 51.72 | 58.81 | 70.4 | 10.84 | 36.73 |
| ChatAYT-Lora-Assamble-Marcoroni | | 130 | 51.0 | 62.46 | 83.05 | 58.72 | 56.12 | 77.35 | 8.87 | 10.46 |
| ReMM-SLERP-L2-13B | 🔶 | 128 | 50.99 | 60.92 | 83.56 | 55.33 | 51.97 | 75.22 | 9.17 | 20.76 |
| Huginn-13b-v1.2 | 🔶 | 128 | 50.99 | 60.92 | 83.56 | 55.33 | 51.97 | 75.22 | 9.17 | 20.76 |
| Luban-13B | 🔶 | 128 | 50.98 | 63.05 | 82.8 | 58.73 | 55.53 | 76.56 | 9.7 | 10.46 |
| vigogne-13b-chat | 🔶 | 128 | 50.97 | 58.62 | 80.85 | 47.76 | 48.73 | 76.72 | 8.34 | 35.81 |
| LLaMA_2_13B_SFT_v0 | 🔶 | 128 | 50.97 | 62.03 | 83.8 | 58.39 | 49.92 | 77.27 | 12.43 | 12.96 |
| MythoMax-L2-13b | 🔶 | 128 | 50.97 | 60.92 | 83.56 | 55.33 | 51.97 | 75.22 | 9.02 | 20.73 |
| speechless-codellama-34b-v2.0 | 🔶 | 335 | 50.96 | 54.35 | 75.65 | 54.67 | 45.21 | 73.56 | 11.6 | 41.71 |
| airoboros-33b-gpt4-1.4 | 🔶 | 323 | 50.96 | 64.42 | 85.13 | 59.53 | 50.47 | 77.9 | 11.75 | 7.54 |
| CodeEngine | 🔶 | 128 | 50.96 | 58.36 | 82.27 | 54.18 | 45.18 | 74.59 | 1.52 | 40.59 |
| huginnv1.2 | | 128 | 50.95 | 62.37 | 84.28 | 57.02 | 47.81 | 75.22 | 9.17 | 20.76 |
| openbuddy-mistral-7b-v13 | 🔶 | 71 | 50.94 | 52.3 | 75.09 | 56.34 | 50.81 | 71.74 | 14.71 | 35.55 |
| Athena-v4 | 🔶 | 130 | 50.92 | 62.54 | 84.19 | 57.33 | 50.87 | 76.48 | 11.98 | 13.09 |
| Vicuzard-30B-Uncensored-instruct-PL-lora_unload | 🔶 | 323 | 50.86 | 62.46 | 83.66 | 57.82 | 50.94 | 78.37 | 15.31 | 7.46 |
| Tulpar-7b-v0 | 🔶 | 66 | 50.84 | 56.31 | 79.01 | 52.55 | 51.68 | 73.88 | 2.73 | 39.75 |
| speechless-llama2-hermes-orca-platypus-13b | 🔶 | 130 | 50.84 | 60.92 | 83.5 | 59.39 | 54.29 | 75.22 | 9.7 | 12.84 |
| Mistral-7B-OpenOrca-Guanaco-accu16 | 🔶 | 71 | 50.84 | 59.73 | 83.08 | 61.29 | 50.81 | 76.56 | 16.0 | 8.38 |
| llama-2-13b-FINETUNE3_3.3w-r16-q_k_v_o_gate_up_down | 🔶 | 128 | 50.82 | 59.22 | 81.52 | 54.94 | 42.83 | 76.87 | 11.6 | 28.75 |
| speechless-llama2-luban-orca-platypus-13b | 🔶 | 130 | 50.81 | 62.54 | 82.76 | 59.23 | 54.66 | 77.11 | 8.19 | 11.19 |
| llama-2-13b-FINETUNE3_3.3w-r4-q_k_v_o_gate_up_down | 🔶 | 128 | 50.81 | 57.76 | 80.78 | 54.32 | 40.8 | 76.72 | 7.96 | 37.33 |
| MLewd-ReMM-L2-Chat-20B-Inverted | 🔶 | 200 | 50.81 | 61.69 | 85.32 | 58.0 | 53.77 | 75.61 | 9.1 | 12.16 |
| llama-2-13b-vicuna-wizard | | 128 | 50.79 | 57.76 | 82.16 | 54.68 | 41.11 | 74.98 | 0.91 | 43.94 |
| CAMEL-33B-Combined-Data | 🔶 | 323 | 50.79 | 62.97 | 83.83 | 58.98 | 50.21 | 78.3 | 14.1 | 7.12 |
| mistral-guanaco1k-ep2 | 🔶 | 71 | 50.76 | 60.07 | 82.76 | 61.5 | 54.4 | 78.06 | 11.98 | 6.54 |
| Mistral-7B-guanaco1k-ep2 | 🔶 | 71 | 50.76 | 60.07 | 82.76 | 61.5 | 54.4 | 78.06 | 11.98 | 6.54 |
| OpenOrcaxOpenChat-Preview2-13B | 🔶 | 128 | 50.76 | 62.37 | 82.96 | 58.68 | 51.23 | 77.19 | 14.1 | 8.75 |
| mistral-7b-platypus1k | | 72 | 50.74 | 61.6 | 82.93 | 63.16 | 46.96 | 78.14 | 16.38 | 5.99 |
| llama2-13b-fintune2-4E | 🔶 | 128 | 50.73 | 55.89 | 80.95 | 53.73 | 42.72 | 73.09 | 10.92 | 37.82 |
| Chat-AYB-Nova-13B | | 130 | 50.73 | 62.97 | 84.28 | 58.58 | 51.28 | 77.58 | 12.36 | 8.03 |
| Mistral-7B-openplatypus-1k | | 71 | 50.71 | 60.15 | 84.25 | 59.84 | 49.86 | 76.87 | 17.44 | 6.54 |
| Unholy-v1-12L-13B | 🔶 | 130 | 50.65 | 63.57 | 83.75 | 58.08 | 51.09 | 77.27 | 11.07 | 9.73 |
| VicUnlocked-alpaca-30b | 🔶 | 323 | 50.65 | 61.86 | 83.79 | 57.64 | 51.03 | 78.22 | 14.63 | 7.36 |
| llama-2-16b-nastychat | | 162 | 50.62 | 57.42 | 80.59 | 55.99 | 53.45 | 74.66 | 8.11 | 24.13 |
| ReMM-v2-L2-13B | 🔶 | 130 | 50.57 | 61.95 | 84.0 | 56.14 | 50.81 | 75.85 | 13.19 | 12.08 |
| storytime-13b | | 130 | 50.55 | 62.03 | 83.96 | 57.48 | 52.5 | 75.53 | 8.34 | 14.0 |
| Enterredaas-33b | 🔶 | 323 | 50.52 | 60.92 | 84.18 | 58.3 | 49.02 | 78.77 | 16.22 | 6.23 |
| zararp-l2-7b | 🔶 | 66 | 50.49 | 56.31 | 79.19 | 51.36 | 51.26 | 74.51 | 1.74 | 39.06 |
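
A note on reading the table: the Average column is the arithmetic mean of the seven benchmark scores. A minimal sketch, using the table's first row as a check:

```python
# Minimal sketch: the "Average" column is the arithmetic mean of the seven
# benchmark scores. Values below are copied from the table's first row
# (airoboros-l2-13b-2.1).
scores = {
    "ARC": 59.47, "HellaSwag": 82.47, "MMLU": 54.83, "TruthfulQA": 44.65,
    "Winogrande": 75.06, "GSM8K": 3.56, "DROP": 42.44,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # -> 51.78, matching the Average column
```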