Open LLM Leaderboard: Large Language Model Benchmark Rankings (China Mirror)

To make these results easier to query, DataLearnerAI has released DataLearnerAI-GPT, which can answer questions about any model's benchmark results based on the Open LLM Leaderboard data. It is available at:
https://chat.openai.com/g/g-8eu9KgtUm-datalearnerai-gpt
For a detailed introduction to DataLearnerAI-GPT, see: https://www.datalearner.com/blog/1051699757266256
With large language models (LLMs) and chatbots being released every week, often accompanied by inflated performance claims, it can be hard to identify the genuine progress made by the open-source community and to tell which model currently represents the state of the art.
To address this, Hugging Face launched the 📐 🤗 Open LLM Leaderboard, which tracks, ranks, and evaluates open-source LLMs and chatbots across a set of benchmark tasks.
Because access to Hugging Face can be slow or unstable, we provide synchronized results here. The original page is at: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The Open LLM Leaderboard evaluation tasks and scores

The model-type icons used in the table below mean the following:

🟢 Pretrained model: a new base model trained on a given corpus.

🔶 Fine-tuned model: a pretrained model fine-tuned on additional data, e.g. to make it better suited for chat.

⭕ Instruction-tuned model: a model fine-tuned specifically on instruction datasets.

🟦 RL-tuned model: a reinforcement-learning fine-tune, which typically modifies the model's loss function by adding a policy term.

❓ Unknown model type.
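The Average column in the table below is simply the arithmetic mean of the seven benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K, DROP). A minimal sketch, using the values from the dolphin-2.1-mistral-7b row:

```python
# Per-benchmark scores from the dolphin-2.1-mistral-7b row of the table.
scores = {
    "ARC": 64.42, "HellaSwag": 84.92, "MMLU": 63.32,
    "TruthfulQA": 55.56, "Winogrande": 77.74, "GSM8K": 20.77, "DROP": 7.56,
}

# The leaderboard's Average is the unweighted mean over the seven tasks.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 53.47, matching the Average column
```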

Model | Type | Params (×100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP
dolphin-2.1-mistral-7b |  | 71 | 53.47 | 64.42 | 84.92 | 63.32 | 55.56 | 77.74 | 20.77 | 7.56
speechless-code-mistral-7b-v1.0 | 🔶 | 71 | 53.47 | 60.58 | 83.75 | 62.98 | 47.9 | 78.69 | 19.18 | 21.19
based-30b | 🔶 | 323 | 53.46 | 63.91 | 85.67 | 58.28 | 35.7 | 80.11 | 0.3 | 50.22
Wizard-Vicuna-30B-Uncensored-fp16 | 🔶 | 323 | 53.44 | 62.12 | 83.45 | 58.24 | 50.81 | 78.45 | 14.25 | 26.74
Wizard-Vicuna-30B-Uncensored | 🔶 | 323 | 53.44 | 62.12 | 83.45 | 58.24 | 50.81 | 78.45 | 14.25 | 26.74
SlimOpenOrca-Mistral-7B | 🔶 | 72 | 53.43 | 62.97 | 83.49 | 62.3 | 57.39 | 77.43 | 21.46 | 9.01
Chat-AYB-Platypus2-13B |  | 130 | 53.39 | 60.49 | 84.03 | 57.83 | 54.52 | 75.77 | 2.96 | 38.12
dolphin-2.1-mistral-7b | 🔶 | 71 | 53.37 | 63.99 | 85.0 | 63.44 | 55.57 | 77.9 | 20.09 | 7.61
Mistral-7B-SlimOrca | 🔶 | 71 | 53.34 | 62.54 | 83.86 | 62.77 | 54.23 | 77.43 | 21.38 | 11.2
speechless-mistral-dolphin-orca-platypus-samantha-7b |  | 0 | 53.34 | 64.33 | 84.4 | 63.72 | 52.52 | 78.37 | 21.38 | 8.66
30B-Lazarus | 🔶 | 323 | 53.33 | 64.93 | 84.27 | 56.47 | 58.65 | 78.37 | 7.73 | 22.9
MLewd-ReMM-L2-Chat-20B | 🔶 | 200 | 53.33 | 62.46 | 85.62 | 59.13 | 55.63 | 77.19 | 10.92 | 22.33
speechless-hermes-coig-lite-13b | 🔶 | 130 | 53.31 | 59.47 | 82.28 | 55.18 | 47.6 | 78.61 | 10.77 | 39.25
llama-65b | 🟢 | 653 | 53.31 | 63.48 | 86.09 | 63.93 | 43.43 | 82.56 | 27.67 | 5.98
Mistral-7B-Instruct-v0.1 |  | 71 | 53.27 | 54.52 | 75.63 | 55.38 | 56.28 | 73.72 | 14.25 | 43.1
Pygmalion-2-13b-SuperCOT | 🔶 | 130 | 53.27 | 63.23 | 83.68 | 54.9 | 53.14 | 77.51 | 6.29 | 34.13
OpenRP-13B | 🔶 | 128 | 53.25 | 62.12 | 82.6 | 57.5 | 48.29 | 76.01 | 12.89 | 33.38
speechless-hermes-coig-lite-13b | 🔶 | 130 | 53.22 | 59.56 | 82.26 | 55.3 | 47.56 | 78.53 | 9.86 | 39.5
oasst-rlhf-2-llama-30b-7k-steps-hf | 🟦 | 323 | 53.18 | 61.35 | 83.8 | 57.89 | 51.18 | 78.77 | 31.46 | 7.78
chinese-alpaca-33b-merged |  | 324 | 53.09 | 59.3 | 78.43 | 57.69 | 52.45 | 76.09 | 8.04 | 39.67
Mistral-11B-TestBench9 | 🔶 | 107 | 53.06 | 64.08 | 84.24 | 64.0 | 56.19 | 78.45 | 16.15 | 8.35
Yi-6B | 🟢 | 60 | 53.06 | 55.55 | 76.42 | 63.85 | 41.86 | 73.8 | 12.66 | 47.32
llama-30b-supercot | 🔶 | 323 | 53.06 | 64.85 | 85.08 | 56.56 | 53.96 | 80.03 | 11.9 | 19.07
Llama-chat-AY-13B | 🔶 | 130 | 53.04 | 62.8 | 83.23 | 60.01 | 55.95 | 75.93 | 12.13 | 21.25
LlongOrca-13B-16k | 🔶 | 128 | 53.02 | 62.46 | 82.75 | 55.54 | 50.11 | 76.4 | 12.28 | 31.59
Mistral-11B-TestBench11 | 🔶 | 107 | 53.01 | 64.42 | 83.93 | 63.82 | 56.68 | 77.74 | 14.94 | 9.57
Llama2-chat-AYB-13B | 🔶 | 130 | 53.01 | 63.4 | 84.79 | 59.34 | 55.62 | 76.24 | 11.3 | 20.41
Dolphin2.1-OpenOrca-7B |  | 0 | 53.0 | 63.91 | 84.26 | 62.66 | 53.84 | 78.22 | 19.94 | 8.17
chinese-alpaca-2-13b | 🔶 | 130 | 53.0 | 58.7 | 79.74 | 55.1 | 50.22 | 75.69 | 10.46 | 41.06
Huginn-13b-FP16 | 🔶 | 128 | 52.97 | 60.58 | 82.53 | 53.71 | 54.46 | 73.72 | 4.32 | 41.44
Dolphin2.1-OpenOrca-7B |  | 0 | 52.92 | 64.16 | 84.25 | 62.7 | 53.83 | 77.66 | 19.71 | 8.15
MythoMix-Platypus2-13B-QLoRA-0.80-epoch |  | 130 | 52.91 | 60.32 | 83.72 | 55.74 | 52.18 | 75.53 | 0.91 | 41.98
Nous-Hermes-Platypus2-13B-QLoRA-0.80-epoch |  | 130 | 52.89 | 59.9 | 83.29 | 56.69 | 51.08 | 75.22 | 1.44 | 42.65
Samantha-Nebula-7B |  | 72 | 52.87 | 57.0 | 82.25 | 54.21 | 49.58 | 73.09 | 11.37 | 42.57
UltraLM-13b-v2.0 | 🔶 | 128 | 52.85 | 62.63 | 81.49 | 56.17 | 49.48 | 76.48 | 10.99 | 32.69
TekniumAiroboros-Nebula-7B |  | 72 | 52.82 | 57.17 | 81.72 | 55.25 | 51.64 | 73.24 | 9.4 | 41.33
Libra-19B |  | 192 | 52.8 | 60.58 | 82.04 | 55.57 | 48.41 | 76.32 | 0.08 | 46.63
llama-2-13b-FINETUNE5_4w-r8-q_k_v_o | 🔶 | 128 | 52.79 | 57.25 | 81.73 | 55.72 | 41.53 | 77.58 | 14.03 | 41.7
ReMM-Mistral-13B | 🔶 | 128 | 52.76 | 62.2 | 83.82 | 55.43 | 53.32 | 74.51 | 12.05 | 27.96
llama-30b-2048-instruct-PL-lora_unload | 🔶 | 323 | 52.75 | 63.82 | 84.7 | 61.49 | 52.49 | 79.79 | 17.89 | 9.08
vigogne-2-13b-instruct | 🔶 | 130 | 52.74 | 61.18 | 83.25 | 55.92 | 51.08 | 77.35 | 2.05 | 38.37
MLewd-Chat-v2-13B | 🔶 | 128 | 52.72 | 61.86 | 83.81 | 57.0 | 54.51 | 75.77 | 10.46 | 25.63
llama-2-13b-Beluga-QLoRA |  | 128 | 52.7 | 59.22 | 81.92 | 56.67 | 48.23 | 77.19 | 1.29 | 44.41
trurl-2-13b-academic |  | 128 | 52.7 | 57.94 | 79.55 | 55.2 | 43.46 | 76.56 | 10.92 | 45.28
Athena-v1 | 🔶 | 130 | 52.68 | 60.07 | 82.64 | 55.61 | 46.58 | 74.82 | 4.93 | 44.11
speechless-orca-platypus-coig-lite-4k-0.6e-13b | 🔶 | 128 | 52.65 | 58.79 | 79.93 | 56.77 | 48.29 | 75.93 | 4.25 | 44.59
ReMM-L2-13B | 🔶 | 130 | 52.58 | 59.73 | 83.1 | 54.11 | 49.94 | 74.51 | 2.96 | 43.7
ReMM-L2-13B-PIPPA | 🔶 | 128 | 52.58 | 59.73 | 83.12 | 54.1 | 49.94 | 74.51 | 2.96 | 43.69
UndiMix-v1-13b | 🔶 | 130 | 52.56 | 59.47 | 82.45 | 55.83 | 49.78 | 75.45 | 10.01 | 34.95
Llama-2-70B-chat-GPTQ | 🔶 | 728 | 52.56 | 62.63 | 84.81 | 62.74 | 50.98 | 78.69 | 18.65 | 9.4
llama-30b-instruct-2048-PL-lora | 🔶 | 323 | 52.55 | 63.31 | 84.66 | 61.66 | 53.35 | 79.08 | 16.83 | 8.94
CollectiveCognition-v1-Mistral-7B |  | 71 | 52.55 | 62.37 | 85.5 | 62.76 | 54.48 | 77.58 | 17.89 | 7.22
airoboros-33b-gpt4 |  | 323 | 52.54 | 63.74 | 84.87 | 58.54 | 47.06 | 77.03 | 12.66 | 23.9
speechless-llama2-dolphin-orca-platypus-13b | 🔶 | 128 | 52.54 | 59.64 | 82.65 | 57.9 | 43.44 | 77.19 | 9.7 | 37.24
Slerpeno | 🔶 | 130 | 52.5 | 61.69 | 84.1 | 56.77 | 48.05 | 76.4 | 12.51 | 28.0
zephyr-7b-alpha | 🔶 | 71 | 52.4 | 61.01 | 84.04 | 61.39 | 57.9 | 78.61 | 14.03 | 9.82
magpie-13b | 🔶 | 128 | 52.37 | 63.31 | 84.25 | 58.15 | 49.15 | 76.48 | 14.48 | 20.78
Nous-Hermes-Llama2-13b | 🔶 | 128 | 52.35 | 61.52 | 83.29 | 55.11 | 50.38 | 75.45 | 10.08 | 30.61
WizardLM-30B-Uncensored | 🔶 | 323 | 52.32 | 60.24 | 82.93 | 56.8 | 51.57 | 74.35 | 12.89 | 27.45
llama-2-13b-huangyt_Fintune_1_17w-q_k_v_o_proj | 🔶 | 128 | 52.29 | 59.73 | 81.06 | 54.53 | 38.64 | 78.14 | 14.03 | 39.9
firefly-llama2-13b | 🔶 | 128 | 52.25 | 59.13 | 81.99 | 55.49 | 51.57 | 74.66 | 11.22 | 31.69
orca_mini_v3_13b | 🔶 | 128 | 52.23 | 63.14 | 82.35 | 56.52 | 51.81 | 76.48 | 13.12 | 22.23
orca_mini_v3_13b |  | 128 | 52.23 | 63.14 | 82.35 | 56.52 | 51.81 | 76.48 | 13.12 | 22.23
MythoLogic-L2-13b | 🔶 | 128 | 52.22 | 61.01 | 83.93 | 55.7 | 48.64 | 76.09 | 11.75 | 28.43
platypus-2-22b-relora |  | 218 | 52.21 | 57.68 | 82.44 | 55.33 | 43.61 | 77.35 | 6.6 | 42.46
llama-2-13b-FINETUNE4_3.8w-r8-q_k_v_o | 🔶 | 128 | 52.21 | 57.68 | 81.91 | 54.95 | 41.31 | 76.48 | 12.05 | 41.07
llama-2-13b-FINETUNE5_4w-r16-q_k_v_o | 🔶 | 128 | 52.2 | 58.7 | 81.66 | 53.87 | 43.02 | 76.72 | 13.8 | 37.63
airoboros-33b-gpt4-1.2 | 🔶 | 323 | 52.2 | 64.42 | 84.93 | 60.35 | 49.18 | 77.51 | 9.78 | 19.21
13B-Chimera |  | 128 | 52.19 | 57.59 | 81.5 | 49.86 | 52.59 | 77.27 | 10.69 | 35.84
GodziLLa-30B | 🔶 | 323 | 52.17 | 61.52 | 82.13 | 54.21 | 55.91 | 76.16 | 0.38 | 34.86
samantha-1.2-mistral-7b |  | 71 | 52.16 | 64.08 | 85.08 | 63.91 | 50.4 | 78.53 | 16.98 | 6.13
Camel-Platypus2-13B | 🔶 | 128 | 52.12 | 60.75 | 83.61 | 56.51 | 49.6 | 75.37 | 0.08 | 38.91
tigerbot-13b-base | 🟢 | 130 | 52.11 | 53.84 | 77.05 | 53.57 | 44.06 | 74.98 | 17.06 | 44.21
platypus2-22b-relora |  | 218 | 52.11 | 57.51 | 82.36 | 54.94 | 43.62 | 77.11 | 6.29 | 42.9
speechless-orca-platypus-coig-lite-4k-0.5e-13b | 🔶 | 128 | 52.09 | 58.02 | 80.15 | 57.26 | 48.04 | 75.45 | 5.84 | 39.88
llama-2-13b-FINETUNE3_3.3w-r16-q_k_v_o | 🔶 | 128 | 52.08 | 59.3 | 81.2 | 55.58 | 38.13 | 76.8 | 13.5 | 40.06
CreativityEngine | 🔶 | 128 | 52.07 | 59.3 | 82.42 | 53.55 | 52.46 | 74.19 | 9.55 | 32.98
llama-2-13b-FINETUNE3_3.3w-r8-q_k_v_o | 🔶 | 128 | 52.06 | 56.06 | 81.89 | 55.04 | 40.12 | 76.56 | 14.25 | 40.49
Nous-Hermes-Llama2-13b | 🔶 | 128 | 52.03 | 61.26 | 83.26 | 55.04 | 50.41 | 75.37 | 9.17 | 29.71
firefly-llama2-13b-v1.2 | 🔶 | 128 | 52.03 | 60.67 | 80.46 | 56.51 | 51.03 | 74.82 | 11.75 | 28.94
Stheno-1.8-L2-13B | 🔶 | 128 | 52.0 | 63.48 | 84.12 | 58.57 | 52.86 | 76.4 | 13.27 | 15.33
Llama-2-13b-orca-v1 | 🔶 | 128 | 51.99 | 62.2 | 82.32 | 57.67 | 49.6 | 76.8 | 12.89 | 22.47
Nous-Hermes-13B-Code | 🔶 | 128 | 51.98 | 61.18 | 83.21 | 55.13 | 50.56 | 75.14 | 10.39 | 28.28
llama-2-13b-huangyt_Fintune_1_17w | 🔶 | 128 | 51.96 | 59.47 | 81.0 | 54.31 | 38.17 | 77.27 | 13.27 | 40.25
CalliopeDS-v2-L2-13B | 🔶 | 130 | 51.95 | 62.8 | 84.14 | 56.14 | 51.06 | 76.01 | 12.59 | 20.92
StableBeluga-13B | 🔶 | 130 | 51.94 | 62.03 | 82.27 | 57.71 | 49.61 | 76.87 | 13.8 | 21.26
Llama-2-13b-orca-v1 | 🔶 | 128 | 51.94 | 62.03 | 82.27 | 57.71 | 49.61 | 76.87 | 13.8 | 21.26
gpt4-alpaca-lora-30b-HF |  | 323 | 51.93 | 64.85 | 85.72 | 58.51 | 52.24 | 80.19 | 15.54 | 6.44
llama-2-13b-huangyt_FINETUNE2_3w-q_k_v_o_proj | 🔶 | 128 | 51.9 | 58.53 | 82.47 | 53.9 | 37.92 | 76.8 | 12.81 | 40.85
NewHope_HF_not_official | 🔶 | 128 | 51.9 | 61.09 | 84.03 | 55.73 | 44.96 | 74.98 | 15.85 | 26.66
llama-2-13b-FINETUNE4_3.8w-r16-q_k_v_o_gate_up_down | 🔶 | 128 | 51.89 | 57.25 | 81.49 | 55.9 | 39.79 | 75.77 | 12.05 | 40.95
llama-2-13b-FINETUNE4_3.8w-r16-q_k_v_o | 🔶 | 128 | 51.89 | 56.23 | 81.98 | 55.87 | 39.76 | 76.72 | 11.52 | 41.18
Mistral-7B-OpenOrca-1k |  | 71 | 51.88 | 62.97 | 84.66 | 62.2 | 52.96 | 78.61 | 11.98 | 9.74
gaodrew-llama-30b-instruct-2048-Open-Platypus-100steps | 🔶 | 323 | 51.87 | 61.52 | 84.06 | 60.23 | 51.05 | 80.82 | 17.66 | 7.75
Emerhyst-20B | 🔶 | 200 | 51.85 | 61.69 | 84.98 | 56.98 | 54.16 | 76.09 | 8.49 | 20.56
speechless-llama2-hermes-orca-platypus-wizardlm-13b | 🔶 | 130 | 51.85 | 59.64 | 82.7 | 58.3 | 56.0 | 75.37 | 13.12 | 17.81
llama-2-13b-FINETUNE3_3.3w-r4-q_k_v_o | 🔶 | 128 | 51.83 | 59.04 | 81.15 | 53.0 | 40.16 | 76.48 | 11.9 | 41.1
MegaMix-T1-13B | 🔶 | 130 | 51.82 | 61.35 | 83.44 | 58.49 | 48.19 | 76.09 | 24.11 | 11.04
speechless-codellama-34b-v1.9 | 🔶 | 335 | 51.8 | 54.27 | 75.2 | 56.12 | 43.92 | 73.56 | 24.79 | 34.74
vigogne2-enno-13b-sft-lora-4bit |  | 128 | 51.79 | 62.03 | 82.65 | 54.11 | 42.98 | 76.95 | 0.15 | 43.65
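The averages in this slice of the leaderboard are tightly clustered (53.47 down to 51.79), so re-ranking by a single benchmark often says more than the overall ordering. A minimal sketch, using a few rows excerpted from the table above:

```python
# A few (model, average, GSM8K) rows excerpted from the table above.
rows = [
    ("dolphin-2.1-mistral-7b", 53.47, 20.77),
    ("llama-65b", 53.31, 27.67),
    ("oasst-rlhf-2-llama-30b-7k-steps-hf", 53.18, 31.46),
    ("Yi-6B", 53.06, 12.66),
]

# Re-rank by the GSM8K (grade-school math) score instead of the average.
by_gsm8k = sorted(rows, key=lambda r: r[2], reverse=True)
for name, avg, gsm8k in by_gsm8k:
    print(f"{name}: GSM8K={gsm8k}, avg={avg}")
```

Note how the ordering changes: the RLHF-tuned 30B model leads this subset on GSM8K despite a lower overall average.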