Open LLM Leaderboard: Large Language Model Evaluation Scores (China Mirror)

To make these results easier to query, DataLearnerAI has released DataLearnerAI-GPT, which can answer questions about any model's evaluation results based on the Open LLM Leaderboard data. It is available at:
https://chat.openai.com/g/g-8eu9KgtUm-datalearnerai-gpt
For a detailed introduction to DataLearnerAI-GPT, see: https://www.datalearner.com/blog/1051699757266256
With new large language models (LLMs) and chatbots released every week, often accompanied by exaggerated performance claims, it can be hard to sift out the genuine progress made by the open-source community and to tell which model is the current state of the art.
To address this, Hugging Face launched the 📐 🤗 Open LLM Leaderboard, which aims to track, rank, and evaluate open-source LLMs and chatbots on a common set of evaluation tasks.
Because access to Hugging Face can be slow or unstable in some regions, we provide synchronized results here. The original page is at: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
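The ranking itself is simple: models are ordered by the mean of their benchmark scores, from highest to lowest. A minimal sketch of that ordering, using three (model, average) pairs taken from the table below:

```python
# Rank a few leaderboard entries by average score, descending --
# a sketch of how the Open LLM Leaderboard orders its rows.
rows = [
    ("Mistral-7B-v0.1", 50.32),
    ("zararp-l2-7b", 50.49),
    ("llama-2-13b-dolphin_20w", 50.48),
]

ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for rank, (name, avg) in enumerate(ranked, start=1):
    print(rank, name, avg)
# 1 zararp-l2-7b 50.49
# 2 llama-2-13b-dolphin_20w 50.48
# 3 Mistral-7B-v0.1 50.32
```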

Open LLM Leaderboard: Evaluation Tasks and Results

The model-type icons in the table below mean the following:

🟢 Pretrained model: a new base model trained from scratch on a given corpus.

🔶 Fine-tuned model: a pretrained model further fine-tuned on additional data, for example to make it better suited for chat.

⭕ Instruction-tuned model: a model fine-tuned specifically on datasets of task instructions.

🟦 RL-tuned model: reinforcement-learning fine-tuning, which typically modifies the model's loss function slightly by adding a policy term.

❓ Unknown

| Model | Type | Params (100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zararp-l2-7b | 🔶 | 66 | 50.49 | 56.31 | 79.19 | 51.36 | 51.26 | 74.51 | 1.74 | 39.06 |
| llama-2-13b-dolphin_20w | 🔶 | 128 | 50.48 | 59.56 | 82.55 | 55.89 | 42.67 | 77.27 | 12.43 | 23.01 |
| Llama-2-13b-chat-hf | 🟦 | 130 | 50.48 | 59.04 | 81.94 | 54.64 | 44.12 | 74.51 | 15.24 | 23.87 |
| CodeUp-Llama-2-13b-chat-hf | 🔶 | 128 | 50.48 | 59.04 | 81.93 | 54.63 | 44.12 | 74.51 | 15.24 | 23.87 |
| Morningstar-13b-hf | 🔶 | 128 | 50.48 | 59.04 | 81.93 | 54.63 | 44.12 | 74.51 | 15.24 | 23.87 |
| OpenOrca-Platypus2-13B | 🔶 | 128 | 50.47 | 62.8 | 83.15 | 59.39 | 53.08 | 76.24 | 9.02 | 9.63 |
| airoboros-33b-gpt4-1.3 | | 323 | 50.47 | 63.91 | 85.04 | 58.53 | 45.36 | 78.69 | 13.04 | 8.73 |
| GPT4-x-Alpasta-13b | 🔶 | 128 | 50.46 | 58.53 | 79.92 | 46.03 | 53.06 | 73.95 | 8.79 | 32.94 |
| llama-2-13b-QLoRA | | 128 | 50.46 | 58.02 | 82.33 | 55.8 | 46.23 | 77.58 | 3.26 | 29.98 |
| ReMM-v2.2-L2-13B | 🔶 | 130 | 50.45 | 61.26 | 84.16 | 56.22 | 51.35 | 75.61 | 14.03 | 10.56 |
| llama-2-13b-FINETUNE4_3.8w-r4-gate_up_down | 🔶 | 128 | 50.45 | 55.8 | 81.74 | 55.09 | 39.12 | 76.32 | 12.81 | 32.29 |
| airoboros-33b-gpt4-1.3 | 🔶 | 323 | 50.45 | 63.82 | 85.09 | 58.94 | 45.33 | 79.01 | 12.74 | 8.22 |
| llama-2-13b-FINETUNE4_3.8w-r8-q_k_v_o_gate_up_down | 🔶 | 128 | 50.43 | 55.97 | 81.53 | 54.42 | 40.72 | 75.06 | 9.55 | 35.77 |
| Llama-2-13b-FINETUNE4_TEST3 | 🔶 | 128 | 50.41 | 59.04 | 81.65 | 56.37 | 39.98 | 75.45 | 11.22 | 29.18 |
| ReMM-v2.1-L2-13B | 🔶 | 130 | 50.41 | 61.43 | 83.92 | 55.95 | 50.3 | 75.93 | 12.74 | 12.62 |
| Dans-MysteryModel-13b | 🔶 | 130 | 50.39 | 57.0 | 80.35 | 52.06 | 45.0 | 74.82 | 0.0 | 43.49 |
| llama-2-7B-LoRA-assemble | 🔶 | 66 | 50.38 | 57.34 | 78.81 | 50.75 | 53.18 | 73.48 | 0.0 | 39.14 |
| Alpacino30b | 🔶 | 323 | 50.38 | 62.71 | 85.04 | 58.48 | 44.23 | 79.79 | 15.77 | 6.65 |
| llama-2-13b-OpenOrca_20w | 🔶 | 128 | 50.38 | 59.9 | 82.51 | 56.3 | 43.14 | 77.19 | 12.66 | 20.98 |
| U-Amethyst-20B | 🔶 | 200 | 50.38 | 62.2 | 83.11 | 55.88 | 53.2 | 74.19 | 5.31 | 18.75 |
| 2x-LoRA-Assemble-Nova-13B | | 130 | 50.34 | 62.63 | 83.24 | 58.64 | 51.88 | 76.95 | 10.24 | 8.8 |
| Mistral-7B-v0.1 | 🟢 | 71 | 50.32 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
| llama2-13b-FINETUNE3_TEST2 | 🔶 | 128 | 50.32 | 54.69 | 81.48 | 56.8 | 39.93 | 76.24 | 12.59 | 30.48 |
| zararp-1.1-l2-7b | 🔶 | 66 | 50.31 | 56.48 | 78.85 | 51.49 | 51.99 | 73.4 | 1.14 | 38.79 |
| PuddleJumper-13b | 🔶 | 128 | 50.23 | 58.7 | 81.18 | 58.25 | 56.44 | 72.77 | 3.34 | 20.93 |
| openbuddy-mistral-7b-v13.1 | 🔶 | 71 | 50.23 | 52.56 | 75.73 | 56.68 | 50.44 | 71.59 | 8.72 | 35.89 |
| OpenOrca-Platypus2-13B-GPTQ | 🔶 | 162 | 50.22 | 62.54 | 82.67 | 58.56 | 51.93 | 76.8 | 9.4 | 9.61 |
| Uncensored-Frank-13B | 🔶 | 128 | 50.21 | 61.6 | 82.62 | 54.55 | 48.34 | 74.74 | 11.98 | 17.63 |
| Llama-2-13b-FINETUNE4_TEST | 🔶 | 128 | 50.2 | 54.78 | 81.52 | 56.03 | 39.14 | 77.03 | 13.19 | 29.73 |
| zarafusionex-1.1-l2-7b | 🔶 | 66 | 50.18 | 56.14 | 79.34 | 52.1 | 50.66 | 74.43 | 7.81 | 30.79 |
| blossom-v2-llama2-7b | | 66 | 50.13 | 54.1 | 78.57 | 51.66 | 46.84 | 74.35 | 4.78 | 40.61 |
| Mythical-Destroyer-L2-13B | 🔶 | 130 | 50.13 | 58.7 | 82.0 | 57.66 | 56.35 | 74.66 | 8.95 | 12.56 |
| alpaca-cleaned-llama-30b-bf16 | 🔶 | 323 | 50.12 | 61.77 | 85.06 | 57.52 | 51.49 | 77.35 | 7.73 | 9.91 |
| Athena-v3 | 🔶 | 128 | 50.1 | 61.69 | 84.34 | 57.87 | 51.26 | 75.77 | 11.6 | 8.21 |
| duplicitous-slurpbeast-13b | | 130 | 50.1 | 62.12 | 83.92 | 57.53 | 52.33 | 75.06 | 8.79 | 10.98 |
| mcq-vicuna-13b-v1.5 | 🔶 | 128 | 50.1 | 56.66 | 81.09 | 53.3 | 43.99 | 73.01 | 8.04 | 34.62 |
| llama-13b-pretrained-sft-do2 | 🔶 | 128 | 50.1 | 58.96 | 80.32 | 47.25 | 47.41 | 75.53 | 9.25 | 31.96 |
| vicuna-33b-coder | 🔶 | 323 | 50.08 | 60.41 | 83.27 | 57.17 | 51.79 | 76.87 | 12.89 | 8.16 |
| VicUnlocked-30B-LoRA-HF | | 323 | 50.06 | 59.73 | 84.02 | 57.81 | 48.54 | 79.48 | 14.4 | 6.45 |
| vigogne-13b-instruct | 🔶 | 128 | 50.06 | 57.94 | 81.32 | 47.62 | 50.23 | 77.11 | 11.83 | 24.38 |
| LIMA2-13b-hf | | 128 | 50.0 | 60.24 | 83.69 | 53.17 | 41.81 | 73.24 | 5.76 | 32.13 |
| Capybara-7B | 🔶 | 66 | 50.0 | 55.2 | 80.76 | 48.8 | 51.07 | 73.4 | 6.9 | 33.82 |
| ANIMA-Phi-Neptune-Mistral-7B-v4 | 🔶 | 71 | 49.98 | 55.46 | 77.63 | 53.12 | 59.01 | 73.48 | 14.94 | 16.25 |
| llama2-platypus-llama2-chat-13B-hf | 🔶 | 128 | 49.95 | 62.97 | 82.75 | 56.86 | 42.93 | 76.32 | 2.81 | 25.02 |
| Nous-Capybara-7B | 🔶 | 66 | 49.94 | 55.29 | 80.73 | 48.72 | 51.13 | 73.32 | 6.97 | 33.44 |
| Tulpar-7b-v1 | 🔶 | 66 | 49.94 | 57.0 | 79.69 | 51.33 | 51.83 | 72.45 | 0.68 | 36.58 |
| ANIMA-Phi-Neptune-Mistral-7B | 🔶 | 71 | 49.93 | 55.97 | 76.22 | 52.89 | 59.76 | 73.48 | 14.94 | 16.25 |
| spicyboros-7b-2.2 | 🔶 | 66 | 49.92 | 56.57 | 80.09 | 48.47 | 47.22 | 74.51 | 4.85 | 37.74 |
| WizardLM-13B-V1-1-SuperHOT-8K-fp16 | 🔶 | 128 | 49.92 | 58.62 | 81.07 | 48.32 | 54.19 | 76.01 | 0.76 | 30.46 |
| llama2-13b-megacode2_min100 | 🔶 | 128 | 49.92 | 60.58 | 81.26 | 57.92 | 48.89 | 76.95 | 15.92 | 7.89 |
| LosslessMegaCoder-llama2-13b-mini | | 128 | 49.92 | 60.58 | 81.26 | 57.92 | 48.89 | 76.95 | 15.92 | 7.89 |
| mcq-vicuna-13b-v1.5 | 🔶 | 128 | 49.91 | 56.23 | 81.15 | 53.38 | 44.08 | 72.93 | 7.51 | 34.1 |
| OpenOrcaxOpenChat-Preview2-13B-GPTQ | 🔶 | 162 | 49.91 | 61.26 | 82.14 | 57.85 | 50.22 | 77.11 | 12.43 | 8.35 |
| llama-13b-pretrained | 🔶 | 128 | 49.9 | 56.31 | 79.32 | 47.03 | 48.42 | 76.95 | 16.07 | 25.22 |
| MegaMix-S1-13B | 🔶 | 130 | 49.89 | 62.46 | 83.65 | 57.88 | 44.52 | 75.85 | 18.35 | 6.51 |
| Chronorctypus-Limarobormes-13b | | 130 | 49.88 | 59.9 | 82.75 | 58.45 | 51.9 | 74.43 | 3.87 | 17.89 |
| Qwen-LLaMAfied-7B-Chat | 🔶 | 70 | 49.88 | 50.94 | 83.47 | 53.52 | 46.09 | 73.16 | 4.78 | 37.22 |
| chinese-llama-2-13b | 🔶 | 130 | 49.87 | 55.8 | 79.53 | 53.01 | 38.24 | 75.69 | 3.94 | 42.85 |
| llama-2-13b-FINETUNE5_4w-r4-gate_up_down | 🔶 | 128 | 49.87 | 55.38 | 81.92 | 55.28 | 40.76 | 76.09 | 13.72 | 25.92 |
| 10k_v1_lora_qkvo_rank28_v2 | 🔶 | 0 | 49.86 | 55.38 | 79.21 | 50.5 | 52.75 | 73.24 | 0.61 | 37.37 |
| llama-2-13b-FINETUNE4_3.8w-r16-gate_up_down-test1 | 🔶 | 128 | 49.86 | 55.8 | 82.27 | 55.63 | 38.15 | 77.43 | 12.66 | 27.06 |
| llama-2-13b-FINETUNE3_3.3w-r4-gate_up_down | 🔶 | 128 | 49.85 | 56.4 | 81.93 | 53.63 | 39.23 | 76.95 | 11.98 | 28.83 |
| Medusa-13b | 🔶 | 130 | 49.85 | 58.19 | 81.35 | 57.39 | 51.24 | 73.32 | 6.82 | 20.61 |
| OpenOrcaxOpenChat-Preview2-13B | 🔶 | 128 | 49.85 | 62.71 | 81.99 | 57.51 | 47.45 | 76.8 | 13.72 | 8.74 |
| airoboros-l2-13b-2.2.1 | | 128 | 49.83 | 60.92 | 83.77 | 56.47 | 49.42 | 76.01 | 11.6 | 10.6 |
| llama-2-13b-FINETUNE3_3.3w-r8-gate_up_down | 🔶 | 128 | 49.81 | 57.25 | 81.79 | 53.96 | 39.66 | 77.82 | 11.75 | 26.44 |
| Llama-2-13b-FINETUNE4_compare8k2 | 🔶 | 128 | 49.81 | 58.28 | 81.39 | 56.87 | 39.86 | 76.01 | 11.9 | 24.36 |
| WizardLM-13B-V1.1-GPTQ | | 162 | 49.81 | 58.53 | 80.66 | 49.59 | 54.35 | 74.43 | 8.11 | 22.96 |
| duplicitous-mammal-13b | | 130 | 49.8 | 61.69 | 83.79 | 57.5 | 52.27 | 75.06 | 9.1 | 9.2 |
| llama-2-13b-FINETUNE3_3.3w-r16-gate_up_down | 🔶 | 128 | 49.79 | 58.7 | 81.89 | 56.08 | 38.95 | 77.35 | 12.96 | 22.62 |
| zarafusionix-l2-7b | 🔶 | 66 | 49.78 | 55.55 | 79.4 | 51.21 | 51.05 | 74.66 | 7.2 | 29.37 |
| llama-30b | | 325 | 49.73 | 61.43 | 84.73 | 58.45 | 42.27 | 80.03 | 14.86 | 6.33 |
| genz-13b-v2 | 🔶 | 128 | 49.72 | 55.97 | 79.98 | 54.3 | 48.09 | 74.59 | 12.28 | 22.84 |
| Platypus2xOpenOrca-13B-IA3-v3 | | 128 | 49.72 | 62.54 | 82.1 | 58.67 | 46.96 | 77.82 | 12.36 | 7.6 |
| OpenOrcaPlatypus2-Platypus2-13B-QLora-0.80-epoch | | 130 | 49.71 | 59.81 | 82.69 | 56.96 | 52.92 | 74.43 | 2.35 | 18.83 |
| llama-30b | 🟢 | 325 | 49.71 | 61.26 | 84.73 | 58.47 | 42.27 | 80.03 | 14.86 | 6.33 |
| llama-30B-hf-openassitant | 🔶 | 323 | 49.71 | 61.26 | 84.73 | 58.47 | 42.27 | 80.03 | 14.86 | 6.33 |
| PuddleJumper-13b-V2 | | 128 | 49.69 | 57.0 | 81.06 | 58.3 | 52.66 | 72.45 | 3.64 | 22.74 |
| Platypus2xOpenOrca-13B-IA3-v4 | | 128 | 49.69 | 61.43 | 81.84 | 59.02 | 48.64 | 77.19 | 10.84 | 8.88 |
| Orca-Nova-13B | | 130 | 49.69 | 62.37 | 82.47 | 57.44 | 45.97 | 77.58 | 14.48 | 7.52 |
| dulia-13b-8k-alpha | 🔶 | 130 | 49.67 | 60.67 | 82.0 | 56.87 | 42.59 | 77.19 | 10.69 | 17.72 |
| llama2-13b-orca-8k-3319 | 🔶 | 128 | 49.67 | 60.75 | 81.91 | 57.06 | 42.64 | 77.19 | 10.99 | 17.14 |
| openbuddy-mistral-7b-v13-base | 🔶 | 71 | 49.67 | 52.9 | 76.12 | 57.54 | 52.82 | 71.35 | 1.21 | 35.72 |
| mc_data_30k_from_platpus_orca_7b_10k_v1_lora_qkvo_rank14_v2 | 🟦 | 70 | 49.66 | 57.17 | 79.57 | 50.24 | 52.51 | 72.93 | 0.38 | 34.85 |
| EnsembleV5-Nova-13B | | 130 | 49.65 | 62.71 | 82.55 | 56.79 | 49.86 | 76.24 | 10.77 | 8.64 |
| vigogne-2-7b-chat | 🔶 | 66 | 49.65 | 55.63 | 78.71 | 50.98 | 47.21 | 74.43 | 7.73 | 32.83 |
| Nova-13B | | 130 | 49.64 | 62.71 | 82.57 | 57.98 | 51.34 | 77.27 | 6.75 | 8.84 |
| Platypus2xOpenOrca-13B-IA3 | | 128 | 49.63 | 62.12 | 82.1 | 58.84 | 47.88 | 77.11 | 11.83 | 7.57 |
| llama-13b-pretrained-sft-epoch-1 | 🔶 | 128 | 49.63 | 57.25 | 79.99 | 45.52 | 44.45 | 77.58 | 13.87 | 28.71 |
| Medusa-1.1-L2-7B | 🔶 | 67 | 49.62 | 56.48 | 78.57 | 51.56 | 47.7 | 75.06 | 1.44 | 36.53 |
| vigogne-2-7b-instruct | 🔶 | 66 | 49.62 | 56.23 | 79.97 | 47.17 | 49.51 | 75.45 | 3.79 | 35.18 |
| llama2-13b-megacode2-oasst | 🔶 | 128 | 49.61 | 60.67 | 81.93 | 57.38 | 47.85 | 76.16 | 15.54 | 7.74 |
| airophin-13b-pntk-16k-fp16 | 🔶 | 130 | 49.59 | 61.18 | 82.86 | 55.19 | 43.2 | 76.16 | 8.04 | 20.5 |
| Chronos-Beluga-v2-13bfp16 | 🔶 | 128 | 49.58 | 60.75 | 81.94 | 54.08 | 53.23 | 73.8 | 4.62 | 18.61 |
| CalliopeDS-L2-13B | 🔶 | 130 | 49.57 | 60.49 | 83.38 | 55.8 | 51.32 | 77.03 | 10.01 | 8.98 |
| minotaur-13b-fixed | 🔶 | 128 | 49.57 | 59.04 | 81.66 | 50.1 | 50.36 | 76.87 | 13.12 | 15.83 |
| Stheno-Inverted-L2-13B | 🔶 | 128 | 49.57 | 59.3 | 82.9 | 56.45 | 52.04 | 74.74 | 13.19 | 8.33 |
| llama2-7b-hf-chat-lora-v2 | 🔶 | 66 | 49.55 | 55.03 | 78.81 | 51.35 | 44.05 | 74.9 | 10.84 | 31.86 |
| Llama-2-13b-FINETUNE4_TEST2 | 🔶 | 128 | 49.55 | 58.45 | 81.7 | 56.61 | 40.19 | 76.64 | 13.19 | 20.05 |
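The Average (平均分) column appears to be the plain arithmetic mean of the seven benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K, DROP), rounded to two decimals. A minimal sketch checking this against the Mistral-7B-v0.1 row above:

```python
# Verify that the leaderboard's Average column is the arithmetic mean
# of the seven per-benchmark scores, using the Mistral-7B-v0.1 row.
scores = {
    "ARC": 59.98,
    "HellaSwag": 83.31,
    "MMLU": 64.16,
    "TruthfulQA": 42.15,
    "Winogrande": 78.37,
    "GSM8K": 18.12,
    "DROP": 6.14,
}

average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 50.32, matching the Average column for Mistral-7B-v0.1
```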