Open LLM Leaderboard: Large Language Model Benchmark Scores (China Mirror)

To make these results easier to query, DataLearnerAI has released DataLearnerAI-GPT, which can answer questions about any model's evaluation results based on the Open LLM Leaderboard data. It is available at:
https://chat.openai.com/g/g-8eu9KgtUm-datalearnerai-gpt
For a detailed introduction to DataLearnerAI-GPT, see: https://www.datalearner.com/blog/1051699757266256
With large language models (LLMs) and chatbots being released every week, often accompanied by inflated performance claims, it can be hard to sift out the genuine progress made by the open-source community and to identify which model is the current state of the art.
To address this, Hugging Face launched this open evaluation leaderboard. 📐 🤗 The Open LLM Leaderboard aims to track, rank, and evaluate open-source large language models (LLMs) and chatbots on a set of benchmark tasks.
Because access to Hugging Face can be slow or unstable, we provide results that are kept in sync with the original. The original page is at: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
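
If you prefer working with the raw numbers rather than the web page, the leaderboard's per-model results have been published as JSON files in a dataset repository on the Hugging Face Hub. The sketch below shows one way to browse those files with `huggingface_hub`; the repository id `open-llm-leaderboard/results` is an assumption based on how the leaderboard has stored results and may have changed, so verify it on the Hub first.

```python
# Minimal sketch for browsing the leaderboard's raw result files.
# Assumption: results live in the dataset repo "open-llm-leaderboard/results";
# check the repo id on the Hub before relying on it.
import json

from huggingface_hub import hf_hub_download, list_repo_files

REPO_ID = "open-llm-leaderboard/results"  # assumed repo id

# List every file in the dataset repo; each model has its own JSON result file.
files = list_repo_files(REPO_ID, repo_type="dataset")
json_files = [f for f in files if f.endswith(".json")]
print(f"{len(json_files)} result files found")

# Download one result file and peek at its structure.
path = hf_hub_download(REPO_ID, json_files[0], repo_type="dataset")
with open(path) as fh:
    result = json.load(fh)
print(json.dumps(result, indent=2)[:500])
```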

The Open LLM Leaderboard Evaluation Tasks

The model-type icons used in the table below are explained as follows:

🟢 Pretrained model: a new base model trained on a given corpus.

🔶 Fine-tuned model: a pretrained model fine-tuned on additional data, for example to make it better suited for chat.

⭕ Instruction-tuned model: a model fine-tuned specifically on instruction datasets.

🟦 RL-tuned model: reinforcement fine-tuning, which usually changes the model's loss function slightly by adding a policy.

❓ Unknown model type

Parameter sizes are given in units of 100 million (亿), so an entry of 690 corresponds to 69.0B parameters. Rows whose type cell is empty had no icon in the source data.

| Model Name | Type | Params (×100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
|---|---|---|---|---|---|---|---|---|---|---|
| llama-2-70b-fb16-korean | 🔶 | 690 | 56.97 | 67.15 | 86.78 | 69.29 | 56.5 | 82.64 | 29.04 | 7.42 |
| airoboros-65b-gpt4-1.2 | 🔶 | 650 | 56.96 | 65.87 | 86.08 | 63.37 | 52.72 | 79.56 | 26.54 | 24.56 |
| Dans-TotSirocco-7b | 🔶 | 72 | 56.92 | 62.03 | 84.23 | 64.19 | 46.49 | 78.69 | 13.27 | 49.54 |
| Dans-TotSirocco-7b | 🔶 | 72 | 56.9 | 62.2 | 84.28 | 63.8 | 46.04 | 79.48 | 13.19 | 49.3 |
| mistral-7b-platypus-fp16 | | 72 | 56.89 | 63.05 | 84.15 | 64.11 | 45.07 | 78.53 | 17.36 | 45.92 |
| scarlett-33b | 🔶 | 323 | 56.68 | 67.75 | 85.48 | 58.98 | 61.05 | 76.8 | 2.81 | 43.88 |
| vigogne-33b-instruct | 🔶 | 323 | 56.64 | 63.05 | 85.0 | 58.32 | 52.1 | 78.85 | 11.14 | 47.99 |
| Nous-Puffin-70B | 🔶 | 687 | 56.58 | 67.41 | 87.37 | 69.77 | 46.77 | 83.9 | 34.27 | 6.6 |
| Dans-PersonalityEngine-30b | 🔶 | 323 | 56.42 | 63.48 | 84.37 | 58.99 | 46.98 | 80.98 | 15.54 | 44.61 |
| Euryale-L2-70B | 🔶 | 687 | 56.39 | 68.94 | 87.07 | 68.84 | 54.49 | 82.08 | 26.54 | 6.75 |
| YuLan-Chat-2-13b-fp16 | | 130 | 56.36 | 59.04 | 80.66 | 56.72 | 52.18 | 79.64 | 13.8 | 52.45 |
| hippogriff-30b-chat | 🔶 | 323 | 56.32 | 64.51 | 85.2 | 59.09 | 48.42 | 80.82 | 10.24 | 45.99 |
| Llama-2-70b-hf | 🟢 | 690 | 56.25 | 67.32 | 87.33 | 69.83 | 44.92 | 83.74 | 33.97 | 6.62 |
| Llama-2-70B-fp16 | 🔶 | 690 | 56.25 | 67.32 | 87.33 | 69.83 | 44.92 | 83.74 | 33.97 | 6.62 |
| chronos-70b-v2 | 🔶 | 687 | 56.21 | 68.09 | 86.5 | 68.28 | 53.7 | 81.22 | 28.66 | 7.06 |
| trurl-2-13b-pl-instruct_unload | 🔶 | 128 | 56.2 | 59.9 | 79.99 | 78.66 | 45.56 | 74.35 | 12.21 | 42.71 |
| WizardLM-30B-fp16 | 🔶 | 323 | 56.19 | 62.54 | 83.28 | 59.03 | 52.49 | 77.51 | 22.21 | 36.25 |
| WizardLM-30B-V1.0 | | 323 | 56.13 | 62.54 | 83.27 | 59.05 | 52.49 | 77.51 | 21.83 | 36.21 |
| OpenAssistant-SFT-7-Llama-30B-HF | 🔶 | 323 | 56.12 | 60.58 | 82.17 | 57.93 | 46.94 | 78.61 | 29.8 | 36.81 |
| firefly-llama-30b | 🔶 | 323 | 56.06 | 64.25 | 83.64 | 58.23 | 53.2 | 77.43 | 15.85 | 39.83 |
| sitebunny-13b | 🔶 | 128 | 56.03 | 63.14 | 83.64 | 59.91 | 56.21 | 76.72 | 9.4 | 43.23 |
| model_420_preview | 🔶 | 687 | 55.99 | 67.06 | 87.26 | 69.85 | 44.57 | 83.35 | 33.21 | 6.6 |
| gpt4-alpaca-lora_mlp-65B-HF | 🔶 | 650 | 55.94 | 65.02 | 86.13 | 62.73 | 59.16 | 80.66 | 28.28 | 9.64 |
| MelangeA-70b | | 690 | 55.92 | 71.25 | 87.3 | 70.56 | 60.61 | 81.53 | 5.69 | 14.53 |
| dolphin-2.0-mistral-7b | 🔶 | 71 | 55.85 | 59.22 | 80.26 | 56.9 | 61.09 | 75.37 | 18.65 | 39.49 |
| Zephyrus-L1-33B | 🔶 | 325 | 55.79 | 64.51 | 84.15 | 57.37 | 53.87 | 80.19 | 23.58 | 26.89 |
| tulu-30B-fp16 | | 323 | 55.74 | 59.98 | 83.4 | 56.1 | 45.14 | 80.82 | 19.71 | 45.01 |
| ARIA-70B-V2 | 🔶 | 687 | 55.66 | 62.12 | 85.68 | 63.49 | 49.8 | 81.69 | 28.81 | 18.04 |
| Mistralic-7B-1 | 🔶 | 71 | 55.44 | 60.84 | 82.29 | 60.8 | 52.38 | 77.03 | 11.07 | 43.71 |
| llama-2-70b-IA3-guanaco | | 687 | 55.42 | 68.52 | 85.67 | 67.03 | 43.47 | 82.24 | 28.73 | 12.27 |
| speechless-code-mistral-orca-7b-v1.0 | 🔶 | 71 | 55.33 | 59.64 | 82.25 | 61.33 | 48.45 | 77.51 | 8.26 | 49.89 |
| openbuddy-llama2-13b-v8.1-fp16 | 🔶 | 129 | 55.31 | 55.97 | 79.79 | 54.95 | 51.16 | 74.35 | 30.33 | 40.58 |
| airoboros-33b-2.1 | | 323 | 55.3 | 63.65 | 84.97 | 57.37 | 52.17 | 78.22 | 6.6 | 44.12 |
| WizardLM-70B-V1.0-GPTQ | 🔶 | 728 | 55.28 | 63.82 | 83.85 | 63.68 | 54.54 | 78.61 | 18.5 | 23.97 |
| 30B-Epsilon | | 323 | 55.25 | 63.05 | 83.59 | 56.89 | 59.03 | 77.66 | 10.69 | 35.82 |
| manticore-30b-chat-pyg-alpha | 🔶 | 325 | 55.2 | 64.16 | 84.38 | 57.49 | 51.57 | 79.48 | 16.07 | 33.22 |
| MLewdBoros-L2-13B | 🔶 | 130 | 55.1 | 62.54 | 83.9 | 56.57 | 48.14 | 76.95 | 10.99 | 46.59 |
| carl-33b | 🔶 | 323 | 55.01 | 64.59 | 85.27 | 58.38 | 45.32 | 76.24 | 6.37 | 48.92 |
| Llama-2-70b-chat-hf | 🟦 | 690 | 54.98 | 64.59 | 85.88 | 63.91 | 52.8 | 80.51 | 26.69 | 10.5 |
| ARIA-70B-French | 🔶 | 687 | 54.96 | 64.51 | 85.87 | 63.88 | 52.8 | 80.51 | 26.69 | 10.5 |
| alpaca-lora-65B-HF | 🔶 | 650 | 54.86 | 64.85 | 85.59 | 63.11 | 45.15 | 81.22 | 28.05 | 16.08 |
| mamba-gpt-7b-v2 | | 0 | 54.85 | 61.95 | 83.83 | 61.74 | 46.63 | 78.45 | 17.29 | 34.07 |
| Stheno-1.2-L2-13B | 🔶 | 128 | 54.8 | 60.75 | 83.67 | 56.27 | 50.32 | 74.98 | 10.92 | 46.72 |
| SynthIA-7B-v1.5 | | 0 | 54.8 | 62.71 | 83.37 | 63.48 | 51.32 | 79.24 | 17.44 | 26.01 |
| mamba-gpt-7b-v1 | | 0 | 54.77 | 61.26 | 84.1 | 63.46 | 46.34 | 79.16 | 17.36 | 31.67 |
| VicUnlocked-alpaca-65B-QLoRA-fp16 | 🔶 | 650 | 54.74 | 65.61 | 85.15 | 63.13 | 52.47 | 81.29 | 27.82 | 7.71 |
| vicuna-33b-v1.3 | | 323 | 54.74 | 62.12 | 83.0 | 59.22 | 56.16 | 77.03 | 13.72 | 31.92 |
| guanaco-65B-HF | 🔶 | 650 | 54.68 | 65.44 | 86.47 | 62.92 | 52.81 | 82.4 | 26.0 | 6.69 |
| MLewd-v2.4-13B | 🔶 | 128 | 54.65 | 61.69 | 83.83 | 55.1 | 53.34 | 74.51 | 9.78 | 44.33 |
| Dans-AdventurousWinds-7b | 🔶 | 72 | 54.63 | 61.01 | 83.47 | 63.69 | 42.65 | 78.22 | 15.69 | 37.65 |
| llama-65b | 🟢 | 653 | 54.62 | 63.48 | 86.09 | 63.93 | 43.43 | 82.56 | 37.23 | 5.63 |
| samantha-1.1-llama-33b | | 323 | 54.59 | 67.83 | 85.55 | 58.79 | 61.19 | 76.48 | 4.02 | 28.29 |
| Stable-Platypus2-13B-QLoRA-0.80-epoch | | 130 | 54.53 | 62.29 | 82.46 | 57.09 | 51.41 | 76.56 | 3.56 | 48.35 |
| Mistral-7B-OpenOrca | 🔶 | 71 | 54.51 | 64.08 | 83.99 | 62.24 | 53.05 | 77.74 | 19.94 | 20.53 |
| airoboros-65b-gpt4-2.0 | 🔶 | 650 | 54.46 | 66.64 | 86.66 | 63.18 | 49.11 | 80.74 | 20.85 | 14.05 |
| Stheno-1.1-L2-13B | 🔶 | 128 | 54.43 | 60.75 | 83.64 | 56.39 | 50.3 | 75.22 | 7.96 | 46.78 |
| WizardMath-70B-V1.0 | | 687 | 54.41 | 68.17 | 86.49 | 68.89 | 52.69 | 82.32 | 3.94 | 18.37 |
| llama-30b-instruct | 🔶 | 323 | 54.41 | 62.46 | 86.23 | 59.37 | 52.78 | 80.51 | 12.13 | 27.39 |
| WizardLM-33B-V1.0-Uncensored | | 323 | 54.41 | 63.65 | 83.84 | 59.36 | 56.8 | 77.66 | 18.65 | 20.89 |
| airoboros-33b-gpt4-m2.0 | 🔶 | 323 | 54.4 | 63.14 | 85.19 | 57.28 | 48.07 | 78.45 | 9.7 | 38.94 |
| Wizard-Vicuna-30B-Uncensored-GPTQ | | 356 | 54.39 | 61.09 | 82.4 | 56.46 | 49.9 | 77.66 | 23.28 | 29.96 |
| AppleSauce-L2-13b | 🔶 | 130 | 54.37 | 61.01 | 83.61 | 57.07 | 47.81 | 75.93 | 10.01 | 45.19 |
| Pwen-14B-Chat-20_30 | 🔶 | 140 | 54.35 | 56.14 | 79.78 | 60.01 | 47.02 | 76.48 | 26.99 | 33.99 |
| WizardMath-70B-V1.0 | | 687 | 54.34 | 67.92 | 86.46 | 68.92 | 52.77 | 82.32 | 4.09 | 17.87 |
| airoboros-33b-gpt4-m2.0 | 🔶 | 323 | 54.27 | 63.4 | 85.19 | 57.46 | 48.15 | 78.37 | 9.63 | 37.72 |
| airoboros-65b-gpt4-2.0 | 🔶 | 650 | 54.27 | 66.81 | 86.66 | 63.41 | 49.17 | 80.27 | 20.55 | 13.0 |
| Stable-Platypus2-13B | 🔶 | 128 | 54.25 | 62.71 | 82.29 | 58.3 | 52.52 | 76.87 | 1.82 | 45.22 |
| airoboros-65b-gpt4-m2.0 | 🔶 | 650 | 54.19 | 65.02 | 86.35 | 64.37 | 46.66 | 80.19 | 22.14 | 14.58 |
| speechless-orca-platypus-coig-lite-2k-0.6e-13b | 🔶 | 128 | 54.18 | 59.9 | 80.76 | 58.34 | 47.97 | 77.9 | 7.51 | 46.85 |
| OrcaMini-Platypus2-13B-QLoRA-0.80-epoch | | 130 | 54.08 | 60.84 | 82.56 | 56.42 | 53.32 | 75.93 | 2.27 | 47.24 |
| lemur-70b-v1 | 🔶 | 687 | 54.03 | 64.33 | 85.72 | 65.85 | 44.78 | 83.03 | 28.73 | 5.74 |
| airoboros-33b-gpt4-2.0 | 🔶 | 323 | 54.01 | 63.91 | 85.67 | 57.95 | 45.54 | 77.98 | 11.07 | 35.94 |
| airoboros-65b-gpt4-m2.0 | 🔶 | 650 | 53.94 | 65.1 | 86.34 | 64.32 | 46.63 | 80.11 | 21.61 | 13.47 |
| Nebula-7B | | 72 | 53.93 | 59.3 | 83.46 | 57.0 | 45.56 | 76.4 | 14.86 | 40.96 |
| airoboros-33b-gpt4-m2.0 | 🔶 | 323 | 53.88 | 64.68 | 84.95 | 57.77 | 47.44 | 77.74 | 10.39 | 34.17 |
| CollectiveCognition-v1.1-Mistral-7B | | 71 | 53.87 | 62.12 | 84.17 | 62.35 | 57.62 | 75.37 | 15.62 | 19.85 |
| openbuddy-llama2-34b-v11.1-bf16 | 🔶 | 335 | 53.87 | 50.0 | 71.19 | 55.71 | 53.01 | 70.8 | 34.57 | 41.81 |
| openbuddy-codellama2-34b-v11.1-bf16 | 🔶 | 335 | 53.87 | 50.0 | 71.19 | 55.71 | 53.01 | 70.8 | 34.57 | 41.81 |
| ennodata-13b-8bit-raw-15epoch | 🔶 | 130 | 53.87 | 61.6 | 82.2 | 57.55 | 53.58 | 77.51 | 1.44 | 43.22 |
| BerrySauce-L2-13b | 🔶 | 130 | 53.86 | 62.29 | 83.78 | 57.1 | 48.3 | 76.09 | 11.75 | 37.75 |
| airoboros-33b-gpt4-2.0 | 🔶 | 323 | 53.82 | 63.82 | 85.65 | 58.44 | 45.57 | 77.9 | 10.69 | 34.64 |
| Uncensored-Frank-33B | 🔶 | 323 | 53.79 | 62.12 | 83.3 | 57.57 | 54.03 | 76.56 | 16.68 | 26.28 |
| airoboros-65b-gpt4-1.4 | 🔶 | 650 | 53.78 | 65.53 | 85.77 | 61.95 | 52.43 | 79.79 | 18.04 | 12.94 |
| model_007_13b_v2 | 🔶 | 128 | 53.78 | 61.95 | 82.48 | 57.32 | 53.5 | 75.85 | 1.36 | 43.97 |
| BELLE-Llama2-13B-chat-0.4M | 🔶 | 128 | 53.77 | 60.67 | 82.31 | 55.94 | 50.85 | 75.53 | 14.4 | 36.7 |
| Vicuzard-30B-Uncensored | 🔶 | 323 | 53.76 | 62.97 | 83.68 | 58.16 | 52.27 | 77.11 | 15.39 | 26.76 |
| ennodata-raw-pankajmathur-13b-peft | 🔶 | 130 | 53.72 | 61.95 | 82.21 | 57.44 | 53.57 | 75.93 | 1.29 | 43.65 |
| airoboros-65b-gpt4-1.4 | | 650 | 53.68 | 65.78 | 85.83 | 62.27 | 52.45 | 79.64 | 18.04 | 11.76 |
| airoboros-65b-gpt4-1.4-peft | 🔶 | 650 | 53.68 | 65.78 | 85.83 | 62.27 | 52.45 | 79.64 | 18.04 | 11.76 |
| Mistral-7B-v0.1-Open-Platypus | | 71 | 53.64 | 62.37 | 85.08 | 63.79 | 47.33 | 77.66 | 17.29 | 21.93 |
| robin-65b-v2-fp16 | 🔶 | 650 | 53.61 | 61.95 | 84.6 | 62.51 | 52.31 | 80.51 | 26.99 | 6.42 |
| WizardMath-70B-V1.0 | | 687 | 53.6 | 67.49 | 86.03 | 68.44 | 52.23 | 81.77 | 2.88 | 16.36 |
| Llama2-Chinese-13b-Chat | 🔶 | 128 | 53.57 | 55.97 | 82.05 | 54.74 | 48.9 | 76.16 | 12.59 | 44.6 |
| llama-33B-instructed | | 323 | 53.56 | 64.59 | 86.17 | 60.5 | 44.12 | 79.32 | 14.4 | 25.81 |
| Synatra-V0.1-7B-Instruct | | 71 | 53.54 | 55.29 | 76.63 | 55.29 | 55.76 | 72.77 | 19.41 | 39.63 |
| Synatra-V0.1-7B | 🔶 | 71 | 53.54 | 55.29 | 76.63 | 55.29 | 55.76 | 72.77 | 19.41 | 39.63 |
| jackalope-7b | 🔶 | 71 | 53.53 | 63.4 | 83.29 | 63.5 | 50.06 | 78.06 | 28.66 | 7.79 |
| openbuddy-llama2-13b-v11.1-bf16 | 🔶 | 129 | 53.51 | 51.79 | 76.23 | 56.13 | 49.7 | 73.48 | 24.34 | 42.89 |
| 13B-Thorns-l2 | | 128 | 53.5 | 62.88 | 83.57 | 56.95 | 49.52 | 74.51 | 0.91 | 46.13 |
| Stheno-L2-13B | 🔶 | 128 | 53.48 | 61.01 | 83.95 | 56.33 | 50.18 | 75.14 | 11.98 | 35.76 |
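
Although the source page does not state it explicitly, the Average column reproduces as the unweighted mean of the seven benchmark scores. A quick sanity check in Python against the first row (llama-2-70b-fb16-korean) recovers the listed 56.97:

```python
# Verify that the "Average" column is the unweighted mean of the seven
# benchmark scores, using the first row of the table above.
scores = {
    "ARC": 67.15,
    "HellaSwag": 86.78,
    "MMLU": 69.29,
    "TruthfulQA": 56.5,
    "Winogrande": 82.64,
    "GSM8K": 29.04,
    "DROP": 7.42,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # prints 56.97, matching the table
```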