Open LLM Leaderboard: Large Language Model Evaluation Scores (China Mirror)

To make these results easier to query, DataLearnerAI has released DataLearnerAI-GPT, which can already answer questions about any model's evaluation results based on the Open LLM Leaderboard data. It is available at:
https://chat.openai.com/g/g-8eu9KgtUm-datalearnerai-gpt
For a detailed introduction to DataLearnerAI-GPT, see: https://www.datalearner.com/blog/1051699757266256
With large numbers of large language models (LLMs) and chatbots being released every week, often accompanied by exaggerated performance claims, it can be very difficult to sort out the genuine progress made by the open-source community and to tell which model is the current state of the art.
To address this, Hugging Face launched the 📐 🤗 Open LLM Leaderboard, which tracks, ranks, and evaluates open-source large language models (LLMs) and chatbots on a set of benchmark tasks.
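The leaderboard's scores are produced with EleutherAI's lm-evaluation-harness. As a minimal sketch of what that looks like (assuming a recent 0.4+ release of the harness; the leaderboard pins its own harness version and per-task few-shot settings, so this is illustrative rather than an exact reproduction of leaderboard numbers):

```python
# pip install lm-eval   (EleutherAI's lm-evaluation-harness)
# Minimal sketch, assuming a 0.4+ harness release.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                            # evaluate a Hugging Face transformers model
    model_args="pretrained=01-ai/Yi-34B",  # any Hub model id can go here
    tasks=["arc_challenge"],               # the leaderboard runs ARC-Challenge 25-shot
    num_fewshot=25,
)
print(results["results"])                  # per-task metrics, e.g. normalized accuracy
```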
Because direct access to HuggingFace can be slow or unstable, we provide a synchronized copy of the results here. The original page is at: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
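If you prefer to query the underlying numbers programmatically rather than through a web page, the leaderboard's raw per-model result files can be pulled from the Hugging Face Hub. A sketch follows; note that the repo id "open-llm-leaderboard/results" is an assumption about where the leaderboard stored its per-model JSON results at the time, so verify it before relying on this:

```python
# Sketch of pulling the leaderboard's raw result files from the Hugging Face Hub.
# NOTE: the repo id "open-llm-leaderboard/results" is an assumption; verify it first.
import json
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="open-llm-leaderboard/results",  # assumed dataset repo id
    repo_type="dataset",
    allow_patterns=["*.json"],               # result files are JSON, one or more per model
)

# Print which benchmarks each of the first few result files covers.
for path in sorted(Path(local_dir).rglob("*.json"))[:3]:
    data = json.loads(path.read_text())
    print(path.name, list(data.get("results", {}).keys()))
```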

The Open LLM Leaderboard evaluation tasks and results

The model-type icons in the table below have the following meanings:

🟢 Pretrained model: a new base model, trained from scratch on a given corpus.

🔶 Fine-tuned model: a pretrained model fine-tuned on additional data, for example to make it better suited to chat.

⭕ Instruction-tuned model: a model fine-tuned specifically on datasets of task instructions.

🟦 RL-tuned model: reinforcement-learning fine-tuning, which typically changes the model's loss function slightly by adding a policy term.

❓ Unknown model type.

| Model | Type | Params (×100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Yi-34B | 🟢 | 340 | 68.68 | 64.59 | 85.69 | 76.35 | 56.23 | 83.03 | 50.64 | 64.2 |
| GodziLLa2-70B | | 690 | 67.01 | 71.42 | 87.53 | 69.88 | 61.54 | 83.19 | 43.21 | 52.31 |
| StellarBright | 🔶 | 692 | 66.98 | 72.95 | 87.82 | 71.17 | 64.46 | 83.27 | 39.5 | 49.66 |
| Platypus2-70B-instruct | 🔶 | 687 | 66.89 | 71.84 | 87.94 | 70.48 | 62.26 | 82.72 | 40.56 | 52.41 |
| SOLAR-0-70b-16bit | | 687 | 66.88 | 71.08 | 87.89 | 70.58 | 62.25 | 83.58 | 45.26 | 47.49 |
| Euryale-1.3-L2-70B | 🔶 | 687 | 66.58 | 70.82 | 87.92 | 70.39 | 59.85 | 82.79 | 34.19 | 60.1 |
| model_101 | 🔶 | 687 | 66.55 | 68.69 | 86.42 | 69.92 | 58.85 | 82.08 | 44.81 | 55.1 |
| openbuddy-llama2-70b-v10.1-bf16 | 🔶 | 688 | 66.47 | 61.86 | 83.13 | 67.41 | 56.18 | 80.11 | 60.27 | 56.3 |
| genz-70b | | 687 | 66.34 | 71.42 | 87.99 | 70.78 | 62.66 | 83.5 | 33.74 | 54.28 |
| Llama-2-70b-instruct | 🔶 | 687 | 66.1 | 70.9 | 87.48 | 69.8 | 60.97 | 82.87 | 32.22 | 58.42 |
| Samantha-1.11-70b | | 687 | 65.9 | 70.05 | 87.55 | 67.82 | 65.02 | 83.27 | 29.95 | 57.68 |
| FashionGPT-70B-V1.1 | 🔶 | 687 | 65.88 | 71.76 | 88.2 | 70.99 | 65.26 | 82.64 | 41.47 | 40.81 |
| MelangeB-70b | | 690 | 65.8 | 71.67 | 87.5 | 70.03 | 59.36 | 83.5 | 30.63 | 57.92 |
| Samantha-1.1-70b | | 687 | 65.74 | 68.77 | 87.46 | 68.6 | 64.85 | 83.27 | 31.61 | 55.59 |
| Marcoroni-70B-v1 | 🔶 | 687 | 65.52 | 73.55 | 87.62 | 70.67 | 64.41 | 83.43 | 33.28 | 45.65 |
| Platypus_QLoRA_LLaMA_70b | 🔶 | 687 | 65.41 | 72.1 | 87.46 | 71.02 | 61.18 | 82.87 | 30.78 | 52.45 |
| LLaMA-2-Wizard-70B-QLoRA | | 700 | 65.4 | 67.58 | 87.52 | 69.11 | 61.79 | 82.32 | 30.48 | 59.03 |
| tigerbot-70b-chat | 🔶 | 690 | 65.32 | 76.79 | 87.76 | 66.35 | 55.09 | 77.58 | 45.64 | 47.99 |
| tigerbot-70b-chat | 🔶 | 690 | 65.24 | 76.79 | 87.83 | 66.08 | 55.1 | 77.9 | 44.96 | 48.02 |
| model_009 | 🔶 | 687 | 65.03 | 71.59 | 87.7 | 69.43 | 60.72 | 82.32 | 39.42 | 44.01 |
| airoboros-l2-70b-2.2.1 | | 687 | 65.01 | 69.71 | 87.95 | 69.79 | 59.49 | 82.95 | 44.88 | 40.27 |
| StableBeluga2 | 🔶 | 687 | 64.97 | 71.08 | 86.37 | 68.79 | 59.44 | 82.95 | 35.86 | 50.28 |
| orca_mini_v3_70b | 🔶 | 687 | 64.9 | 71.25 | 87.85 | 70.18 | 61.27 | 82.72 | 40.86 | 40.17 |
| model_51 | 🔶 | 687 | 64.88 | 68.43 | 86.71 | 69.31 | 57.18 | 81.77 | 32.37 | 58.43 |
| LLaMA-2-Jannie-70B-QLoRA | 🔶 | 700 | 64.76 | 68.94 | 86.9 | 69.37 | 53.67 | 82.95 | 31.77 | 59.75 |
| ORCA_LLaMA_70B_QLoRA | 🔶 | 687 | 64.67 | 72.27 | 87.74 | 70.23 | 63.37 | 83.66 | 28.35 | 47.04 |
| Synthia-70B-v1.2b | 🔶 | 687 | 64.63 | 68.77 | 87.57 | 68.81 | 57.69 | 83.9 | 35.25 | 50.41 |
| openbuddy-falcon-180b-v13-preview0 | 🔶 | 1,786 | 64.3 | 65.1 | 86.19 | 64.6 | 54.97 | 82.64 | 41.62 | 54.98 |
| Camel-Platypus2-70B | 🔶 | 687 | 64.23 | 71.08 | 87.6 | 70.04 | 58.09 | 83.82 | 22.9 | 56.1 |
| Platypus2-70B | 🔶 | 687 | 64.16 | 70.65 | 87.15 | 70.08 | 52.37 | 84.37 | 33.06 | 51.41 |
| llama2-70B-qlora-gpt4 | | 687 | 64.13 | 70.31 | 86.39 | 69.29 | 54.02 | 82.87 | 28.89 | 57.15 |
| Camel-Platypus2-70B | | 687 | 64.05 | 70.14 | 87.71 | 69.83 | 57.77 | 82.95 | 23.96 | 55.97 |
| medllama-2-70b-qlora-1.1 | 🔶 | 700 | 63.58 | 69.03 | 87.17 | 71.04 | 52.41 | 84.21 | 32.07 | 49.1 |
| Synthia-70B-v1.2 | 🔶 | 687 | 63.41 | 70.48 | 86.98 | 70.13 | 58.64 | 83.27 | 31.92 | 42.42 |
| model_007 | 🔶 | 687 | 63.2 | 71.08 | 87.65 | 69.04 | 63.12 | 83.35 | 37.15 | 31.05 |
| llama-65b-instruct | 🔶 | 650 | 63.1 | 68.86 | 86.43 | 64.77 | 59.7 | 81.06 | 26.23 | 54.69 |
| SharpBalance | 🔶 | 692 | 63.01 | 69.28 | 87.59 | 69.51 | 59.05 | 84.06 | 34.65 | 36.93 |
| Synthia-70B-v1.1 | 🔶 | 687 | 62.84 | 70.05 | 87.12 | 70.34 | 57.84 | 83.66 | 31.84 | 39.02 |
| Uni-TianYan | 🔶 | 687 | 62.78 | 72.1 | 87.4 | 69.91 | 65.81 | 82.32 | 22.14 | 39.79 |
| openbuddy-llama-65b-v8-bf16 | 🔶 | 651 | 62.62 | 62.8 | 83.6 | 62.01 | 55.09 | 79.95 | 43.37 | 51.5 |
| tigerbot-70b-base | 🟢 | 690 | 62.1 | 62.46 | 83.61 | 65.49 | 52.76 | 80.19 | 37.76 | 52.45 |
| model_007_v2 | 🔶 | 687 | 62.02 | 71.42 | 87.31 | 68.58 | 62.65 | 84.14 | 28.66 | 31.38 |
| Airoboros-L2-70B-2.1-GPTQ | 🔶 | 728 | 61.76 | 70.39 | 86.54 | 68.89 | 55.55 | 81.61 | 15.24 | 54.1 |
| higgs-llama-vicuna-ep25-70b | 🔶 | 687 | 61.54 | 62.29 | 86.07 | 64.25 | 53.75 | 80.66 | 34.57 | 49.21 |
| MelangeC-70b | | 690 | 61.22 | 71.67 | 87.6 | 70.37 | 58.13 | 83.98 | 0.0 | 56.81 |
| Instruct_Llama70B_Dolly15k | 🔶 | 687 | 60.97 | 68.34 | 87.21 | 69.52 | 46.46 | 84.29 | 42.68 | 28.26 |
| airoboros-l2-70b-gpt4-2.0 | 🔶 | 687 | 60.78 | 68.52 | 87.89 | 70.41 | 49.79 | 83.5 | 24.72 | 40.63 |
| chronos007-70b | | 690 | 60.72 | 70.14 | 87.52 | 69.33 | 57.65 | 82.24 | 42.61 | 15.52 |
| openbuddy-falcon-180b-v12-preview0 | 🔶 | 1,786 | 60.54 | 62.29 | 83.8 | 55.92 | 53.05 | 82.08 | 41.24 | 45.41 |
| ARIA-70B-V3 | 🔶 | 690 | 60.53 | 63.91 | 86.21 | 64.75 | 51.32 | 82.08 | 28.13 | 47.29 |
| Synthia-70B | 🔶 | 687 | 60.29 | 69.45 | 87.11 | 68.91 | 59.79 | 83.66 | 31.39 | 21.75 |
| tora-70b-v1.0 | 🔶 | 687 | 60.12 | 67.75 | 85.83 | 69.22 | 51.79 | 81.93 | 23.81 | 40.52 |
| airoboros-l2-70b-gpt4-2.0 | 🔶 | 687 | 60.05 | 68.6 | 87.53 | 69.37 | 48.52 | 83.9 | 17.66 | 44.74 |
| llama-2-70B-LoRA-assemble-v2 | 🔶 | 687 | 59.93 | 71.84 | 86.89 | 69.37 | 64.79 | 81.22 | 14.25 | 31.14 |
| test_42_70b | 🔶 | 687 | 59.77 | 68.26 | 87.65 | 70.0 | 48.76 | 83.66 | 45.94 | 14.09 |
| MetaMath-70B-V1.0 | 🔶 | 687 | 59.35 | 68.0 | 86.85 | 69.31 | 50.98 | 82.32 | 44.66 | 13.37 |
| alpaca-lora-65b-en-pt-es-ca | 🔶 | 650 | 59.31 | 65.02 | 84.88 | 62.19 | 46.06 | 80.51 | 26.69 | 49.84 |
| falcon-180B | 🟢 | 1,795 | 59.1 | 69.45 | 88.86 | 70.5 | 45.47 | 86.9 | 45.94 | 6.57 |
| airoboros-l2-70b-gpt4-m2.0 | 🔶 | 687 | 59.08 | 70.05 | 87.83 | 70.67 | 49.79 | 83.58 | 25.4 | 26.2 |
| lzlv_70b_fp16_hf | 🔶 | 690 | 59.06 | 70.14 | 87.54 | 70.23 | 60.49 | 83.43 | 30.93 | 10.68 |
| UltraLM-65b | | 650 | 58.99 | 67.06 | 84.98 | 63.48 | 53.51 | 81.14 | 32.75 | 30.0 |
| openbuddy-llama-30b-v7.1-bf16 | 🔶 | 324 | 58.89 | 62.37 | 82.29 | 58.18 | 52.6 | 77.51 | 31.61 | 47.63 |
| GPlatty-30B | 🔶 | 323 | 58.87 | 65.78 | 84.79 | 63.49 | 52.45 | 80.98 | 13.87 | 50.73 |
| Xwin-LM-70B-V0.1 | 🔶 | 687 | 58.87 | 70.22 | 87.25 | 69.77 | 59.86 | 82.87 | 27.22 | 14.91 |
| FashionGPT-70B-V1 | 🔶 | 687 | 58.85 | 71.08 | 87.32 | 70.7 | 63.92 | 83.66 | 28.13 | 7.13 |
| FashionGPT-70B-V1.2 | 🔶 | 687 | 58.85 | 73.04 | 88.15 | 70.11 | 65.15 | 82.56 | 24.03 | 8.9 |
| openbuddy-llama-30b-v7.1-bf16 | 🔶 | 324 | 58.83 | 62.46 | 82.3 | 58.15 | 52.57 | 77.82 | 30.93 | 47.55 |
| airoboros-65b-gpt4-1.3 | 🔶 | 650 | 58.57 | 66.13 | 85.99 | 63.89 | 51.32 | 79.95 | 13.65 | 49.07 |
| llama-30b-instruct-2048 | 🔶 | 323 | 58.56 | 64.93 | 84.94 | 61.9 | 56.3 | 79.56 | 17.82 | 44.46 |
| llama2_70b_chat_uncensored | 🔶 | 687 | 58.48 | 68.43 | 86.77 | 68.76 | 52.5 | 82.56 | 30.25 | 20.09 |
| mythospice-limarp-70b | 🔶 | 687 | 58.45 | 69.2 | 87.46 | 70.14 | 55.86 | 82.72 | 32.22 | 11.59 |
| model_420 | 🔶 | 687 | 58.41 | 70.14 | 87.73 | 70.35 | 54.0 | 83.74 | 28.58 | 14.35 |
| llama-2-70b-Guanaco-QLoRA-fp16 | 🔶 | 687 | 58.32 | 68.26 | 88.32 | 70.23 | 55.69 | 83.98 | 29.8 | 11.98 |
| qCammel-70-x | 🔶 | 687 | 58.32 | 68.34 | 87.87 | 70.18 | 57.47 | 84.29 | 29.72 | 10.34 |
| qCammel-70 | 🔶 | 687 | 58.32 | 68.34 | 87.87 | 70.18 | 57.47 | 84.29 | 29.72 | 10.34 |
| qCammel-70x | 🔶 | 687 | 58.32 | 68.34 | 87.87 | 70.18 | 57.47 | 84.29 | 29.72 | 10.34 |
| qCammel-70v1 | 🔶 | 687 | 58.32 | 68.34 | 87.87 | 70.18 | 57.47 | 84.29 | 29.72 | 10.34 |
| qCammel70 | 🔶 | 687 | 58.32 | 68.34 | 87.87 | 70.18 | 57.47 | 84.29 | 29.72 | 10.34 |
| Lima_Unchained_70b | | 687 | 58.2 | 68.26 | 87.65 | 70.0 | 48.76 | 83.66 | 34.72 | 14.37 |
| model_42_70b | 🔶 | 687 | 58.2 | 68.26 | 87.65 | 70.0 | 48.76 | 83.66 | 34.72 | 14.37 |
| Alpaca-elina-65b | | 650 | 57.98 | 65.27 | 85.75 | 63.42 | 47.32 | 81.37 | 29.04 | 33.69 |
| GPT4-x-AlpacaDente-30b | 🔶 | 323 | 57.98 | 62.12 | 82.78 | 56.19 | 52.68 | 78.69 | 30.1 | 43.28 |
| internlm-20b | 🟢 | 200 | 57.98 | 60.49 | 82.13 | 61.85 | 52.61 | 76.72 | 23.5 | 48.53 |
| SuperPlatty-30B | 🔶 | 323 | 57.89 | 65.78 | 83.95 | 62.57 | 53.52 | 80.35 | 9.63 | 49.44 |
| GPT4-X-Alpasta-30b | 🔶 | 323 | 57.85 | 63.05 | 83.56 | 57.71 | 51.52 | 78.22 | 30.48 | 40.38 |
| llama-2-70b-fb16-orca-chat-10k | 🔶 | 690 | 57.73 | 68.09 | 87.07 | 69.21 | 61.56 | 84.14 | 26.91 | 7.11 |
| mythospice-70b | 🔶 | 687 | 57.71 | 69.28 | 87.53 | 70.1 | 56.76 | 83.27 | 30.1 | 6.94 |
| airoboros-l2-70b-gpt4-1.4.1 | 🔶 | 687 | 57.69 | 70.39 | 87.82 | 70.31 | 55.2 | 83.58 | 22.52 | 14.03 |
| llama2-70b-oasst-sft-v10 | 🔶 | 687 | 57.58 | 67.06 | 86.38 | 67.7 | 56.45 | 82.0 | 27.22 | 16.28 |
| llama-2-70b-dolphin-peft | 🔶 | 700 | 57.34 | 69.62 | 86.82 | 69.18 | 57.43 | 83.9 | 27.37 | 7.03 |
| oasst-sft-6-llama-33b-xor-MERGED-16bit | | 323 | 57.21 | 61.52 | 83.5 | 57.43 | 50.7 | 79.08 | 30.48 | 37.78 |
| WizardLM-70B-V1.0 | | 687 | 57.17 | 65.44 | 84.41 | 64.05 | 54.81 | 80.82 | 17.97 | 32.71 |
| Platypus-30B | 🔶 | 323 | 57.12 | 64.59 | 84.24 | 64.19 | 45.35 | 81.37 | 14.4 | 45.65 |
| Platypus-30B | | 323 | 57.12 | 64.59 | 84.26 | 64.23 | 45.35 | 81.37 | 14.4 | 45.65 |
| falcon-180B | 🟢 | 1,795 | 57.11 | 69.2 | 88.89 | 69.59 | 45.16 | 86.74 | 33.21 | 7.02 |
| SynthIA-7B-v1.3 | 🔶 | 71 | 57.11 | 62.12 | 83.45 | 62.65 | 51.37 | 78.85 | 17.59 | 43.76 |
| Llama-2-70b-oasst-1-200 | 🔶 | 687 | 57.11 | 67.66 | 87.24 | 69.95 | 51.28 | 84.14 | 32.75 | 6.73 |
| lemur-70b-chat-v1 | | 687 | 57.1 | 66.98 | 85.73 | 65.99 | 56.58 | 81.69 | 35.33 | 7.4 |
| fiction.live-Kimiko-V2-70B-fp16 | 🔶 | 687 | 57.08 | 67.66 | 87.65 | 69.82 | 49.28 | 83.9 | 34.57 | 6.69 |
| GPT4-x-AlpacaDente2-30b | 🔶 | 323 | 57.05 | 60.58 | 81.81 | 56.63 | 48.38 | 78.14 | 26.76 | 47.06 |
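One detail worth knowing when reading the table: the Average column is simply the arithmetic mean of the seven benchmark scores. A quick check against the Yi-34B row at the top of the table:

```python
# The Average column is the arithmetic mean of the seven benchmark scores.
# Verifying with the Yi-34B row from the table above:
scores = {
    "ARC": 64.59,
    "HellaSwag": 85.69,
    "MMLU": 76.35,
    "TruthfulQA": 56.23,
    "Winogrande": 83.03,
    "GSM8K": 50.64,
    "DROP": 64.2,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # -> 68.68, matching the Average column
```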