The Open LLM Leaderboard tracks evaluation results for large language models and chatbots across a set of benchmark tasks, ranking and comparing models by their scores.
Data source: HuggingFace. Scores are for reference only; defer to the official leaderboard for authoritative results.
| Model Name | Model Type | Params (100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Architecture |
|---|---|---|---|---|---|---|---|---|---|---|
| codegen-6B-nl | Pretrained Models | 60 | 40.0 | 42.32 | 68.59 | 25.93 | 34.47 | 66.46 | 2.2 | CodeGenForCausalLM |
| Javalion-GPTJ | Fine Tuned Models | 0 | 39.97 | 41.89 | 68.69 | 26.85 | 35.44 | 65.27 | 1.67 | GPTJForCausalLM |
| WizardLM-30B-GPTQ | Unknown Model Types | 355.8 | 39.9 | 28.84 | 26.08 | 24.62 | 49.14 | 76.32 | 34.42 | LlamaForCausalLM |
| h2ogpt-gm-oasst1-en-1024-open-llama-7b-preview-400bt | Fine Tuned Models | 70 | 39.89 | 41.3 | 62.44 | 27.55 | 42.0 | 64.56 | 1.52 | LlamaForCausalLM |
| Skegma-GPTJ | Fine Tuned Models | 0 | 39.87 | 43.77 | 69.22 | 25.37 | 34.67 | 64.64 | 1.52 | GPTJForCausalLM |
| Pythia-Chat-Base-7B | Fine Tuned Models | 70 | 39.81 | 40.02 | 68.67 | 27.44 | 34.63 | 64.01 | 4.09 | GPTNeoXForCausalLM |
| CodeLlama-7b-hf | Pretrained Models | 67.4 | 39.81 | 39.93 | 60.8 | 31.12 | 37.82 | 64.01 | 5.16 | LlamaForCausalLM |
| open_llama_3b_glaive_assistant_v0.1 | Fine Tuned Models | 34.3 | 39.74 | 40.7 | 67.45 | 27.74 | 35.86 | 64.72 | 1.97 | Unknown |
| open_llama_3b_glaive_code_v0.1 | Fine Tuned Models | 34.3 | 39.74 | 40.7 | 67.45 | 27.74 | 35.86 | 64.72 | 1.97 | LlamaForCausalLM |
| open_llama_3b_glaive_v0.1 | Fine Tuned Models | 34.3 | 39.74 | 40.7 | 67.45 | 27.74 | 35.86 | 64.72 | 1.97 | Unknown |
| WizardVicuna-Uncensored-3B-0719 | Fine Tuned Models | 34.3 | 39.73 | 41.38 | 66.19 | 26.53 | 39.35 | 63.77 | 1.14 | LlamaForCausalLM |
| open_llama_3b_code_instruct_0.1 | Chat Models | 34.3 | 39.72 | 41.21 | 66.96 | 27.82 | 35.01 | 65.43 | 1.9 | LlamaForCausalLM |
| pythia-12b-deduped | Pretrained Models | 120 | 39.7 | 41.38 | 70.26 | 25.63 | 33.0 | 66.46 | 1.44 | GPTNeoXForCausalLM |
| GPT-J-Pyg_PPO-6B-Dev-V8p4 | Fine Tuned Models | 60 | 39.61 | 40.19 | 66.43 | 30.39 | 34.76 | 64.01 | 1.9 | GPTJForCausalLM |
| OPT-13B-Erebus | Fine Tuned Models | 130 | 39.61 | 40.02 | 70.07 | 25.32 | 34.93 | 66.54 | 0.76 | OPTForCausalLM |
| OPT-13B-Nerybus-Mix | Fine Tuned Models | 130 | 39.61 | 39.85 | 70.6 | 24.9 | 34.02 | 67.88 | 0.38 | OPTForCausalLM |
| GPT-J-6B-Shinen | Fine Tuned Models | 60 | 39.6 | 39.85 | 67.06 | 27.72 | 36.94 | 64.09 | 1.97 | GPTJForCausalLM |
| GPT-J-Pyg_PPO-6B | Fine Tuned Models | 60 | 39.6 | 42.06 | 67.51 | 28.52 | 31.95 | 64.72 | 2.81 | GPTJForCausalLM |
| speechless-nl2sql-ds-6.7b | Fine Tuned Models | 67.4 | 39.59 | 36.35 | 52.83 | 36.8 | 40.55 | 55.96 | 15.09 | LlamaForCausalLM |
| GPT-J-6B-Janeway | Fine Tuned Models | 60 | 39.54 | 40.87 | 67.11 | 27.45 | 35.74 | 64.72 | 1.36 | GPTJForCausalLM |
| LightGPT | Unknown Model Types | 0 | 39.54 | 39.93 | 63.82 | 28.45 | 36.69 | 64.48 | 3.87 | GPTJForCausalLM |
| OPT-13B-Nerys-v2 | Fine Tuned Models | 130 | 39.53 | 39.68 | 70.53 | 25.36 | 33.5 | 67.88 | 0.23 | OPTForCausalLM |
| RedPajama-INCITE-Chat-3B-v1 | Fine Tuned Models | 30 | 39.53 | 42.83 | 67.62 | 26.23 | 34.44 | 65.51 | 0.53 | GPTNeoXForCausalLM |
| gpt-sw3-6.7b-v2 | Pretrained Models | 71.1 | 39.49 | 39.42 | 66.39 | 30.09 | 35.6 | 64.25 | 1.21 | GPT2LMHeadModel |
| WizardVicuna-3B-0719 | Fine Tuned Models | 30 | 39.48 | 40.7 | 65.45 | 25.44 | 40.71 | 63.85 | 0.76 | LlamaForCausalLM |
| dolly-v2-12b | Fine Tuned Models | 120 | 39.46 | 42.41 | 72.53 | 25.92 | 33.83 | 60.85 | 1.21 | GPTNeoXForCausalLM |
| llama2-ppo | Fine Tuned Models | 67.4 | 39.44 | 41.64 | 49.46 | 35.36 | 45.08 | 64.96 | 0.15 | Unknown |
| RedPajama-INCITE-Chat-3B-Instruction-Tuning-with-GPT-4 | Fine Tuned Models | 29.1 | 39.38 | 41.64 | 66.23 | 27.26 | 36.1 | 64.4 | 0.68 | GPTNeoXForCausalLM |
| RedPajama-INCITE-7B-Chat | Fine Tuned Models | 70 | 39.37 | 42.06 | 70.82 | 26.94 | 36.09 | 59.83 | 0.45 | GPTNeoXForCausalLM |
| RedPajama-INCITE-Chat-7B-v0.1 | Fine Tuned Models | 66.5 | 39.37 | 42.06 | 70.82 | 26.94 | 36.09 | 59.83 | 0.45 | Unknown |
| pythia-6.9b-deduped | Pretrained Models | 69 | 39.3 | 41.3 | 67.05 | 26.48 | 35.19 | 64.09 | 1.67 | GPTNeoXForCausalLM |
| LLmRA-3B-v0.1 | Fine Tuned Models | 30 | 39.25 | 39.42 | 59.79 | 25.16 | 50.62 | 59.43 | 1.06 | LlamaForCausalLM |
| dolly-v2-7b | Fine Tuned Models | 70 | 39.24 | 44.54 | 69.64 | 25.18 | 34.88 | 60.06 | 1.14 | GPTNeoXForCausalLM |
| FLAMA-0.5-3B | Fine Tuned Models | 30 | 39.23 | 37.97 | 67.65 | 25.73 | 41.11 | 62.12 | 0.83 | LlamaForCausalLM |
| RedPajama-INCITE-Chat-Instruct-3B-V1 | Fine Tuned Models | 27.8 | 39.23 | 42.58 | 67.48 | 25.99 | 33.62 | 64.8 | 0.91 | GPTNeoXForCausalLM |
| RedTulu-Uncensored-3B-0719 | Fine Tuned Models | 30 | 39.19 | 40.02 | 62.55 | 30.37 | 37.59 | 62.35 | 2.27 | GPTNeoXForCausalLM |
| bloom-7b1 | Pretrained Models | 70.7 | 39.18 | 41.13 | 62.0 | 26.25 | 38.9 | 65.43 | 1.36 | BloomForCausalLM |
| weblab-10b-instruction-sft | Chat Models | 100 | 39.13 | 40.1 | 65.3 | 26.66 | 36.79 | 64.09 | 1.82 | GPTNeoXForCausalLM |
| h2o-danube-1.8b-base | Pretrained Models | 18.3 | 39.12 | 39.42 | 69.58 | 25.94 | 33.86 | 64.48 | 1.44 | MistralForCausalLM |
| robin-33B-v2-GPTQ | Unknown Model Types | 355.8 | 39.1 | 27.73 | 26.29 | 23.53 | 49.54 | 79.79 | 27.75 | LlamaForCausalLM |
| OPT-6.7B-Erebus | Fine Tuned Models | 67 | 39.09 | 39.16 | 68.66 | 24.58 | 35.12 | 65.98 | 1.06 | OPTForCausalLM |
| opt-6.7b | Pretrained Models | 67 | 39.08 | 39.16 | 68.66 | 24.57 | 35.12 | 65.98 | 0.99 | OPTForCausalLM |
| RedPajama-INCITE-Instruct-3B-v1 | Fine Tuned Models | 30 | 39.06 | 41.55 | 65.48 | 25.03 | 36.41 | 64.48 | 1.36 | GPTNeoXForCausalLM |
| deacon-3b | Chat Models | 34.3 | 39.05 | 39.68 | 66.42 | 27.13 | 36.07 | 64.64 | 0.38 | LlamaForCausalLM |
| ScarletPajama-3B-HF | Fine Tuned Models | 30 | 39.04 | 39.76 | 64.89 | 27.28 | 37.6 | 64.48 | 0.23 | GPTNeoXForCausalLM |
| orca_mini_3b | Fine Tuned Models | 33.2 | 39.03 | 41.55 | 61.52 | 26.79 | 42.42 | 61.8 | 0.08 | Unknown |
| black_goo_recipe_c | Chat Models | 0 | 39.01 | 38.74 | 66.83 | 26.57 | 36.54 | 64.72 | 0.68 | LlamaForCausalLM |
| Guanaco-3B-Uncensored-v2 | Fine Tuned Models | 27.8 | 38.98 | 42.15 | 66.72 | 26.18 | 35.21 | 63.3 | 0.3 | GPTNeoXForCausalLM |
| cross_lingual_epoch2 | Chat Models | 0 | 38.97 | 39.25 | 47.92 | 36.66 | 47.9 | 62.12 | 0.0 | LlamaForCausalLM |
| open_llama_3b_instruct_v_0.2 | Chat Models | 34.3 | 38.97 | 38.48 | 66.77 | 25.34 | 38.16 | 63.46 | 1.59 | LlamaForCausalLM |
| Guanaco-3B-Uncensored-v2-GPTQ | Fine Tuned Models | 47.8 | 38.95 | 41.64 | 64.76 | 26.25 | 36.58 | 64.33 | 0.15 | GPTNeoXForCausalLM |
| Guanaco-3B-Uncensored | Fine Tuned Models | 27.8 | 38.94 | 42.49 | 66.99 | 25.55 | 34.71 | 63.38 | 0.53 | GPTNeoXForCausalLM |
| mamba-gpt-3b | Fine Tuned Models | 34.3 | 38.87 | 40.53 | 64.94 | 25.35 | 37.14 | 65.04 | 0.23 | LlamaForCausalLM |
| OPT-6.7B-Nerybus-Mix | Fine Tuned Models | 67 | 38.83 | 39.16 | 68.63 | 24.47 | 34.84 | 65.11 | 0.76 | OPTForCausalLM |
| pythia-12b | Pretrained Models | 120 | 38.82 | 39.59 | 68.82 | 26.76 | 31.85 | 64.17 | 1.74 | GPTNeoXForCausalLM |
| WizardVicuna-open-llama-3b-v2 | Chat Models | 34.3 | 38.77 | 37.71 | 66.6 | 27.23 | 36.8 | 63.3 | 0.99 | LlamaForCausalLM |
| black_goo_recipe_a | Chat Models | 0 | 38.73 | 38.14 | 66.56 | 25.75 | 37.46 | 63.93 | 0.53 | LlamaForCausalLM |
| OPT-6B-nerys-v2 | Fine Tuned Models | 60 | 38.72 | 38.4 | 68.57 | 24.34 | 34.73 | 65.59 | 0.68 | OPTForCausalLM |
| instruct-12b | Fine Tuned Models | 120 | 38.63 | 42.58 | 66.76 | 26.79 | 31.96 | 63.46 | 0.23 | GPTNeoXForCausalLM |
| h2ogpt-oig-oasst1-256-6_9b | Fine Tuned Models | 90 | 38.62 | 39.93 | 65.42 | 26.39 | 35.0 | 63.38 | 1.59 | GPTNeoXForCausalLM |
| weblab-10b | Pretrained Models | 100 | 38.59 | 39.51 | 65.76 | 26.29 | 36.02 | 62.51 | 1.44 | GPTNeoXForCausalLM |
| black_goo_recipe_d | Chat Models | 0 | 38.57 | 37.8 | 66.5 | 26.64 | 36.46 | 63.61 | 0.38 | LlamaForCausalLM |
| RedPajama-INCITE-Base-3B-v1 | Pretrained Models | 30 | 38.54 | 40.19 | 64.77 | 27.03 | 33.23 | 64.72 | 1.29 | GPTNeoXForCausalLM |
| OPT-30B-Erebus | Fine Tuned Models | 300 | 38.53 | 36.69 | 65.6 | 24.8 | 38.76 | 65.11 | 0.23 | OPTForCausalLM |
| CrimsonPajama | Fine Tuned Models | 0 | 38.52 | 40.19 | 65.47 | 25.95 | 33.78 | 65.19 | 0.53 | GPTNeoXForCausalLM |
| h2ogpt-oig-oasst1-512-6_9b | Fine Tuned Models | 90 | 38.52 | 40.44 | 65.58 | 24.9 | 36.68 | 62.51 | 0.99 | GPTNeoXForCausalLM |
| guanaco-33B-GPTQ | Unknown Model Types | 355.8 | 38.51 | 28.16 | 26.34 | 24.94 | 48.98 | 78.85 | 23.81 | LlamaForCausalLM |
| LLongMA-3b-LIMA | Chat Models | 30 | 38.51 | 39.08 | 67.15 | 26.43 | 34.71 | 63.38 | 0.3 | LlamaForCausalLM |
| pythia-6.9b-HC3 | Unknown Model Types | 69 | 38.51 | 36.52 | 61.76 | 26.94 | 45.05 | 60.77 | 0.0 | GPTNeoXForCausalLM |
| black_goo_recipe_b | Chat Models | 0 | 38.49 | 37.63 | 66.72 | 25.68 | 37.09 | 63.77 | 0.08 | LlamaForCausalLM |
| RedPajama-INCITE-Chat-3B-ShareGPT-11K | Fine Tuned Models | 30 | 38.47 | 40.61 | 64.84 | 26.13 | 35.41 | 63.54 | 0.3 | GPTNeoXForCausalLM |
| pygmalion-6b | Fine Tuned Models | 60 | 38.47 | 40.53 | 67.47 | 25.73 | 32.53 | 62.51 | 2.05 | GPTJForCausalLM |
| WizardLM-33B-V1.0-Uncensored-GPTQ | Fine Tuned Models | 355.8 | 38.43 | 27.39 | 26.03 | 25.81 | 48.9 | 77.9 | 24.56 | LlamaForCausalLM |
| OmegLLaMA-3B | Fine Tuned Models | 34.3 | 38.28 | 40.36 | 66.13 | 28.0 | 33.31 | 61.64 | 0.23 | LlamaForCausalLM |
| open_llama_3b | Pretrained Models | 30 | 38.26 | 39.85 | 62.65 | 26.94 | 34.97 | 64.72 | 0.45 | LlamaForCausalLM |
| FLOR-6.3B-xat | Fine Tuned Models | 62.5 | 38.23 | 38.65 | 63.76 | 26.54 | 37.96 | 62.43 | 0.0 | BloomForCausalLM |
| pythia-6.7b | Pretrained Models | 66.5 | 38.06 | 40.1 | 65.0 | 24.64 | 32.85 | 64.72 | 1.06 | Unknown |
| Zro1.5_3B | Fine Tuned Models | 27.8 | 38.02 | 35.92 | 61.11 | 25.55 | 36.89 | 58.72 | 9.93 | GPTNeoXForCausalLM |
| Tinyllama-Cinder-1.3B-Reason-Test | Fine Tuned Models | 12.8 | 37.88 | 34.56 | 58.24 | 25.79 | 39.93 | 63.93 | 4.85 | LlamaForCausalLM |
| Galactica-6.7B-EssayWriter | Fine Tuned Models | 66.6 | 37.75 | 40.1 | 50.29 | 33.88 | 40.27 | 58.48 | 3.49 | OPTForCausalLM |
| falcon-rw-1b-instruct-openorca | Chat Models | 13.1 | 37.63 | 34.56 | 60.93 | 28.77 | 37.42 | 60.69 | 3.41 | FalconForCausalLM |
| falcon_1b_stage2 | Fine Tuned Models | 10 | 37.59 | 35.49 | 65.56 | 23.83 | 38.32 | 62.35 | 0.0 | FalconForCausalLM |
| bloom-zh-3b-chat | Fine Tuned Models | 30 | 37.58 | 38.82 | 54.71 | 31.62 | 41.25 | 58.64 | 0.45 | BloomForCausalLM |
| h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2 | Fine Tuned Models | 70 | 37.55 | 36.43 | 61.41 | 25.01 | 37.59 | 64.64 | 0.23 | LlamaForCausalLM |
| Evaloric-1.1B | Chat Models | 11 | 37.54 | 35.07 | 60.93 | 25.36 | 37.78 | 64.96 | 1.14 | LlamaForCausalLM |
| CodeLlama-13B-Python-fp16 | Fine Tuned Models | 130.2 | 37.52 | 33.19 | 44.5 | 25.94 | 43.99 | 67.4 | 10.08 | LlamaForCausalLM |
| Cerebras-GPT-13B | Pretrained Models | 130 | 37.4 | 38.14 | 60.01 | 25.92 | 39.19 | 59.83 | 1.29 | GPT2Model |
| falcon-rw-1b-chat | Chat Models | 13.1 | 37.37 | 35.58 | 61.12 | 24.51 | 39.62 | 61.72 | 1.67 | FalconForCausalLM |
| StellarX-4B-V0 | Pretrained Models | 40 | 37.31 | 36.95 | 61.9 | 26.85 | 34.3 | 63.85 | 0.0 | GPTNeoXForCausalLM |
| manovyadh-1.1B-v1-chat | Fine Tuned Models | 11 | 37.3 | 35.92 | 60.03 | 25.82 | 39.17 | 61.09 | 1.74 | LlamaForCausalLM |
| TinyLlama-1.1B-Chat-v1.0 | Fine Tuned Models | 11 | 37.28 | 36.09 | 61.1 | 25.39 | 37.48 | 61.25 | 2.35 | LlamaForCausalLM |
| WizardLM-30B-Uncensored-GPTQ | Unknown Model Types | 355.8 | 37.27 | 29.44 | 26.47 | 24.35 | 49.15 | 73.16 | 21.08 | LlamaForCausalLM |
| RedPajama-INCITE-Chat-3B-v1-FT-LoRA-8bit-test1 | Fine Tuned Models | 30 | 37.27 | 38.65 | 63.53 | 25.16 | 36.07 | 60.14 | 0.08 | Unknown |
| galactica-6.7b-evol-instruct-70k | Unknown Model Types | 67 | 37.27 | 42.58 | 49.3 | 32.96 | 42.1 | 56.27 | 0.38 | OPTForCausalLM |
| falcon_1b_stage1 | Fine Tuned Models | 10 | 37.25 | 35.15 | 62.4 | 24.47 | 40.0 | 61.48 | 0.0 | FalconForCausalLM |
| Tinyllama-Cinder-1.3B-Reason-Test.2 | Fine Tuned Models | 12.8 | 37.25 | 32.76 | 58.27 | 24.39 | 39.0 | 65.04 | 4.02 | LlamaForCausalLM |
| gpt-sw3-6.7b | Pretrained Models | 71.1 | 37.23 | 36.35 | 60.75 | 26.0 | 39.04 | 60.69 | 0.53 | GPT2LMHeadModel |
| TinyLlama-3T-Cinder-v1.3 | Merged Models or MoE Models | 11 | 37.23 | 33.96 | 58.14 | 25.41 | 38.13 | 63.93 | 3.79 | LlamaForCausalLM |
| TinyLlama-1.1B-orca-v1.0 | Chat Models | 11 | 37.17 | 36.35 | 61.23 | 25.18 | 36.58 | 61.4 | 2.27 | LlamaForCausalLM |
| TinyLlama-1.1B-Chat-v1.0 | Fine Tuned Models | 11 | 37.17 | 35.92 | 61.11 | 25.0 | 37.38 | 61.17 | 2.43 | LlamaForCausalLM |
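The Average column is simply the arithmetic mean of the six benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K). A minimal sketch reproducing it, using the values from the codegen-6B-nl row above (variable names are illustrative):

```python
# Benchmark scores for codegen-6B-nl, copied from the table.
scores = {
    "ARC": 42.32,
    "HellaSwag": 68.59,
    "MMLU": 25.93,
    "TruthfulQA": 34.47,
    "Winogrande": 66.46,
    "GSM8K": 2.2,
}

# The leaderboard average is the unweighted mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")  # prints 40.0, matching the table's Average column
```

Note that GSM8K drags the average down sharply for base models of this size, which is why several rows with strong HellaSwag/Winogrande scores still land below 40.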