The Open LLM Leaderboard is a ranking of large language models, tracking how LLMs and chatbots perform across a suite of evaluation tasks and ordering models by their results.

Data source: HuggingFace. Scores are for reference only; defer to the official leaderboard for authoritative results.

| Model Name | Model Type | Parameters (×100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Architecture |
|---|---|---|---|---|---|---|---|---|---|---|
| test_mistral2 | Fine Tuned Models | 71.1 | 29.27 | 27.9 | 25.32 | 24.74 | 49.1 | 48.54 | 0.0 | MistralModel |
| gpt2-dolly | Chat Models | 1.2 | 29.21 | 22.7 | 30.15 | 25.81 | 44.97 | 51.46 | 0.15 | GPT2LMHeadModel |
| Pythia-70M-ChatSalad | Fine Tuned Models | 1 | 29.2 | 20.99 | 27.28 | 24.78 | 49.74 | 52.41 | 0.0 | GPTNeoXForCausalLM |
| smol_llama-220M-open_instruct | Chat Models | 2.2 | 29.19 | 25.0 | 29.71 | 26.11 | 44.06 | 50.28 | 0.0 | LlamaForCausalLM |
| DialoGPT-small | Fine Tuned Models | 1.8 | 29.19 | 25.77 | 25.79 | 25.81 | 47.49 | 50.28 | 0.0 | GPT2LMHeadModel |
| mistral-environment-all | Fine Tuned Models | 72.4 | 29.18 | 29.44 | 25.89 | 23.12 | 47.92 | 48.7 | 0.0 | MistralForCausalLM |
| testfinetunedmodel | Fine Tuned Models | 1.2 | 29.18 | 25.85 | 31.4 | 26.07 | 40.75 | 50.99 | 0.0 | GPT2LMHeadModel |
| TinyMistral-v2.5-MiniPile-Guidelines-E1 | Pretrained Models | 0 | 29.16 | 26.54 | 25.65 | 23.44 | 49.9 | 49.41 | 0.0 | MistralForCausalLM |
| TinyMistral-v2.5-MiniPile-Guidelines-E1 | Pretrained Models | 0 | 29.15 | 26.45 | 25.68 | 23.53 | 49.85 | 49.41 | 0.0 | MistralForCausalLM |
| pythia-31m-KI_v1-2048-scratch | Pretrained Models | 0.3 | 29.15 | 23.12 | 25.23 | 23.12 | 51.67 | 51.78 | 0.0 | GPTNeoXForCausalLM |
| opt-125m | Pretrained Models | 1.2 | 29.15 | 22.87 | 31.47 | 26.02 | 42.87 | 51.62 | 0.08 | OPTForCausalLM |
| gpt-neo-125m-neurallinguisticpioneers | Fine Tuned Models | 1.2 | 29.15 | 22.44 | 30.36 | 25.14 | 45.64 | 51.22 | 0.08 | GPTNeoForCausalLM |
| Cerebras-GPT-590M | Unknown Model Types | 5.9 | 29.14 | 23.72 | 32.4 | 25.97 | 44.15 | 48.15 | 0.45 | ? |
| Llama-2-7b-Chat-AWQ | Fine Tuned Models | 11.3 | 29.14 | 27.22 | 25.48 | 24.67 | 49.95 | 47.51 | 0.0 | Unknown |
| TinyYi-7b-Test | Fine Tuned Models | 60.6 | 29.11 | 26.88 | 26.14 | 24.41 | 46.35 | 50.91 | 0.0 | Unknown |
| gpt3-finnish-large | Pretrained Models | 0 | 29.11 | 21.76 | 32.88 | 24.11 | 44.35 | 51.54 | 0.0 | BloomModel |
| gpt-neox-122m-minipile-digits | Fine Tuned Models | 1.7 | 29.1 | 20.73 | 27.03 | 25.31 | 49.19 | 52.33 | 0.0 | GPTNeoXForCausalLM |
| 160M-TinyLLama-Mini-Cinder | Fine Tuned Models | 1.4 | 29.09 | 24.66 | 28.16 | 25.09 | 44.08 | 52.57 | 0.0 | LlamaForCausalLM |
| mpt-1b-redpajama-200b | Fine Tuned Models | 10 | 29.05 | 25.77 | 26.08 | 24.5 | 47.57 | 50.36 | 0.0 | MosaicGPT |
| pythia-160m | Pretrained Models | 2.1 | 29.02 | 22.78 | 30.34 | 24.95 | 44.26 | 51.54 | 0.23 | GPTNeoXForCausalLM |
| gpt2-conversational-or-qa | Fine Tuned Models | 1.4 | 29.01 | 21.42 | 27.61 | 26.51 | 47.31 | 51.14 | 0.08 | GPT2LMHeadModel |
| hepu-o4zf-ravz-7-0 | Fine Tuned Models | 72.4 | 29.01 | 24.49 | 25.36 | 23.27 | 51.67 | 49.25 | 0.0 | MistralForCausalLM |
| SmolLlamix-8x101M | Pretrained Models | 4 | 28.98 | 22.7 | 28.5 | 24.69 | 46.09 | 51.3 | 0.61 | MixtralForCausalLM |
| smol_llama-101M-GQA | Pretrained Models | 1 | 28.97 | 23.55 | 28.77 | 24.24 | 45.76 | 50.67 | 0.83 | LlamaForCausalLM |
| smol_llama-101M-GQA | Fine Tuned Models | 1 | 28.96 | 23.46 | 28.73 | 24.35 | 45.8 | 50.67 | 0.76 | LlamaForCausalLM |
| OPT-19M-ChatSalad | Fine Tuned Models | 0.2 | 28.96 | 24.4 | 25.15 | 23.12 | 51.36 | 49.72 | 0.0 | OPTForCausalLM |
| pythia-70m | Pretrained Models | 1 | 28.93 | 21.59 | 27.29 | 25.9 | 47.06 | 51.46 | 0.3 | Unknown |
| opt-125m-gqa-ub-6-best-for-KV-cache | Pretrained Models | 1.2 | 28.93 | 24.23 | 25.0 | 23.12 | 49.53 | 51.7 | 0.0 | OPTForCausalLM |
| Mixsmol-4x400M-v0.1-epoch2 | Pretrained Models | 17.7 | 28.92 | 23.55 | 32.6 | 25.26 | 39.24 | 52.64 | 0.23 | MixtralForCausalLM |
| 590m | Unknown Model Types | 6.7 | 28.88 | 24.15 | 31.91 | 26.61 | 42.19 | 48.38 | 0.08 | GPT2LMHeadModel |
| open-calm-large | Pretrained Models | 0 | 28.88 | 20.73 | 29.56 | 25.23 | 46.52 | 51.14 | 0.08 | GPTNeoXForCausalLM |
| gpt2_137m_DolphinCoder | Fine Tuned Models | 1.4 | 28.87 | 21.84 | 31.35 | 25.4 | 41.58 | 52.01 | 1.06 | Unknown |
| DialoGPT-medium | Fine Tuned Models | 0 | 28.86 | 24.49 | 26.21 | 25.84 | 47.06 | 49.57 | 0.0 | GPT2LMHeadModel |
| easyTermsSummerizer | Fine Tuned Models | 4.1 | 28.86 | 25.77 | 25.81 | 23.12 | 47.69 | 50.75 | 0.0 | Unknown |
| FinOPT-Washington | Fine Tuned Models | 1.2 | 28.85 | 25.17 | 26.25 | 24.83 | 45.8 | 51.07 | 0.0 | OPTForCausalLM |
| pythia-31m-goodwiki-deduped-2048-scratch | Pretrained Models | 0.3 | 28.85 | 23.12 | 25.66 | 23.11 | 51.32 | 49.88 | 0.0 | GPTNeoXForCausalLM |
| distilgpt2-emailgen | Fine Tuned Models | 0.9 | 28.84 | 21.76 | 27.52 | 25.97 | 46.17 | 51.62 | 0.0 | GPT2LMHeadModel |
| facebook-opt-6.7b-gqa-ub-16-best-for-KV-cache | Pretrained Models | 67 | 28.84 | 23.04 | 25.94 | 23.12 | 48.99 | 51.93 | 0.0 | OPTForCausalLM |
| pythia-31m | Pretrained Models | 0.3 | 28.81 | 21.84 | 27.0 | 24.97 | 49.1 | 49.72 | 0.23 | GPTNeoXForCausalLM |
| Yi-8B-Llama | Unknown Model Types | 87.3 | 28.78 | 25.68 | 26.79 | 24.14 | 47.79 | 48.3 | 0.0 | Unknown |
| pythia-owt2-70m-100k | Fine Tuned Models | 0.7 | 28.78 | 20.9 | 28.34 | 25.02 | 45.12 | 53.28 | 0.0 | Unknown |
| TinyMistral-248M-v2 | Pretrained Models | 2.5 | 28.78 | 21.25 | 26.56 | 23.39 | 49.6 | 51.85 | 0.0 | MistralForCausalLM |
| 256_5epoch | Fine Tuned Models | 3.2 | 28.76 | 22.27 | 28.99 | 26.62 | 41.71 | 52.72 | 0.23 | GPT2LMHeadModel |
| Smol-Llama-101M-Chat-v1 | Fine Tuned Models | 1 | 28.73 | 22.87 | 28.69 | 24.93 | 45.76 | 50.04 | 0.08 | LlamaForCausalLM |
| pythia-owt2-70m-50k | Fine Tuned Models | 0.7 | 28.71 | 21.5 | 28.15 | 25.7 | 44.5 | 52.41 | 0.0 | Unknown |
| pythia-70m-deduped-cleansharegpt-en | Fine Tuned Models | 0.7 | 28.71 | 21.16 | 27.16 | 25.24 | 48.57 | 50.12 | 0.0 | GPTNeoXForCausalLM |
| verysmol_llama-v11-KIx2 | Pretrained Models | 0.6 | 28.7 | 22.7 | 27.6 | 25.28 | 44.75 | 51.54 | 0.3 | LlamaForCausalLM |
| facebook-opt-125m-qcqa-ub-6-best-for-KV-cache | Pretrained Models | 1.2 | 28.66 | 24.23 | 25.0 | 23.12 | 48.41 | 51.22 | 0.0 | OPTForCausalLM |
| nano-phi-115M-v0.1 | Pretrained Models | 1.2 | 28.66 | 21.93 | 27.86 | 25.34 | 46.0 | 50.83 | 0.0 | PhiForCausalLM |
| distilgpt2-emailgen-V2 | Fine Tuned Models | 0.9 | 28.64 | 20.99 | 26.78 | 25.53 | 46.51 | 52.01 | 0.0 | GPT2LMHeadModel |
| pythia-31m-simplewiki-scratch-bf16 | Pretrained Models | 0.3 | 28.61 | 22.78 | 25.61 | 23.12 | 49.65 | 50.51 | 0.0 | GPTNeoXForCausalLM |
| pythia-31m-simplepile-lite-2048-scratch-2e | Pretrained Models | 0.3 | 28.6 | 21.59 | 25.79 | 24.99 | 50.62 | 48.62 | 0.0 | GPTNeoXForCausalLM |
| facebook-opt-6.7b-qcqa-ub-16-best-for-KV-cache | Pretrained Models | 67 | 28.58 | 23.81 | 27.05 | 23.12 | 46.69 | 50.83 | 0.0 | OPTForCausalLM |
| gpt2_open-platypus | Chat Models | 1.2 | 28.58 | 22.18 | 31.29 | 26.19 | 40.35 | 51.3 | 0.15 | GPT2LMHeadModel |
| KoAlpaca-KoRWKV-6B | Chat Models | 65.3 | 28.57 | 23.46 | 31.65 | 24.89 | 39.83 | 51.62 | 0.0 | RwkvForCausalLM |
| RWKV-4-PilePlus-169M-20230520-done-ctx4096 | Fine Tuned Models | 1.3 | 28.57 | 23.98 | 32.25 | 23.37 | 42.29 | 49.17 | 0.38 | Unknown |
| chat_gpt2_dpo | Fine Tuned Models | 1.2 | 28.56 | 23.98 | 31.22 | 24.95 | 41.26 | 49.96 | 0.0 | GPT2LMHeadModel |
| falcon-1b-cot-t2 | Fine Tuned Models | 13.1 | 28.56 | 24.74 | 24.75 | 23.12 | 48.38 | 50.36 | 0.0 | FalconForCausalLM |
| My_GPT2 | Fine Tuned Models | 1.4 | 28.55 | 21.93 | 31.59 | 25.84 | 40.73 | 50.51 | 0.68 | GPT2LMHeadModel |
| gpt2 | Pretrained Models | 1.4 | 28.53 | 22.01 | 31.53 | 25.83 | 40.69 | 50.43 | 0.68 | GPT2LMHeadModel |
| Quokka_590m | Fine Tuned Models | 6.7 | 28.53 | 24.4 | 31.61 | 25.36 | 39.59 | 50.2 | 0.0 | GPT2LMHeadModel |
| gpt2_guanaco-dolly-platypus | Chat Models | 1.2 | 28.52 | 23.55 | 31.03 | 26.4 | 40.02 | 50.12 | 0.0 | GPT2LMHeadModel |
| gpt2_platypus-dolly-guanaco | Chat Models | 1.2 | 28.51 | 23.21 | 31.04 | 26.16 | 40.31 | 50.36 | 0.0 | GPT2LMHeadModel |
| math_gpt2 | Fine Tuned Models | 0 | 28.5 | 24.23 | 30.88 | 25.38 | 39.23 | 51.07 | 0.23 | GPT2LMHeadModel |
| distillgpt2Cinder | Fine Tuned Models | 0.8 | 28.5 | 24.49 | 27.24 | 24.97 | 43.96 | 50.12 | 0.23 | GPT2LMHeadModel |
| gpt_bigcode-santacoder | Pretrained Models | 11.2 | 28.49 | 21.16 | 30.84 | 24.97 | 45.64 | 47.83 | 0.53 | GPTBigCodeForCausalLM |
| lamini-cerebras-256m | Fine Tuned Models | 2.6 | 28.49 | 21.76 | 28.7 | 26.66 | 41.81 | 52.01 | 0.0 | Unknown |
| code_gpt2_mini_model | Fine Tuned Models | 1.2 | 28.49 | 23.72 | 31.25 | 24.96 | 39.86 | 51.14 | 0.0 | GPT2LMHeadModel |
| gpt-sw3-126m | Pretrained Models | 1.9 | 28.49 | 22.18 | 29.54 | 24.43 | 44.03 | 50.67 | 0.08 | GPT2LMHeadModel |
| TinyStories-Alpaca | Fine Tuned Models | 0.7 | 28.46 | 23.98 | 24.92 | 23.35 | 46.68 | 51.85 | 0.0 | GPTNeoForCausalLM |
| phi-2-upscaled-4B-instruct-v0.1 | Fine Tuned Models | 40.4 | 28.45 | 22.95 | 28.68 | 26.8 | 40.92 | 50.59 | 0.76 | PhiForCausalLM |
| Mixsmol-4x400M-v0.1-epoch1 | Chat Models | 17.7 | 28.45 | 22.87 | 30.57 | 25.28 | 39.03 | 52.8 | 0.15 | MixtralForCausalLM |
| Mixtral-GQA-400m-v2 | Pretrained Models | 20.1 | 28.45 | 20.22 | 27.78 | 26.1 | 46.55 | 49.96 | 0.08 | MixtralForCausalLM |
| gpt-sw3-126m | Pretrained Models | 1.9 | 28.45 | 22.01 | 29.56 | 24.53 | 44.07 | 50.43 | 0.08 | GPT2LMHeadModel |
| Llama-Flan-XL2base | Unknown Model Types | 20 | 28.44 | 20.65 | 25.33 | 23.19 | 50.58 | 50.91 | 0.0 | LlamaForCausalLM |
| pythia-70m-deduped | Pretrained Models | 1 | 28.44 | 21.08 | 27.17 | 25.26 | 47.51 | 49.64 | 0.0 | GPTNeoXForCausalLM |
| boomer-1b | Pretrained Models | 10 | 28.44 | 22.78 | 31.58 | 25.66 | 39.17 | 50.51 | 0.91 | LlamaForCausalLM |
| TinyMistral-v2-Test1 | Pretrained Models | 0 | 28.42 | 21.5 | 26.79 | 23.36 | 50.3 | 48.54 | 0.0 | MistralForCausalLM |
| gpt2_camel_physics-platypus | Chat Models | 1.2 | 28.41 | 23.04 | 31.32 | 26.91 | 39.56 | 49.64 | 0.0 | GPT2LMHeadModel |
| gpt2_platypus-camel_physics | Chat Models | 1.2 | 28.41 | 23.04 | 31.32 | 26.91 | 39.56 | 49.64 | 0.0 | Unknown |
| gpt2_test | Pretrained Models | 1.4 | 28.4 | 21.84 | 31.6 | 25.86 | 40.67 | 50.12 | 0.3 | GPT2LMHeadModel |
| finetuned-gpt2-tiny | Fine Tuned Models | 0 | 28.4 | 21.84 | 31.6 | 25.86 | 40.67 | 50.12 | 0.3 | GPT2LMHeadModel |
| gpt2_platypus-camel_physics | Chat Models | 1.2 | 28.4 | 22.78 | 31.24 | 25.87 | 38.95 | 51.54 | 0.0 | Unknown |
| lamini-cerebras-590m | Unknown Model Types | 5.9 | 28.38 | 24.32 | 31.58 | 25.57 | 40.72 | 47.91 | 0.15 | Unknown |
| facebook-opt-125m-qcqa-ub-6-best-for-q-loss | Pretrained Models | 1.2 | 28.37 | 23.29 | 25.57 | 23.15 | 49.03 | 49.17 | 0.0 | OPTForCausalLM |
| gpt2-alpaca-gpt4 | Fine Tuned Models | 1.4 | 28.34 | 22.61 | 31.17 | 25.76 | 38.04 | 52.17 | 0.3 | GPT2LMHeadModel |
| Quokka_256m | Fine Tuned Models | 3.2 | 28.32 | 22.87 | 28.84 | 26.48 | 39.47 | 52.25 | 0.0 | GPT2LMHeadModel |
| convo_bot_gpt_v1 | Fine Tuned Models | 0 | 28.3 | 22.35 | 31.07 | 26.12 | 38.71 | 51.54 | 0.0 | GPT2LMHeadModel |
| GPT-2-SlimOrcaDeduped-airoboros-3.1-MetaMathQA-SFT-124M | Chat Models | 1.2 | 28.3 | 24.57 | 29.43 | 25.82 | 38.84 | 49.01 | 2.12 | Unknown |
| pythia-31m | Pretrained Models | 0.3 | 28.3 | 19.97 | 26.34 | 24.27 | 50.12 | 49.09 | 0.0 | GPTNeoXForCausalLM |
| dlite-v2-124m | Fine Tuned Models | 1.2 | 28.3 | 23.98 | 31.1 | 25.29 | 38.98 | 50.43 | 0.0 | GPT2LMHeadModel |
| ko-wand-136M | Pretrained Models | 1.4 | 28.29 | 21.33 | 25.0 | 23.58 | 50.68 | 49.17 | 0.0 | MistralForCausalLM |
| lamini-cerebras-111m | Fine Tuned Models | 1.1 | 28.29 | 22.1 | 27.12 | 25.51 | 43.79 | 51.22 | 0.0 | Unknown |
| pythia-31m-simplewiki-2048 | Pretrained Models | 0.3 | 28.27 | 22.18 | 25.55 | 23.12 | 49.37 | 49.41 | 0.0 | GPTNeoXForCausalLM |
| facebook-opt-6.7b-qcqa-ub-16-best-for-q-loss | Pretrained Models | 67 | 28.25 | 21.67 | 26.65 | 23.15 | 46.81 | 51.22 | 0.0 | OPTForCausalLM |
| open-calm-7b | Fine Tuned Models | 70 | 28.21 | 20.48 | 30.65 | 25.22 | 44.15 | 48.54 | 0.23 | GPTNeoXForCausalLM |
| gpt2023 | Fine Tuned Models | 1.4 | 28.2 | 21.93 | 31.11 | 25.05 | 40.71 | 50.12 | 0.3 | GPT2LMHeadModel |
| gpt-sw3-126m-instruct | Chat Models | 1.9 | 28.2 | 23.38 | 29.88 | 23.78 | 42.65 | 48.54 | 0.99 | GPT2LMHeadModel |
| TinyMistral-248M-SFT-v4 | Chat Models | 2.5 | 28.2 | 24.91 | 28.15 | 26.04 | 39.56 | 50.51 | 0.0 | MistralForCausalLM |
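The average column in the table appears to be the plain arithmetic mean of the six benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K), which is how the original Open LLM Leaderboard aggregated results. A minimal sketch that reproduces the value for the first row of the table, assuming simple unweighted averaging:

```python
def leaderboard_average(scores):
    """Unweighted mean of the six benchmark scores, rounded to 2 decimals."""
    return round(sum(scores) / len(scores), 2)

# test_mistral2 row: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
print(leaderboard_average([27.9, 25.32, 24.74, 49.1, 48.54, 0.0]))  # 29.27
```

Note how a 0.0 on GSM8K (common for small models in this score range) pulls the average down sharply, which is why many sub-1B-parameter models cluster around 28-29 despite near-50 TruthfulQA scores.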