The Open LLM Leaderboard tracks evaluation results for large language models, ranking and assessing LLMs and chatbots by their performance across a set of benchmark tasks.

Data source: HuggingFace. The figures are for reference only; consult the official leaderboard for authoritative numbers. The links next to model names lead to the corresponding DataLearner model detail pages.

| Model | Type | Parameters (×100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Architecture |
|---|---|---|---|---|---|---|---|---|---|---|
| pythia-6.7b | Pretrained Models | 66.5 | 38.06 | 40.1 | 65.0 | 24.64 | 32.85 | 64.72 | 1.06 | Unknown |
| Cerebras-GPT-13B | Pretrained Models | 130 | 37.4 | 38.14 | 60.01 | 25.92 | 39.19 | 59.83 | 1.29 | GPT2Model |
| StellarX-4B-V0 | Pretrained Models | 40 | 37.31 | 36.95 | 61.9 | 26.85 | 34.3 | 63.85 | 0.0 | GPTNeoXForCausalLM |
| gpt-sw3-6.7b | Pretrained Models | 71.1 | 37.23 | 36.35 | 60.75 | 26.0 | 39.04 | 60.69 | 0.53 | GPT2LMHeadModel |
| pythia-2.7b | Pretrained Models | 29.1 | 37.09 | 37.37 | 60.74 | 25.86 | 35.4 | 62.12 | 1.06 | Unknown |
| falcon-rw-1b | Pretrained Models | 10 | 37.07 | 35.07 | 63.56 | 25.28 | 35.96 | 62.04 | 0.53 | FalconForCausalLM |
| opt-2.7b | Pretrained Models | 27 | 36.74 | 33.96 | 61.43 | 25.43 | 37.43 | 61.96 | 0.23 | OPTForCausalLM |
| pythia-2.8b-deduped | Pretrained Models | 29.1 | 36.72 | 36.26 | 60.66 | 26.78 | 35.56 | 60.22 | 0.83 | GPTNeoXForCausalLM |
| TinyLlama-1.1B-intermediate-step-1431k-3T | Pretrained Models | 11 | 36.42 | 33.87 | 60.31 | 26.04 | 37.32 | 59.51 | 1.44 | LlamaForCausalLM |
| xglm-7.5B | Pretrained Models | 75 | 36.38 | 34.13 | 60.77 | 27.79 | 36.66 | 58.72 | 0.23 | XGLMForCausalLM |
| Cerebras-GPT-6.7B | Pretrained Models | 67 | 36.27 | 35.07 | 59.36 | 25.93 | 38.02 | 58.72 | 0.53 | ? |
| gpt-neo-2.7B | Pretrained Models | 27.2 | 36.2 | 33.36 | 56.24 | 26.45 | 39.78 | 60.06 | 1.29 | GPTNeoForCausalLM |
| StellarX-4B-V0.2 | Pretrained Models | 40 | 36.15 | 34.64 | 56.74 | 25.55 | 38.55 | 61.4 | 0.0 | GPTNeoXForCausalLM |
| bloom-3b | Pretrained Models | 30 | 36.07 | 35.75 | 54.37 | 26.59 | 40.57 | 57.62 | 1.52 | BloomForCausalLM |
| ShearedLlama-1.3b-FFT-Test1 | Pretrained Models | 13 | 35.71 | 32.68 | 59.99 | 25.69 | 36.97 | 58.72 | 0.23 | LlamaForCausalLM |
| palmyra-base | Pretrained Models | 0 | 35.18 | 31.91 | 55.39 | 27.15 | 37.57 | 58.09 | 0.99 | GPT2LMHeadModel |
| pythia-1.4b-deduped | Pretrained Models | 14 | 35.0 | 32.68 | 54.96 | 25.56 | 38.66 | 57.3 | 0.83 | GPTNeoXForCausalLM |
| pythia-1.3b | Pretrained Models | 13.1 | 34.46 | 31.14 | 51.43 | 26.55 | 39.24 | 57.38 | 0.99 | Unknown |
| PULI-GPTrio | Pretrained Models | 0 | 34.42 | 30.72 | 53.49 | 24.73 | 39.03 | 57.77 | 0.76 | GPTNeoXForCausalLM |
| TinyLlama-1.1B-intermediate-step-480k-1T | Pretrained Models | 10.3 | 34.37 | 30.89 | 52.97 | 25.0 | 39.55 | 57.3 | 0.53 | Unknown |
| stablelm-base-alpha-7b | Pretrained Models | 70 | 34.37 | 32.0 | 51.78 | 26.21 | 40.19 | 55.41 | 0.61 | GPTNeoXForCausalLM |
| xglm-4.5B | Pretrained Models | 50.8 | 34.31 | 31.48 | 57.95 | 25.43 | 35.84 | 54.93 | 0.23 | XGLMForCausalLM |
| gpt-sw3-1.3b | Pretrained Models | 14.4 | 34.31 | 30.38 | 50.4 | 26.14 | 39.97 | 58.88 | 0.08 | GPT2LMHeadModel |
| TinyLlama-1.1B-intermediate-step-240k-503b | Pretrained Models | 11 | 33.72 | 29.27 | 49.71 | 26.26 | 40.17 | 56.59 | 0.3 | Unknown |
| gpt-neo-1.3B | Pretrained Models | 13.7 | 33.58 | 31.23 | 48.47 | 24.82 | 39.63 | 56.91 | 0.45 | GPTNeoForCausalLM |
| polyglot-ko-12.8b | Pretrained Models | 130.6 | 33.33 | 27.05 | 51.68 | 26.64 | 34.69 | 59.75 | 0.15 | GPTNeoXForCausalLM |
| Cerebras-GPT-2.7B | Pretrained Models | 27 | 33.25 | 29.1 | 49.29 | 25.17 | 41.37 | 54.14 | 0.45 | ? |
| gpt3-finnish-13B | Pretrained Models | 130 | 32.95 | 24.66 | 46.76 | 23.49 | 44.47 | 58.01 | 0.3 | BloomModel |
| pythia-1b-deduped | Pretrained Models | 10.8 | 32.78 | 29.1 | 49.65 | 24.27 | 38.94 | 53.59 | 1.14 | GPTNeoXForCausalLM |
| codegen-6B-multi | Pretrained Models | 60 | 32.43 | 27.22 | 41.11 | 25.71 | 45.65 | 53.91 | 0.99 | CodeGenForCausalLM |
| bilingual-gpt-neox-4b-8k | Pretrained Models | 39.5 | 32.23 | 28.58 | 43.94 | 25.38 | 47.48 | 47.99 | 0.0 | GPTNeoXForCausalLM |
| bilingual-gpt-neox-4b | Pretrained Models | 39.5 | 32.14 | 29.18 | 43.73 | 23.1 | 45.0 | 51.85 | 0.0 | GPTNeoXForCausalLM |
| TinyLlama-1.1B-step-50K-105b | Pretrained Models | 11 | 31.86 | 25.85 | 44.1 | 26.78 | 39.51 | 54.38 | 0.53 | Unknown |
| pythia-410m | Pretrained Models | 5.1 | 31.55 | 26.19 | 40.85 | 27.25 | 41.22 | 53.12 | 0.68 | GPTNeoXForCausalLM |
| stablelm-base-alpha-3b | Pretrained Models | 30 | 31.5 | 26.45 | 42.24 | 25.43 | 40.5 | 53.91 | 0.45 | GPTNeoXForCausalLM |
| Cerebras-GPT-1.3B | Pretrained Models | 13 | 31.3 | 26.28 | 38.54 | 26.59 | 42.7 | 53.43 | 0.23 | ? |
| pythia-410m-deduped | Pretrained Models | 5.1 | 31.29 | 24.83 | 41.29 | 25.99 | 40.95 | 54.38 | 0.3 | GPTNeoXForCausalLM |
| gpt-sw3-356m | Pretrained Models | 4.7 | 30.41 | 23.63 | 37.05 | 25.93 | 42.55 | 53.04 | 0.23 | GPT2LMHeadModel |
| megatron-gpt2-345m | Pretrained Models | 3.8 | 30.4 | 24.23 | 39.18 | 24.32 | 41.51 | 52.96 | 0.23 | GPT2LMHeadModel |
| fbopt-350m-8bit | Pretrained Models | 3.3 | 30.21 | 23.55 | 36.6 | 26.22 | 40.97 | 52.64 | 1.29 | OPTForCausalLM |
| LiteLlama-460M-1T | Pretrained Models | 4.6 | 30.16 | 24.83 | 38.39 | 25.96 | 41.59 | 50.2 | 0.0 | LlamaForCausalLM |
| Orca-2-7b-f16 | Pretrained Models | 70 | 30.15 | 29.61 | 25.62 | 26.7 | 48.36 | 50.59 | 0.0 | LlamaForCausalLM |
| opt-350m | Pretrained Models | 3.5 | 30.01 | 23.55 | 36.73 | 26.02 | 40.83 | 52.64 | 0.3 | OPTForCausalLM |
| xglm-564M | Pretrained Models | 5.6 | 29.55 | 24.57 | 34.64 | 25.18 | 40.43 | 52.25 | 0.23 | XGLMForCausalLM |
| mistral7b-test001 | Pretrained Models | 75.8 | 29.49 | 24.66 | 26.78 | 23.12 | 50.07 | 52.33 | 0.0 | Unknown |
| gpt-neo-125m | Pretrained Models | 1.5 | 29.47 | 22.95 | 30.26 | 25.97 | 45.58 | 51.78 | 0.3 | GPTNeoForCausalLM |
| smol_llama-220M-GQA | Pretrained Models | 2.2 | 29.44 | 24.83 | 29.76 | 25.85 | 44.55 | 50.99 | 0.68 | LlamaForCausalLM |
| tiny_starcoder_py | Pretrained Models | 1.6 | 29.41 | 20.99 | 28.77 | 26.79 | 47.68 | 51.22 | 0.99 | GPTBigCodeForCausalLM |
| Cerebras-GPT-256M | Pretrained Models | 2.6 | 29.38 | 22.01 | 28.99 | 26.83 | 45.98 | 52.49 | 0.0 | ? |
| pythia-160m-deduped | Pretrained Models | 2.1 | 29.38 | 24.06 | 31.39 | 24.86 | 44.34 | 51.38 | 0.23 | GPTNeoXForCausalLM |
| DeciCoder-1b | Pretrained Models | 11.1 | 29.37 | 21.16 | 31.09 | 24.34 | 47.05 | 50.83 | 1.74 | DeciCoderForCausalLM |
| SmolLlamix-8x101M-take2 | Pretrained Models | 4 | 29.35 | 23.98 | 28.43 | 25.07 | 45.87 | 52.25 | 0.53 | MixtralForCausalLM |
| test-model | Pretrained Models | 0 | 29.31 | 24.4 | 30.17 | 25.88 | 44.59 | 50.83 | 0.0 | Unknown |
| TinyMistral-v2.5-MiniPile-Guidelines-E1 | Pretrained Models | 0 | 29.16 | 26.54 | 25.65 | 23.44 | 49.9 | 49.41 | 0.0 | MistralForCausalLM |
| TinyMistral-v2.5-MiniPile-Guidelines-E1 | Pretrained Models | 0 | 29.15 | 26.45 | 25.68 | 23.53 | 49.85 | 49.41 | 0.0 | MistralForCausalLM |
| pythia-31m-KI_v1-2048-scratch | Pretrained Models | 0.3 | 29.15 | 23.12 | 25.23 | 23.12 | 51.67 | 51.78 | 0.0 | GPTNeoXForCausalLM |
| opt-125m | Pretrained Models | 1.2 | 29.15 | 22.87 | 31.47 | 26.02 | 42.87 | 51.62 | 0.08 | OPTForCausalLM |
| gpt3-finnish-large | Pretrained Models | 0 | 29.11 | 21.76 | 32.88 | 24.11 | 44.35 | 51.54 | 0.0 | BloomModel |
| pythia-160m | Pretrained Models | 2.1 | 29.02 | 22.78 | 30.34 | 24.95 | 44.26 | 51.54 | 0.23 | GPTNeoXForCausalLM |
| SmolLlamix-8x101M | Pretrained Models | 4 | 28.98 | 22.7 | 28.5 | 24.69 | 46.09 | 51.3 | 0.61 | MixtralForCausalLM |
| smol_llama-101M-GQA | Pretrained Models | 1 | 28.97 | 23.55 | 28.77 | 24.24 | 45.76 | 50.67 | 0.83 | LlamaForCausalLM |
| pythia-70m | Pretrained Models | 1 | 28.93 | 21.59 | 27.29 | 25.9 | 47.06 | 51.46 | 0.3 | Unknown |
| opt-125m-gqa-ub-6-best-for-KV-cache | Pretrained Models | 1.2 | 28.93 | 24.23 | 25.0 | 23.12 | 49.53 | 51.7 | 0.0 | OPTForCausalLM |
| Mixsmol-4x400M-v0.1-epoch2 | Pretrained Models | 17.7 | 28.92 | 23.55 | 32.6 | 25.26 | 39.24 | 52.64 | 0.23 | MixtralForCausalLM |
| open-calm-large | Pretrained Models | 0 | 28.88 | 20.73 | 29.56 | 25.23 | 46.52 | 51.14 | 0.08 | GPTNeoXForCausalLM |
| pythia-31m-goodwiki-deduped-2048-scratch | Pretrained Models | 0.3 | 28.85 | 23.12 | 25.66 | 23.11 | 51.32 | 49.88 | 0.0 | GPTNeoXForCausalLM |
| facebook-opt-6.7b-gqa-ub-16-best-for-KV-cache | Pretrained Models | 67 | 28.84 | 23.04 | 25.94 | 23.12 | 48.99 | 51.93 | 0.0 | OPTForCausalLM |
| pythia-31m | Pretrained Models | 0.3 | 28.81 | 21.84 | 27.0 | 24.97 | 49.1 | 49.72 | 0.23 | GPTNeoXForCausalLM |
| TinyMistral-248M-v2 | Pretrained Models | 2.5 | 28.78 | 21.25 | 26.56 | 23.39 | 49.6 | 51.85 | 0.0 | MistralForCausalLM |
| verysmol_llama-v11-KIx2 | Pretrained Models | 0.6 | 28.7 | 22.7 | 27.6 | 25.28 | 44.75 | 51.54 | 0.3 | LlamaForCausalLM |
| facebook-opt-125m-qcqa-ub-6-best-for-KV-cache | Pretrained Models | 1.2 | 28.66 | 24.23 | 25.0 | 23.12 | 48.41 | 51.22 | 0.0 | OPTForCausalLM |
| nano-phi-115M-v0.1 | Pretrained Models | 1.2 | 28.66 | 21.93 | 27.86 | 25.34 | 46.0 | 50.83 | 0.0 | PhiForCausalLM |
| pythia-31m-simplewiki-scratch-bf16 | Pretrained Models | 0.3 | 28.61 | 22.78 | 25.61 | 23.12 | 49.65 | 50.51 | 0.0 | GPTNeoXForCausalLM |
| pythia-31m-simplepile-lite-2048-scratch-2e | Pretrained Models | 0.3 | 28.6 | 21.59 | 25.79 | 24.99 | 50.62 | 48.62 | 0.0 | GPTNeoXForCausalLM |
| facebook-opt-6.7b-qcqa-ub-16-best-for-KV-cache | Pretrained Models | 67 | 28.58 | 23.81 | 27.05 | 23.12 | 46.69 | 50.83 | 0.0 | OPTForCausalLM |
| gpt2 | Pretrained Models | 1.4 | 28.53 | 22.01 | 31.53 | 25.83 | 40.69 | 50.43 | 0.68 | GPT2LMHeadModel |
| gpt_bigcode-santacoder | Pretrained Models | 11.2 | 28.49 | 21.16 | 30.84 | 24.97 | 45.64 | 47.83 | 0.53 | GPTBigCodeForCausalLM |
| gpt-sw3-126m | Pretrained Models | 1.9 | 28.49 | 22.18 | 29.54 | 24.43 | 44.03 | 50.67 | 0.08 | GPT2LMHeadModel |
| Mixtral-GQA-400m-v2 | Pretrained Models | 20.1 | 28.45 | 20.22 | 27.78 | 26.1 | 46.55 | 49.96 | 0.08 | MixtralForCausalLM |
| gpt-sw3-126m | Pretrained Models | 1.9 | 28.45 | 22.01 | 29.56 | 24.53 | 44.07 | 50.43 | 0.08 | GPT2LMHeadModel |
| pythia-70m-deduped | Pretrained Models | 1 | 28.44 | 21.08 | 27.17 | 25.26 | 47.51 | 49.64 | 0.0 | GPTNeoXForCausalLM |
| boomer-1b | Pretrained Models | 10 | 28.44 | 22.78 | 31.58 | 25.66 | 39.17 | 50.51 | 0.91 | LlamaForCausalLM |
| TinyMistral-v2-Test1 | Pretrained Models | 0 | 28.42 | 21.5 | 26.79 | 23.36 | 50.3 | 48.54 | 0.0 | MistralForCausalLM |
| gpt2_test | Pretrained Models | 1.4 | 28.4 | 21.84 | 31.6 | 25.86 | 40.67 | 50.12 | 0.3 | GPT2LMHeadModel |
| facebook-opt-125m-qcqa-ub-6-best-for-q-loss | Pretrained Models | 1.2 | 28.37 | 23.29 | 25.57 | 23.15 | 49.03 | 49.17 | 0.0 | OPTForCausalLM |
| pythia-31m | Pretrained Models | 0.3 | 28.3 | 19.97 | 26.34 | 24.27 | 50.12 | 49.09 | 0.0 | GPTNeoXForCausalLM |
| ko-wand-136M | Pretrained Models | 1.4 | 28.29 | 21.33 | 25.0 | 23.58 | 50.68 | 49.17 | 0.0 | MistralForCausalLM |
| pythia-31m-simplewiki-2048 | Pretrained Models | 0.3 | 28.27 | 22.18 | 25.55 | 23.12 | 49.37 | 49.41 | 0.0 | GPTNeoXForCausalLM |
| facebook-opt-6.7b-qcqa-ub-16-best-for-q-loss | Pretrained Models | 67 | 28.25 | 21.67 | 26.65 | 23.15 | 46.81 | 51.22 | 0.0 | OPTForCausalLM |
| KoRWKV-6B | Pretrained Models | 65.3 | 28.19 | 22.1 | 32.18 | 24.69 | 39.05 | 51.14 | 0.0 | RwkvForCausalLM |
| smol_llama-81M-tied | Pretrained Models | 0.8 | 28.17 | 22.18 | 29.33 | 24.06 | 43.97 | 49.25 | 0.23 | LlamaForCausalLM |
| gpt3-finnish-small | Pretrained Models | 0 | 27.95 | 20.48 | 28.09 | 24.47 | 46.47 | 48.22 | 0.0 | BloomModel |
| Cerebras-GPT-111M | Pretrained Models | 1.1 | 27.75 | 20.22 | 26.73 | 25.51 | 46.31 | 47.75 | 0.0 | ? |
| TinyMistral-248m | Pretrained Models | 2.5 | 27.73 | 22.87 | 28.02 | 23.15 | 42.52 | 49.8 | 0.0 | Unknown |
| mGPT | Pretrained Models | 0 | 27.61 | 23.81 | 26.37 | 25.17 | 39.62 | 50.67 | 0.0 | GPT2LMHeadModel |
| mptk-1b | Pretrained Models | 13.1 | 20.84 | 22.7 | 25.48 | 27.11 | 0.0 | 49.72 | 0.0 | MptForCausalLM |
| mpt-125m-c4 | Pretrained Models | 1.2 | 20.07 | 22.7 | 25.04 | 23.12 | 0.0 | 49.57 | 0.0 | MPTForCausalLM |
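The Average column in the table above is the unweighted mean of the six benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K), rounded to two decimal places. A minimal sketch reproducing it (the helper name `leaderboard_average` is ours, not from the leaderboard):

```python
def leaderboard_average(arc, hellaswag, mmlu, truthfulqa, winogrande, gsm8k):
    """Unweighted mean of the six benchmark scores, rounded to 2 decimals."""
    scores = [arc, hellaswag, mmlu, truthfulqa, winogrande, gsm8k]
    return round(sum(scores) / len(scores), 2)

# Check against the pythia-6.7b row from the table:
print(leaderboard_average(40.1, 65.0, 24.64, 32.85, 64.72, 1.06))  # 38.06
```

The same check holds for the other rows, e.g. Cerebras-GPT-13B's six scores average to 37.4.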