The Open LLM Leaderboard tracks evaluation results for large language models, ranking and assessing LLMs and chatbots by their performance across a set of benchmark tasks.
Data source: HuggingFace

Data is for reference only; official sources are authoritative. Click a model name to view its DataLearner model profile.
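The Average column is simply the arithmetic mean of the six benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K). A minimal Python sketch, with a helper name of our own choosing, that verifies this against the table's first row:

```python
# The six benchmark columns whose mean forms the leaderboard's Average score.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]

def leaderboard_average(row: dict) -> float:
    """Arithmetic mean of the six benchmark scores, rounded to two decimals.
    Helper name is illustrative, not a DataLearner or HuggingFace API."""
    return round(sum(row[b] for b in BENCHMARKS) / len(BENCHMARKS), 2)

# Scores copied from the first table row below.
row = {
    "Model": "speechless-coder-ds-1.3b",
    "ARC": 26.54, "HellaSwag": 39.49, "MMLU": 24.85,
    "TruthfulQA": 42.12, "Winogrande": 53.04, "GSM8K": 2.35,
}
print(leaderboard_average(row))  # 31.4, matching the table's Average column
```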
| Model | Type | Parameters (×100M) | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Architecture |
|---|---|---|---|---|---|---|---|---|---|---|
| speechless-coder-ds-1.3b | Fine Tuned Models | 13 | 31.4 | 26.54 | 39.49 | 24.85 | 42.12 | 53.04 | 2.35 | LlamaForCausalLM |
| Aira-2-774M | Chat Models | 7.7 | 31.33 | 28.75 | 40.8 | 25.1 | 41.33 | 52.01 | 0.0 | GPT2LMHeadModel |
| gpt-2-xl-EvolInstruct | Fine Tuned Models | 16.1 | 31.32 | 27.39 | 38.46 | 25.67 | 42.76 | 53.51 | 0.15 | GPT2LMHeadModel |
| Cerebras-GPT-1.3B | Pretrained Models | 13 | 31.3 | 26.28 | 38.54 | 26.59 | 42.7 | 53.43 | 0.23 | Unknown |
| pythia-410m-deduped | Pretrained Models | 5.1 | 31.29 | 24.83 | 41.29 | 25.99 | 40.95 | 54.38 | 0.3 | GPTNeoXForCausalLM |
| dlite-v2-355m | Fine Tuned Models | 3.6 | 31.2 | 28.33 | 40.54 | 26.77 | 38.76 | 52.8 | 0.0 | GPT2LMHeadModel |
| pygmalion-1.3b | Fine Tuned Models | 15.2 | 31.14 | 28.07 | 46.96 | 24.12 | 37.64 | 50.04 | 0.0 | GPTNeoXForCausalLM |
| Aira-2-355M | Chat Models | 3.6 | 31.0 | 27.56 | 38.92 | 27.26 | 38.53 | 53.75 | 0.0 | GPT2LMHeadModel |
| GPTNeo350M-Instruct-SFT | Chat Models | 4.6 | 31.0 | 25.94 | 38.55 | 25.76 | 45.25 | 50.2 | 0.3 | GPTNeoForCausalLM |
| kaori-34b-v4 | Fine Tuned Models | 343.9 | 30.97 | 23.89 | 28.97 | 25.59 | 49.46 | 57.22 | 0.68 | LlamaForCausalLM |
| Kaori-34b-v2 | Fine Tuned Models | 343.9 | 30.97 | 23.89 | 28.97 | 25.59 | 49.46 | 57.22 | 0.68 | LlamaForCausalLM |
| emailgen-pythia-410m-deduped | Fine Tuned Models | 5.1 | 30.93 | 27.9 | 40.04 | 27.35 | 38.2 | 52.09 | 0.0 | GPTNeoXForCausalLM |
| gpt-sw3-356m-instruct | Chat Models | 4.7 | 30.93 | 26.96 | 38.01 | 25.53 | 40.74 | 52.57 | 1.74 | GPT2LMHeadModel |
| Quokka_1.3b | Fine Tuned Models | 14.2 | 30.86 | 27.73 | 37.91 | 26.66 | 40.14 | 52.72 | 0.0 | GPT2LMHeadModel |
| 1.3b | Unknown Model Types | 14.2 | 30.76 | 27.3 | 38.3 | 26.77 | 39.02 | 53.04 | 0.15 | GPT2LMHeadModel |
| bloomz-560m-sft-chat | Fine Tuned Models | 5.6 | 30.72 | 27.47 | 37.05 | 23.93 | 42.35 | 53.51 | 0.0 | BloomForCausalLM |
| dolphinette | Fine Tuned Models | 5.6 | 30.65 | 24.91 | 37.33 | 25.37 | 42.08 | 54.22 | 0.0 | Unknown |
| bloomz-560m | Unknown Model Types | 5.6 | 30.63 | 23.55 | 36.31 | 25.1 | 45.69 | 53.12 | 0.0 | BloomForCausalLM |
| medalpaca-13B-GPTQ-4bit | Unknown Model Types | 162.2 | 30.62 | 29.35 | 26.32 | 25.44 | 49.51 | 53.12 | 0.0 | Unknown |
| dlite-v1-355m | Fine Tuned Models | 3.6 | 30.54 | 27.13 | 39.07 | 27.12 | 37.13 | 52.8 | 0.0 | GPT2LMHeadModel |
| mistral-inst-v02-dpo | Fine Tuned Models | 72.4 | 30.43 | 27.9 | 26.08 | 27.02 | 50.8 | 50.75 | 0.0 | MistralForCausalLM |
| gpt-sw3-356m | Pretrained Models | 4.7 | 30.41 | 23.63 | 37.05 | 25.93 | 42.55 | 53.04 | 0.23 | GPT2LMHeadModel |
| megatron-gpt2-345m | Pretrained Models | 3.8 | 30.4 | 24.23 | 39.18 | 24.32 | 41.51 | 52.96 | 0.23 | GPT2LMHeadModel |
| speechless-codellama-orca-airoboros-13b-0.10e | Chat Models | 130.2 | 30.36 | 29.44 | 25.71 | 25.43 | 49.64 | 51.93 | 0.0 | LlamaForCausalLM |
| Llama-160M-Chat-v1 | Fine Tuned Models | 1.6 | 30.27 | 24.74 | 35.29 | 26.13 | 44.16 | 51.3 | 0.0 | LlamaForCausalLM |
| Llama-2-13b-sf | Fine Tuned Models | 128.5 | 30.22 | 29.52 | 26.49 | 25.98 | 48.97 | 50.36 | 0.0 | Unknown |
| speechless-codellama-orca-airoboros-13b-0.10e | Fine Tuned Models | 130.2 | 30.22 | 29.27 | 25.74 | 25.69 | 49.61 | 50.99 | 0.0 | LlamaForCausalLM |
| fbopt-350m-8bit | Pretrained Models | 3.3 | 30.21 | 23.55 | 36.6 | 26.22 | 40.97 | 52.64 | 1.29 | OPTForCausalLM |
| flyingllama-v2 | Chat Models | 4.6 | 30.19 | 24.74 | 38.44 | 26.37 | 41.3 | 50.28 | 0.0 | LlamaForCausalLM |
| RWKV-4-PilePlus-430M-20230520-6162-1018Gtokens-ctx4098 | Fine Tuned Models | 3.8 | 30.18 | 26.02 | 40.39 | 24.45 | 37.57 | 52.41 | 0.23 | Unknown |
| LiteLlama-460M-1T | Pretrained Models | 4.6 | 30.16 | 24.83 | 38.39 | 25.96 | 41.59 | 50.2 | 0.0 | LlamaForCausalLM |
| flyingllama | Fine Tuned Models | 4.6 | 30.16 | 24.74 | 38.35 | 26.14 | 41.6 | 50.12 | 0.0 | LlamaForCausalLM |
| Orca-2-7b-f16 | Pretrained Models | 70 | 30.15 | 29.61 | 25.62 | 26.7 | 48.36 | 50.59 | 0.0 | LlamaForCausalLM |
| OPT-350M-Erebus | Fine Tuned Models | 3.3 | 30.14 | 23.81 | 34.35 | 26.23 | 43.58 | 52.57 | 0.3 | OPTForCausalLM |
| bloom-1b1-RLHF | Chat Models | 0.2 | 30.14 | 27.99 | 26.19 | 26.86 | 48.88 | 50.91 | 0.0 | Unknown |
| bloom-560m | Unknown Model Types | 5.6 | 30.13 | 24.74 | 37.15 | 24.22 | 42.44 | 51.93 | 0.3 | BloomForCausalLM |
| Llama-2-13b | Fine Tuned Models | 128.5 | 30.11 | 29.35 | 26.35 | 24.94 | 48.32 | 51.7 | 0.0 | Unknown |
| opt350m_10e5 | Fine Tuned Models | 3.3 | 30.09 | 24.15 | 36.53 | 26.0 | 42.17 | 51.7 | 0.0 | OPTForCausalLM |
| test5 | Fine Tuned Models | 128.5 | 30.06 | 28.41 | 26.63 | 25.36 | 47.34 | 52.64 | 0.0 | Unknown |
| lamini-cerebras-1.3b | Fine Tuned Models | 13.2 | 30.05 | 26.88 | 37.96 | 28.43 | 36.45 | 50.59 | 0.0 | Unknown |
| megatron-GPT-2-345m-EvolInstruct | Fine Tuned Models | 3.8 | 30.01 | 24.06 | 35.12 | 24.48 | 41.25 | 54.78 | 0.38 | GPT2LMHeadModel |
| opt-350m | Pretrained Models | 3.5 | 30.01 | 23.55 | 36.73 | 26.02 | 40.83 | 52.64 | 0.3 | OPTForCausalLM |
| mistral7b_sft_dpo | Fine Tuned Models | 72.4 | 30.0 | 27.56 | 25.53 | 24.05 | 49.68 | 53.2 | 0.0 | MistralForCausalLM |
| phi2 | Fine Tuned Models | 13.1 | 29.98 | 22.87 | 30.7 | 27.55 | 46.1 | 52.01 | 0.68 | Unknown |
| speechless-codellama-orca-platypus-13b-0.10e | Fine Tuned Models | 130.2 | 29.96 | 28.92 | 25.76 | 25.28 | 49.22 | 50.59 | 0.0 | LlamaForCausalLM |
| Ziya-LLaMA-13B-Pretrain-v1 | Fine Tuned Models | 130 | 29.96 | 27.99 | 26.0 | 27.04 | 48.59 | 50.12 | 0.0 | LlamaForCausalLM |
| moe-x33 | Fine Tuned Models | 589.4 | 29.95 | 26.19 | 26.44 | 24.93 | 51.14 | 50.99 | 0.0 | MixtralForCausalLM |
| proofGPT-v0.1 | Fine Tuned Models | 0 | 29.94 | 22.87 | 28.66 | 25.96 | 51.64 | 50.43 | 0.08 | GPTNeoXForCausalLM |
| mistral-environment-adapter | Fine Tuned Models | 72.4 | 29.93 | 29.18 | 25.81 | 25.38 | 48.75 | 50.43 | 0.0 | MistralForCausalLM |
| OPT-350M-Nerys-v2 | Fine Tuned Models | 3.5 | 29.9 | 23.63 | 35.49 | 25.91 | 42.08 | 51.62 | 0.68 | OPTForCausalLM |
| gpt2-medium-emailgen | Fine Tuned Models | 3.8 | 29.87 | 26.45 | 34.31 | 24.1 | 43.96 | 50.43 | 0.0 | GPT2LMHeadModel |
| cutie | Chat Models | 72.4 | 29.87 | 26.96 | 27.02 | 24.17 | 48.42 | 52.64 | 0.0 | Unknown |
| test2 | Fine Tuned Models | 128.5 | 29.87 | 29.61 | 26.65 | 24.34 | 48.49 | 50.12 | 0.0 | Unknown |
| WizardLM-7B-uncensored-GPTQ | Unknown Model Types | 90.4 | 29.86 | 28.5 | 25.37 | 24.85 | 50.86 | 49.57 | 0.0 | LlamaForCausalLM |
| speechless-codellama-orca-platypus-13b-0.10e | Chat Models | 130.2 | 29.83 | 28.75 | 25.88 | 25.36 | 49.27 | 49.72 | 0.0 | LlamaForCausalLM |
| Ziya-LLaMA-13B-v1 | Fine Tuned Models | 130 | 29.82 | 27.73 | 25.96 | 27.04 | 48.65 | 49.57 | 0.0 | LlamaForCausalLM |
| WizardLM-33B-V1.0-Uncensored-SuperHOT-8k | Unknown Model Types | 330 | 29.81 | 25.43 | 31.97 | 23.43 | 47.0 | 51.07 | 0.0 | LlamaForCausalLM |
| neuralfalcon-1b-v1 | Fine Tuned Models | 10 | 29.8 | 26.37 | 26.56 | 25.93 | 49.03 | 50.75 | 0.15 | FalconForCausalLM |
| FinOPT-Franklin | Fine Tuned Models | 13.2 | 29.78 | 27.73 | 24.91 | 23.12 | 52.4 | 50.51 | 0.0 | OPTForCausalLM |
| mental-alpaca | Fine Tuned Models | 0 | 29.77 | 28.58 | 26.02 | 27.04 | 48.61 | 48.38 | 0.0 | LlamaForCausalLM |
| clown-SUV-4x70b | Merged Models or MoE Models | 2380.9 | 29.76 | 24.74 | 28.29 | 24.2 | 48.81 | 52.49 | 0.0 | MixtralForCausalLM |
| opt350m_10e6 | Fine Tuned Models | 3.3 | 29.73 | 23.98 | 32.36 | 24.96 | 46.71 | 50.36 | 0.0 | OPTForCausalLM |
| proofGPT-v0.1-6.7B | Fine Tuned Models | 67 | 29.72 | 23.29 | 28.45 | 24.57 | 50.87 | 51.14 | 0.0 | GPTNeoXForCausalLM |
| Llama-68M-Chat-v1 | Chat Models | 0.7 | 29.72 | 23.29 | 28.27 | 25.18 | 47.27 | 54.3 | 0.0 | LlamaForCausalLM |
| neuralfalcon-1b-v1 | Fine Tuned Models | 10 | 29.72 | 26.79 | 26.56 | 26.22 | 48.93 | 49.57 | 0.23 | FalconForCausalLM |
| gpt2-turkish-uncased | Fine Tuned Models | 1.4 | 29.68 | 24.49 | 25.08 | 26.59 | 52.3 | 49.64 | 0.0 | Unknown |
| Llama-2-13b-12_153950 | Fine Tuned Models | 128.5 | 29.68 | 28.58 | 26.58 | 20.79 | 49.03 | 53.12 | 0.0 | Unknown |
| UltraRM-13b | Fine Tuned Models | 128.5 | 29.58 | 28.16 | 26.13 | 25.96 | 47.91 | 49.33 | 0.0 | Unknown |
| gogpt-560m | Fine Tuned Models | 5.6 | 29.56 | 26.37 | 31.86 | 25.29 | 43.12 | 50.75 | 0.0 | BloomForCausalLM |
| pythia-70m-deduped-cleansharegpt | Fine Tuned Models | 0.7 | 29.56 | 25.68 | 25.4 | 23.12 | 51.15 | 52.01 | 0.0 | GPTNeoXForCausalLM |
| xglm-564M | Pretrained Models | 5.6 | 29.55 | 24.57 | 34.64 | 25.18 | 40.43 | 52.25 | 0.23 | XGLMForCausalLM |
| juniper-certificate-Llama-2-7b-chat-hf | Fine Tuned Models | 70 | 29.55 | 29.1 | 27.63 | 24.02 | 48.23 | 48.3 | 0.0 | LlamaForCausalLM |
| santacoder | Fine Tuned Models | 0 | 29.51 | 26.28 | 25.6 | 25.89 | 51.24 | 48.07 | 0.0 | GPT2LMHeadCustomModel |
| bloom-820m-chat | Unknown Model Types | 7.5 | 29.5 | 23.38 | 34.16 | 25.98 | 40.32 | 53.2 | 0.0 | BloomForCausalLM |
| supermario-v1 | Fine Tuned Models | 72.4 | 29.49 | 27.73 | 25.83 | 27.04 | 47.27 | 49.09 | 0.0 | Unknown |
| mistral7b-test001 | Pretrained Models | 75.8 | 29.49 | 24.66 | 26.78 | 23.12 | 50.07 | 52.33 | 0.0 | Unknown |
| airoboros-33b-gpt4-1.2-SuperHOT-8k | Unknown Model Types | 330 | 29.48 | 24.66 | 31.23 | 23.13 | 47.44 | 50.43 | 0.0 | LlamaForCausalLM |
| test1 | Fine Tuned Models | 66.1 | 29.48 | 27.65 | 26.17 | 24.55 | 48.33 | 50.2 | 0.0 | Unknown |
| mistral-7b-dpo-open-orca-flan-50k-synthetic-5-models | Chat Models | 72.4 | 29.48 | 25.51 | 25.52 | 26.82 | 48.81 | 50.2 | 0.0 | MistralForCausalLM |
| gpt-neo-125m | Pretrained Models | 1.5 | 29.47 | 22.95 | 30.26 | 25.97 | 45.58 | 51.78 | 0.3 | GPTNeoForCausalLM |
| KoAlpaca-Polyglot-5.8B | Fine Tuned Models | 60 | 29.46 | 27.65 | 35.58 | 24.72 | 39.74 | 49.01 | 0.08 | GPTNeoXForCausalLM |
| Llama-2-13b-public | Fine Tuned Models | 128.5 | 29.45 | 29.95 | 26.65 | 22.74 | 49.01 | 48.38 | 0.0 | Unknown |
| smol_llama-220M-GQA | Pretrained Models | 2.2 | 29.44 | 24.83 | 29.76 | 25.85 | 44.55 | 50.99 | 0.68 | LlamaForCausalLM |
| lamini-neo-125m | Fine Tuned Models | 1.2 | 29.44 | 24.57 | 30.22 | 26.74 | 42.85 | 52.25 | 0.0 | Unknown |
| tiny_starcoder_py | Pretrained Models | 1.6 | 29.41 | 20.99 | 28.77 | 26.79 | 47.68 | 51.22 | 0.99 | GPTBigCodeForCausalLM |
| Cerebras-GPT-256M | Pretrained Models | 2.6 | 29.38 | 22.01 | 28.99 | 26.83 | 45.98 | 52.49 | 0.0 | Unknown |
| pythia-160m-deduped | Pretrained Models | 2.1 | 29.38 | 24.06 | 31.39 | 24.86 | 44.34 | 51.38 | 0.23 | GPTNeoXForCausalLM |
| DeciCoder-1b | Pretrained Models | 11.1 | 29.37 | 21.16 | 31.09 | 24.34 | 47.05 | 50.83 | 1.74 | DeciCoderForCausalLM |
| zephyr-smol_llama-100m-dpo-full | Chat Models | 1 | 29.37 | 25.0 | 28.54 | 25.18 | 45.75 | 51.07 | 0.68 | LlamaForCausalLM |
| SmolLlamix-8x101M-take2 | Pretrained Models | 4 | 29.35 | 23.98 | 28.43 | 25.07 | 45.87 | 52.25 | 0.53 | MixtralForCausalLM |
| smol_llama-220M-openhermes | Chat Models | 2.2 | 29.34 | 25.17 | 28.98 | 26.17 | 43.08 | 52.01 | 0.61 | LlamaForCausalLM |
| zephyr-220m-dpo-full | Chat Models | 2.2 | 29.33 | 25.43 | 29.15 | 26.43 | 43.44 | 50.99 | 0.53 | MistralForCausalLM |
| zephyr-220m-sft-full | Chat Models | 2.2 | 29.33 | 25.26 | 29.03 | 26.45 | 43.23 | 51.62 | 0.38 | MistralForCausalLM |
| Aira-2-1B1 | Chat Models | 11 | 29.32 | 23.21 | 26.97 | 24.86 | 50.63 | 50.28 | 0.0 | LlamaForCausalLM |
| test-model | Pretrained Models | 0 | 29.31 | 24.4 | 30.17 | 25.88 | 44.59 | 50.83 | 0.0 | Unknown |
| llama2-13b-platypus-ckpt-1000 | Chat Models | 128.5 | 29.28 | 28.16 | 26.55 | 23.17 | 48.79 | 49.01 | 0.0 | Unknown |
| DialoGPT-large | Fine Tuned Models | 0 | 29.27 | 23.38 | 25.77 | 23.81 | 50.27 | 52.41 | 0.0 | GPT2LMHeadModel |
| changpt-bart | Chat Models | 1.8 | 29.27 | 28.67 | 26.41 | 23.12 | 47.94 | 49.49 | 0.0 | Unknown |
| FinOPT-Lincoln | Fine Tuned Models | 3.3 | 29.27 | 26.71 | 25.6 | 23.0 | 50.59 | 49.72 | 0.0 | OPTForCausalLM |
| WizardLM-13B-1.0 | Fine Tuned Models | 128.5 | 29.27 | 28.5 | 25.97 | 23.12 | 48.61 | 49.41 | 0.0 | Unknown |