
LLM Coding Ability Evaluation Leaderboard

Name: LLM Coding Ability Evaluation Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

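This page lists evaluation results for current mainstream large language models on coding ability, covering benchmark datasets such as HumanEval and MBPP.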

Data source: papers or GitHub evaluation results

Model	Parameters (×100M)	HumanEval Pass@1	MBPP Pass@1	Publisher	Open-source status
Phi-3-mini 3.8B	38.0	58.50	70	Microsoft Azure	/
Phi-1	13.0	50.60	55.50	Microsoft Azure	/
MiniCPM-2B-DPO	24.0	50	47.31	ModelBest (面壁智能)	/
Phi-2	27.0	48.30	59.10	Microsoft Azure	/
Qwen2.5-3B	30.0	42.10	57.10	Alibaba	/
Qwen2.5-1.5B	15.0	37.20	60.20	Alibaba	/
Stable LM Zephyr 3B	30.0	35.37	31.85	Stability AI	/
Phi-1.5	13.0	34.10	37.70	Microsoft Azure	/
Qwen2-1.5B	15.0	31.10	37.40	Alibaba	/
Qwen2.5-0.5B	5.0	30.50	39.30	Alibaba	/
Gemma 2B	20.0	22	29.20	Google Research	/
Gemma 2B - It	20.0	22	29.20	Google Research	/
CodeGemma-2B	20.0	22	29.20	Google Research	/
Qwen2-0.5B	4.0	22	22	Alibaba	/
RecurrentGemma-2B	27.0	21.30	28.80	Google Research	/
Qwen-1.8B	18.0	15.20	/	Alibaba	/
TinyLlama	11.0	6.71	19.91	Singapore University of Technology and Design	/
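
The Pass@1 columns above are percentages of benchmark problems solved. As a rough illustration of how such a score can be computed, here is a minimal Python sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass the unit tests. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and this page does not state how each listed score was actually produced (for example, greedy decoding with a single sample versus sampling several completions), so treat this as a sketch under those assumptions rather than the method behind the table.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    Uses the unbiased estimator 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0
    # whenever any k-subset must contain at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, as a percentage.

    results: one (n, c) pair per benchmark problem (hypothetical input shape).
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per problem (n = 1, k = 1), this reduces to the plain fraction of problems whose one completion passes, which is how a Pass@1 value such as those in the table is usually read.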

The data is for reference only; official sources take precedence. Links next to model names lead to the DataLearner model detail pages.
