LLM Coding Ability Evaluation Leaderboard

Name: LLM Coding Ability Evaluation Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

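This page lists evaluation results for the coding ability of current mainstream large language models on benchmark datasets including HumanEval and MBPP.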
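The scores on this page are reported as Pass@1, but the page does not say how each figure was computed. As a point of reference only, the sketch below shows the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n generated samples of which c pass the unit tests. The function names `pass_at_k` and `benchmark_score` are hypothetical, and individual publishers may have used a different protocol (for example, a single greedy sample per problem).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage.

    results: one (n, c) pair per benchmark problem.
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With one sample per problem (n = 1, k = 1) this reduces to the plain fraction of problems solved, which is how a score such as 84.80 can be read: roughly 85 of every 100 problems solved on the first attempt.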

Top model: Qwen2.5-Omni-7B

Data source: papers or GitHub evaluation results

Overall ranking

Model	Parameters (×100M)	HumanEval Pass@1	MBPP Pass@1	Publisher	Open-source status
Qwen2.5-Omni-7B	70	84.80	79.20	Alibaba	—
CodeQwen1.5-7B-Chat	70	83.50	77.70	Alibaba	—
Llama3.1-8B-Instruct	80	72.60	72.80	Facebook AI Research	—
GLM-4-9B-Chat	90	71.80	—	Zhipu AI	—
GLM-4-9B	90	70.10	—	Zhipu AI	—
DeepSeek Coder-6.7B Instruct	67	66.10	65.40	DeepSeek-AI	—
Llama3-8B	80	62.20	—	Facebook AI Research	—
Llama3-8B-Instruct	80	62.20	—	Facebook AI Research	—
Phi-3-small 7B	70	59.10	71.40	Microsoft Azure	—
Qwen2.5-7B	70	57.90	74.90	Alibaba	—
CodeGemma-7B-IT	70	56.10	54.20	Google Research	—
CodeQwen1.5-7B	70	51.80	72.20	Alibaba	—
Qwen2-7B	70	51.20	65.90	Alibaba	—
CodeGemma-7B	70	44.50	56.20	Google Research	—
Gemma 2 - 9B	90	40.20	52.40	Google Research	—
CodeLLaMA-Python-7B	70	38.40	47.60	Facebook AI Research	—
PaLM2-S	0	37.60	50.00	Google Research	Closed source
CodeGeeX2-6B	60	35.90	—	Zhipu AI	Paid commercial use
CodeLLaMA-Instruct-7B	70	34.80	44.40	Facebook AI Research	—
WizardCoder-3B-V1.0	30	34.80	37.40	WizardLM Team	—
CodeLLaMA-7B	70	33.50	41.40	Facebook AI Research	—
Gemma 7B	70	32.30	44.40	Google Research	—
Mistral 7B	73	30.50	47.50	MistralAI	—
Qwen-7B	70	29.90	31.60	Alibaba	—
AquilaCode-7B-py	70	28.80	—	Beijing Academy of Artificial Intelligence	—
WizardCoder-1B-V1.0	10	23.80	28.60	WizardLM Team	—
AquilaCode-7B-multi	70	22.00	—	Beijing Academy of Artificial Intelligence	—
Baichuan2-7B-Base	70	18.29	24.20	Baichuan Intelligence	—
LLaMA2 7B	70	12.20	20.80	Facebook AI Research	—
Baichuan 7B	70	9.20	6.60	Baichuan Intelligence	—
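For readers who want to re-rank or filter the table offline, here is a minimal sketch that parses a few tab-separated rows and sorts them by HumanEval score. It assumes the parameter-size column is in units of 100 million (so 70 means 7B), which matches the model names but is not stated on the page. `SAMPLE`, `Row`, `parse_rows`, and `rank_by_humaneval` are hypothetical names, and the excerpt keeps only four columns: model, size, HumanEval, MBPP.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    params_b: float              # billions; assumes the source unit is 100 million
    humaneval: Optional[float]   # None when the table shows a dash
    mbpp: Optional[float]


# A small excerpt of the leaderboard: model, size, HumanEval, MBPP.
SAMPLE = """\
Qwen2.5-Omni-7B\t70\t84.80\t79.20
GLM-4-9B-Chat\t90\t71.80\t—
DeepSeek Coder-6.7B Instruct\t67\t66.10\t65.40
WizardCoder-3B-V1.0\t30\t34.80\t37.40
"""


def parse_score(cell: str) -> Optional[float]:
    """Convert a score cell to float, treating a dash or blank as missing."""
    cell = cell.strip()
    return None if cell in ("—", "") else float(cell)


def parse_rows(text: str) -> list[Row]:
    """Parse tab-separated lines into Row records, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, size, humaneval, mbpp = line.split("\t")
        rows.append(Row(name.strip(), float(size) / 10.0,
                        parse_score(humaneval), parse_score(mbpp)))
    return rows


def rank_by_humaneval(rows: list[Row],
                      max_params_b: Optional[float] = None) -> list[Row]:
    """Sort rows by HumanEval score (descending), optionally capping model size."""
    kept = [r for r in rows
            if r.humaneval is not None
            and (max_params_b is None or r.params_b <= max_params_b)]
    return sorted(kept, key=lambda r: r.humaneval, reverse=True)


if __name__ == "__main__":
    for row in rank_by_humaneval(parse_rows(SAMPLE), max_params_b=7.0):
        print(f"{row.model}\t{row.params_b:.1f}B\t{row.humaneval:.2f}")
```

Missing scores are kept as `None` rather than 0 so that a model without an MBPP result is not mistaken for one that scored zero.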

The data is for reference only; official sources take precedence. Links next to the model names lead to the DataLearner model detail pages.