LLM Coding Leaderboard

Name: LLM Coding Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page provides current LLM coding evaluation results, including HumanEval and MBPP Pass@1 scores.
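The page does not state how each listed Pass@1 score was computed; the figures come from the individual papers and repositories. As a rough sketch, assuming the commonly used unbiased pass@k estimator (draw n samples per problem, count the c that pass the unit tests, then average over problems), the metric could be computed as follows. The function names and the sample data are hypothetical and for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Uses 1 - C(n - c, k) / C(n, k): the probability that at least one of
    k samples drawn without replacement from the n is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over all problems, as a percentage.

    `results` holds one (n, c) pair per benchmark problem.
    """
    if not results:
        raise ValueError("results must not be empty")
    total = sum(pass_at_k(n, c, k) for n, c in results)
    return 100.0 * total / len(results)


# Hypothetical per-problem results: (samples drawn, samples that passed).
example_results = [(10, 10), (10, 5), (10, 0), (10, 2)]
example_score = benchmark_pass_at_k(example_results, k=1)
print(f"pass@1 = {example_score:.2f}")
```

For k = 1 the estimator reduces to c / n per problem, so Pass@1 is the average fraction of samples that pass. Scores from different sources may still differ in sampling temperature, prompt format, and number of samples, so values in the table are not guaranteed to be directly comparable.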

Top Model: Phi 4 - 14B

Data source: Papers or GitHub evaluation results

Ranking Table

Model	Parameters (B)	HumanEval Pass@1 (%)	MBPP Pass@1 (%)	Organization	License
Phi 4 - 14B	14	82.60	—	Microsoft Azure	—
WizardCoder-Python-13B-V1.0	13	64.00	54.60	WizardLM Team	—
PanGu-Coder2	15	61.64	—	Huawei	—
WizardCoder-15B-V1.0	15	57.30	—	WizardLM Team	—
Qwen2.5-14B	14	56.70	76.70	Alibaba	—
Moonlight-16B-A3B-Instruct	16	48.10	63.80	Moonshot AI	—
CodeLLaMA-Python-13B	13	43.30	49.00	Facebook AI Research	—
CodeLLaMA-Instruct-13B	13	42.70	49.40	Facebook AI Research	—
WizardLM-30B-V1	30	37.80	—	WizardLM Team	—
CodeLLaMA-13B	13	36.00	47.00	Facebook AI Research	—
StarCoder	15.5	33.60	52.70	BigCode	—
Qwen-14B	14	32.30	40.80	Alibaba	—
StarCoderBase	15.5	30.40	49.00	BigCode	—
CodeGeeX	13	22.90	—	Zhipu AI	—
LLaMA2 13B	13	20.10	27.60	Facebook AI Research	—
Baichuan2-13B-Base	13	17.07	30.20	Baichuan Intelligence	—
Baichuan 13B - Base	13	11.59	22.90	Baichuan Intelligence	—

Data is for reference only. Official sources are authoritative.