LLM Coding Leaderboard

Name: LLM Coding Leaderboard
Creator: DataLearner
License: https://creativecommons.org/licenses/by/4.0/

This page lists coding benchmark results for large language models (LLMs), reported as Pass@1 scores on HumanEval and MBPP.
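Pass@1 is the probability that a single generated solution passes all of a problem's unit tests, averaged over the benchmark's problems. The sketch below uses the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021): for a problem with n samples of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). For k = 1 this reduces to c / n. The `(n, c)` input format and the helper names are hypothetical, and this page does not say how many samples or which decoding settings its sources used.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: the k in pass@k (requires k <= n)
    """
    if n - c < k:
        # Every size-k subset of samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean per-problem pass@k over a benchmark, as a percentage.

    results: hypothetical list of (n, c) pairs, one per benchmark problem.
    """
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Four toy problems with 10 samples each; per-problem pass@1 is
    # 1.0, 0.5, 0.0 and 0.1, so the benchmark score is roughly 40.0.
    toy = [(10, 10), (10, 5), (10, 0), (10, 1)]
    print(benchmark_pass_at_k(toy, k=1))
```

With one greedily decoded sample per problem (n = 1), pass@1 is simply the percentage of problems solved, which is how many published Pass@1 numbers are commonly produced.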

Top model (by HumanEval Pass@1): Phi-3-mini 3.8B

Data source: evaluation results published in papers or on GitHub

Ranking Table (sorted by HumanEval Pass@1, descending)

Model	Parameters (B)	HumanEval Pass@1 (%)	MBPP Pass@1 (%)	Organization	License
Phi-3-mini 3.8B	3.8	58.50	70.00	Microsoft Azure	—
Phi-1	1.3	50.60	55.50	Microsoft Azure	—
MiniCPM-2B-DPO	2.4	50.00	47.31	ModelBest (面壁智能)	—
Phi-2	2.7	48.30	59.10	Microsoft Azure	—
Qwen2.5-3B	3.0	42.10	57.10	Alibaba	—
Qwen2.5-1.5B	1.5	37.20	60.20	Alibaba	—
Stable LM Zephyr 3B	3.0	35.37	31.85	Stability AI	—
Phi-1.5	1.3	34.10	37.70	Microsoft Azure	—
Qwen2-1.5B	1.5	31.10	37.40	Alibaba	—
Qwen2.5-0.5B	0.5	30.50	39.30	Alibaba	—
Gemma 2B	2.0	22.00	29.20	Google Research	—
Gemma 2B - It	2.0	22.00	29.20	Google Research	—
CodeGemma-2B	2.0	22.00	29.20	Google Research	—
Qwen2-0.5B	0.4	22.00	22.00	Alibaba	—
RecurrentGemma-2B	2.7	21.30	28.80	Google Research	—
Qwen-1.8B	1.8	15.20	—	Alibaba	—
TinyLlama	1.1	6.71	19.91	Singapore University of Technology and Design	—

Data are for reference only; the official sources are authoritative.