LLM Coding Leaderboard
This page provides current LLM coding evaluation results, including HumanEval and MBPP Pass@1 scores.
Top Model: OpenAI o1-mini
Top Score: -
Model Count: 26
Data version: -
Data source: papers or GitHub evaluation results
Ranking Table
| Model | Parameters (×100M) | HumanEval Pass@1 | MBPP Pass@1 | Organization | License |
|---|---|---|---|---|---|
| OpenAI o1-mini | — | 92.40 | — | OpenAI | — |
| Claude 3.5 Sonnet | — | 92.00 | — | Anthropic | — |
| Llama3.1-405B Instruct | 4,050 | 89.00 | 88.60 | Facebook AI Research | — |
| DeepSeek V2.5 | 2,360 | 89.00 | — | DeepSeek-AI | — |
| Amazon Nova Pro | — | 89.00 | — | Amazon | — |
|  | 2,690 | 88.40 | — | xAI | — |
| Codestral 25.01 | — | 86.60 | 80.20 | MistralAI | — |
| GPT-4 | 1,750 | 85.40 | 83.50 | OpenAI | — |
| Amazon Nova Lite | — | 85.40 | — | Amazon | — |
| Llama3-400B-Instruct-InTraining | 4,000 | 84.10 | — | Facebook AI Research | — |
| DeepSeek-V3 | 6,810 | 82.60 | — | DeepSeek-AI | — |
| Amazon Nova Micro | — | 81.10 | — | Amazon | — |
| C4AI Command A (202503) | 1,110 | 80.00 | — | CohereAI | — |
|  | — | 74.10 | — | xAI | — |
| DeepSeek-V2-236B-Chat | 2,360 | 73.80 | 61.40 | DeepSeek-AI | — |
| Qwen2.5-Max | — | 73.20 | 80.60 | Alibaba | — |
| DBRX Instruct | 1,320 | 70.10 | — | databricks | — |
| DeepSeek-V3-Base | 6,810 | 65.20 | 75.40 | DeepSeek-AI | — |
|  | 3,140 | 63.20 | — | xAI | — |
| Qwen1.5-110B | 1,100 | 52.40 | 58.10 | Alibaba | — |
| GPT-3.5 | 1,750 | 48.10 | 52.20 | OpenAI | — |
| Mixtral-8×22B-MoE | 1,410 | 45.10 | 71.20 | MistralAI | — |
| DeepSeek-V2-236B | 2,360 | 40.90 | 66.60 | DeepSeek-AI | — |
| PaLM-Coder | 5,400 | 35.90 | 47.00 | Google Research | — |
| Codex | 1,750 | 28.81 | — | OpenAI | — |
| PaLM | 5,400 | 26.20 | 47.00 | Google Research | — |
Data is for reference only. Official sources are authoritative.
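
For readers unfamiliar with the Pass@1 columns, here is a minimal sketch of how a pass@k score is commonly computed. This page does not say how each reported number was produced, so the sketch assumes the unbiased estimator popularized by the Codex paper: for a problem with n generated samples of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all problems. The function names `pass_at_k` and `benchmark_pass_at_k` and the sample data are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    Assumes the unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k draws include a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, as a percentage.

    results holds one (n, c) pair per problem (hypothetical input format).
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up results for four problems, one sample each (greedy decoding).
    sample_results = [(1, 1), (1, 0), (1, 1), (1, 0)]
    print(f"pass@1 = {benchmark_pass_at_k(sample_results, k=1):.2f}")
```

With k = 1 the estimator reduces to c / n per problem, so a Pass@1 score is the average fraction of passing samples. Differences in sampling temperature, prompt format, or number of samples can shift the result, which is one reason scores from different sources are not always directly comparable.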









