This page presents coding-ability evaluation results for current mainstream large language models on benchmark datasets such as HumanEval and MBPP. A slash ("/") marks a value that is not publicly reported.
Data source: results reported in the original papers or in GitHub evaluations.
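Pass@1 is the fraction of benchmark problems for which a sampled completion passes the problem's unit tests, averaged over all problems. Scores of this kind are typically computed with the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021), although individual entries in the table below may come from different decoding setups (greedy decoding corresponds to n = k = 1). Below is a minimal sketch of that estimator; the sample counts in the example are purely illustrative and not taken from this table.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021).

    n: completions sampled for one problem
    c: completions that pass all unit tests for that problem
    k: the k in pass@k (k = 1 for the Pass@1 columns below)
    """
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 200 samples per problem, 37 passing -> pass@1 = 18.5%
print(round(100 * pass_at_k(n=200, c=37, k=1), 2))
```

The benchmark-level score is the mean of this per-problem estimate over all problems, usually reported as a percentage.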
| Model | Parameters (B) | HumanEval Pass@1 (%) | MBPP Pass@1 (%) | Organization | License |
|---|---|---|---|---|---|
| Claude 3.5 Sonnet New | / | 93.70 | / | Anthropic | / |
| Qwen2.5-Coder-32B-Instruct | 32 | 92.70 | 90.20 | Alibaba | / |
| OpenAI o1-mini | / | 92.40 | / | OpenAI | / |
| Claude 3.5 Sonnet | / | 92 | / | Anthropic | / |
| GPT-4o | / | 90.20 | / | OpenAI | / |
| Llama3.1-405B Instruct | 405 | 89 | 88.60 | Facebook AI Research | / |
| DeepSeek V2.5 | 236 | 89 | / | DeepSeek-AI | / |
| Amazon Nova Pro | / | 89 | / | Amazon | / |
| Llama3.3-70B-Instruct | 70 | 88.40 | 87.60 | Facebook AI Research | / |
| Grok 2 | 269 | 88.40 | / | xAI | / |
| Claude 3.5 Haiku | / | 88.10 | / | Anthropic | / |
| GPT-4o mini | / | 87.20 | / | OpenAI | / |
| Codestral 25.01 | / | 86.60 | 80.20 | MistralAI | / |
| Qwen2-72B-Instruct | 72 | 86 | 80.20 | Alibaba | / |
| GPT-4 | 175 | 85.40 | 83.50 | OpenAI | / |
| Amazon Nova Lite | / | 85.40 | / | Amazon | / |
| Claude3-Opus | / | 84.90 | / | Anthropic | / |
| Mistral Small 24B Instruct 2501 | 24 | 84.80 | / | MistralAI | / |
| Qwen2.5-Omni-7B | 7 | 84.80 | 79.20 | Alibaba | / |
| Llama3-400B-Instruct-InTraining | 400 | 84.10 | / | Facebook AI Research | / |
| CodeQwen1.5-7B-Chat | 7 | 83.50 | 77.70 | Alibaba | / |
| Phi 4 - 14B | 14 | 82.60 | / | Microsoft Azure | / |
| DeepSeek-V3 | 681 | 82.60 | / | DeepSeek-AI | / |
| Llama3-70B | 70 | 81.70 | / | Facebook AI Research | / |
| Llama3-70B-Instruct | 70 | 81.70 | / | Facebook AI Research | / |
| Amazon Nova Micro | / | 81.10 | / | Amazon | / |
| Llama3.1-70B-Instruct | 70 | 80.50 | 86 | Facebook AI Research | / |
| C4AI Command A (202503) | 111 | 80 | / | CohereAI | / |
| DeepSeek Coder-33B Instruct | 33 | 79.30 | 70 | DeepSeek-AI | / |
| Claude3-Haiku | / | 75.90 | / | Anthropic | / |
| Gemini-ultra | / | 74.40 | / | DeepMind | / |
| Grok-1.5 | / | 74.10 | / | xAI | / |
| DeepSeek-V2-236B-Chat | 236 | 73.80 | 61.40 | DeepSeek-AI | / |
| WizardCoder-Python-34B | 34 | 73.20 | / | WizardLM Team | / |
| Qwen2.5-Max | / | 73.20 | 80.60 | Alibaba | / |
| Claude3-Sonnet | / | 73 | / | Anthropic | / |
| Llama3.1-8B-Instruct | 8 | 72.60 | 72.80 | Facebook AI Research | / |
| GLM4 | / | 72 | / | Zhipu AI | / |
| Gemini 1.5 Pro | / | 71.90 | / | Google DeepMind | / |
| GLM-4-9B-Chat | 9 | 71.80 | / | Zhipu AI | / |
| DBRX Instruct | 132 | 70.10 | / | Databricks | / |
| GLM-4-9B | 9 | 70.10 | / | Zhipu AI | / |
| Phind-CodeLlama-34B-Python-v1 | 34 | 69.50 | / | Phind | / |
| Gemini-pro | 100 | 67.70 | / | DeepMind | / |
| Phind-CodeLlama-34B-v1 | 34 | 67.60 | / | Phind | / |
| DeepSeek Coder-6.7B Instruct | 6.7 | 66.10 | 65.40 | DeepSeek-AI | / |
| DeepSeek-V3-Base | 681 | 65.20 | 75.40 | DeepSeek-AI | / |
| Qwen2-72B | 72.7 | 64.60 | 76.90 | Alibaba | / |
| WizardCoder-Python-13B-V1.0 | 13 | 64 | 54.60 | WizardLM Team | / |
| Grok-1 | 314 | 63.20 | / | xAI | / |
| Llama3-8B | 8 | 62.20 | / | Facebook AI Research | / |
| Llama3-8B-Instruct | 8 | 62.20 | / | Facebook AI Research | / |
| PanGu-Coder2 | 15 | 61.64 | / | Huawei | / |
| Codestral | 22 | 61.50 | 78.20 | MistralAI | / |
| Phi-3-small 7B | 7 | 59.10 | 71.40 | Microsoft Azure | / |
| Qwen2.5-72B | 72.7 | 59.10 | 84.70 | Alibaba | / |
| Phi-3-mini 3.8B | 3.8 | 58.50 | 70 | Microsoft Azure | / |
| Qwen2.5-32B | 32 | 58.50 | 84.50 | Alibaba | / |
| Qwen2.5-7B | 7 | 57.90 | 74.90 | Alibaba | / |
| WizardCoder-15B-V1.0 | 15 | 57.30 | / | WizardLM Team | / |
| Qwen2.5-14B | 14 | 56.70 | 76.70 | Alibaba | / |
| CodeGemma-7B-IT | 7 | 56.10 | 54.20 | Google Research | / |
| Phi-3-medium 14B-preview | 14 | 55.50 | 74.40 | Microsoft Azure | / |
| MiniCPM-MoE-8x2B | 13.6 | 55.49 | 41.68 | OpenBMB | / |
| CodeLLaMA-Python-34B | 34 | 53.70 | 56.20 | Facebook AI Research | / |
| YAYI2-30B | 30 | 53.10 | 45.80 | Wenge Technology | / |
| Qwen2-57B-A14B | 57 | 53 | 71.90 | Alibaba | / |
| Qwen1.5-110B | 110 | 52.40 | 58.10 | Alibaba | / |
| CodeQwen1.5-7B | 7 | 51.80 | 72.20 | Alibaba | / |
| Qwen2-7B | 7 | 51.20 | 65.90 | Alibaba | / |
| Phi-1 | 1.3 | 50.60 | 55.50 | Microsoft Azure | / |
| MiniCPM-2B-DPO | 2.4 | 50 | 47.31 | ModelBest | / |
| CodeLLaMA-34B | 34 | 48.80 | 55 | Facebook AI Research | / |
| Phi-2 | 2.7 | 48.30 | 59.10 | Microsoft Azure | / |
| GPT-3.5 | 175 | 48.10 | 52.20 | OpenAI | / |
| Moonlight-16B-A3B-Instruct | 16 | 48.10 | 63.80 | Moonshot AI | / |
| Yi-1.5-34B | 34 | 46.30 | 65.50 | 01.AI | / |
| Mixtral-8×22B-MoE | 141 | 45.10 | 71.20 | MistralAI | / |
| CodeGemma-7B | 7 | 44.50 | 56.20 | Google Research | / |
| CodeLLaMA-Python-13B | 13 | 43.30 | 49 | Facebook AI Research | / |
| CodeLLaMA-Instruct-13B | 13 | 42.70 | 49.40 | Facebook AI Research | / |
| Qwen2.5-3B | 3 | 42.10 | 57.10 | Alibaba | / |
| CodeLLaMA-Instruct-34B | 34 | 41.50 | 57 | Facebook AI Research | / |
| Qwen1.5-72B-Chat | 72 | 41.50 | 53.40 | Alibaba | / |
| Yi-1.5-9B | 9 | 41.40 | 61.10 | 01.AI | / |
| DeepSeek-V2-236B | 236 | 40.90 | 66.60 | DeepSeek-AI | / |
| Mixtral-8×7B-MoE | 45 | 40.20 | 60.70 | MistralAI | / |
| Gemma 2 - 9B | 9 | 40.20 | 52.40 | Google Research | / |
| Grok-0 | 33 | 39.70 | / | xAI | / |
| Yi-9B | 9 | 39 | 54.40 | 01.AI | / |
| CodeLLaMA-Python-7B | 7 | 38.40 | 47.60 | Facebook AI Research | / |
| WizardLM-30B-V1 | 30 | 37.80 | / | WizardLM Team | / |
| PaLM2-S | / | 37.60 | 50 | Google Research | / |
| Qwen1.5-32B | 32 | 37.20 | 49.40 | Alibaba | / |
| Qwen2.5-1.5B | 1.5 | 37.20 | 60.20 | Alibaba | / |
| CodeLLaMA-13B | 13 | 36 | 47 | Facebook AI Research | / |
| CodeGeeX2-6B | 6 | 35.90 | / | Zhipu AI | / |
| PaLM-Coder | 540 | 35.90 | 47 | Google Research | / |
| Aquila2-34B | 34 | 35.40 | / | Beijing Academy of Artificial Intelligence (BAAI) | / |
| Qwen-72B | 72 | 35.40 | 52.20 | Alibaba | / |
| Stable LM Zephyr 3B | 3 | 35.37 | 31.85 | Stability AI | / |
| CodeLLaMA-Instruct-7B | 7 | 34.80 | 44.40 | Facebook AI Research | / |
| WizardCoder-3B-V1.0 | 3 | 34.80 | 37.40 | WizardLM Team | / |
| Qwen1.5-MoE-A2.7B | 14.3 | 34.20 | / | Alibaba | / |
| Phi-1.5 | 1.3 | 34.10 | 37.70 | Microsoft Azure | / |
| StarCoder | 15.5 | 33.60 | 52.70 | BigCode | / |
| CodeLLaMA-7B | 7 | 33.50 | 41.40 | Facebook AI Research | / |
| Qwen-14B | 14 | 32.30 | 40.80 | Alibaba | / |
| Gemma 7B | 7 | 32.30 | 44.40 | Google Research | / |
| Qwen2-1.5B | 1.5 | 31.10 | 37.40 | Alibaba | / |
| LLaMA2 70B | 70 | 30.50 | 45.40 | Facebook AI Research | / |
| Mistral 7B | 7.3 | 30.50 | 47.50 | MistralAI | / |
| Qwen2.5-0.5B | 0.5 | 30.50 | 39.30 | Alibaba | / |
| StarCoderBase | 15.5 | 30.40 | 49 | BigCode | / |
| Qwen-7B | 7 | 29.90 | 31.60 | Alibaba | / |
| XVERSE-MoE-A4.2B | 25.8 | 29.90 | / | XVERSE | / |
| Codex | 175 | 28.81 | / | OpenAI | / |
| AquilaCode-7B-py | 7 | 28.80 | / | Beijing Academy of Artificial Intelligence (BAAI) | / |
| XVERSE-65B | 65 | 26.80 | / | XVERSE | / |
| PaLM | 540 | 26.20 | 47 | Google Research | / |
| WizardCoder-1B-V1.0 | 1 | 23.80 | 28.60 | WizardLM Team | / |
| CodeGeeX | 13 | 22.90 | / | Zhipu AI | / |
| LLaMA2 34B | 34 | 22.60 | 33.80 | Facebook AI Research | / |
| AquilaCode-7B-multi | 7 | 22 | / | Beijing Academy of Artificial Intelligence (BAAI) | / |
| Gemma 2B | 2 | 22 | 29.20 | Google Research | / |
| Gemma 2B - It | 2 | 22 | 29.20 | Google Research | / |
| CodeGemma-2B | 2 | 22 | 29.20 | Google Research | / |
| Qwen2-0.5B | 0.4 | 22 | 22 | Alibaba | / |
| RecurrentGemma-2B | 2.7 | 21.30 | 28.80 | Google Research | / |
| LLaMA2 13B | 13 | 20.10 | 27.60 | Facebook AI Research | / |
| Baichuan2-7B-Base | 7 | 18.29 | 24.20 | Baichuan Intelligence | / |
| Baichuan2-13B-Base | 13 | 17.07 | 30.20 | Baichuan Intelligence | / |
| Qwen-1.8B | 1.8 | 15.20 | / | Alibaba | / |
| LLaMA2 7B | 7 | 12.20 | 20.80 | Facebook AI Research | / |
| Baichuan 13B - Base | 13 | 11.59 | 22.90 | Baichuan Intelligence | / |
| Baichuan 7B | 7 | 9.20 | 6.60 | Baichuan Intelligence | / |
| TinyLlama | 1.1 | 6.71 | 19.91 | Singapore University of Technology and Design | / |
| Mistral Large | / | 4.10 | 7.10 | MistralAI | / |
| Mistral Small 24B Base 2501 | 24 | / | 69.64 | MistralAI | / |
Data is for reference only; official sources are authoritative.