This page presents coding-ability evaluation results for current mainstream large language models on benchmark datasets including HumanEval and MBPP.
Data source: results reported in papers or in official GitHub evaluations.
| Model | Parameters (B) | HumanEval Pass@1 | MBPP Pass@1 | Organization | License |
|---|---|---|---|---|---|
| OpenAI o1-mini | / | 92.40 | / | OpenAI | / |
| Claude 3.5 Sonnet | / | 92 | / | Anthropic | / |
| Llama3.1-405B Instruct | 405 | 89 | 88.60 | Facebook AI Research | / |
| DeepSeek V2.5 | 236 | 89 | / | DeepSeek-AI | / |
| Amazon Nova Pro | / | 89 | / | Amazon | / |
| Grok 2 | 269 | 88.40 | / | xAI | / |
| Codestral 25.01 | / | 86.60 | 80.20 | MistralAI | / |
| GPT-4 | 175 | 85.40 | 83.50 | OpenAI | / |
| Amazon Nova Lite | / | 85.40 | / | Amazon | / |
| Llama3-400B-Instruct-InTraining | 400 | 84.10 | / | Facebook AI Research | / |
| DeepSeek-V3 | 681 | 82.60 | / | DeepSeek-AI | / |
| Amazon Nova Micro | / | 81.10 | / | Amazon | / |
| C4AI Command A (202503) | 111 | 80 | / | CohereAI | / |
| Grok-1.5 | / | 74.10 | / | xAI | / |
| DeepSeek-V2-236B-Chat | 236 | 73.80 | 61.40 | DeepSeek-AI | / |
| Qwen2.5-Max | / | 73.20 | 80.60 | Alibaba | / |
| DBRX Instruct | 132 | 70.10 | / | databricks | / |
| DeepSeek-V3-Base | 681 | 65.20 | 75.40 | DeepSeek-AI | / |
| Grok-1 | 314 | 63.20 | / | xAI | / |
| Qwen1.5-110B | 110 | 52.40 | 58.10 | Alibaba | / |
| GPT-3.5 | 175 | 48.10 | 52.20 | OpenAI | / |
| Mixtral-8×22B-MoE | 141 | 45.10 | 71.20 | MistralAI | / |
| DeepSeek-V2-236B | 236 | 40.90 | 66.60 | DeepSeek-AI | / |
| PaLM-Coder | 540 | 35.90 | 47 | Google Research | / |
| Codex | 175 | 28.81 | / | OpenAI | / |
| PaLM | 540 | 26.20 | 47 | Google Research | / |

Data is for reference only; official sources are authoritative.
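
For reference, both HumanEval and MBPP score models with the pass@1 metric: the fraction of problems for which a generated solution passes all unit tests. The sketch below shows the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); the function names and the example (n, c) counts are illustrative only and are not taken from the table above.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: number of samples generated for one problem
    c: number of those samples that pass all unit tests
    k: number of samples the metric is allowed to draw
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int = 1) -> float:
    """Average pass@k across problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical example: three problems, 20 samples each, with 18, 5, and 0 passing.
print(round(benchmark_pass_at_k([(20, 18), (20, 5), (20, 0)], k=1), 4))  # 0.3833
```

With k=1 the estimator reduces to the passing fraction c/n per problem, averaged over the benchmark, which is what the Pass@1 columns above report.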