StarCoderBase
StarCoderBase is an AI model published by BigCode and released on 2023-05-04. It is a large language model for code with 15.5B parameters and a 2K-token context length, requires about 64 GB of storage, and is distributed under the BigCode OpenRAIL-M v1 license.
Data sourced primarily from official releases (GitHub, Hugging Face, papers), then benchmark leaderboards, then third-party evaluators.
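StarCoderBase, like StarCoder, is an open-source large language model for code from BigCode. Both use the GPT-2 architecture. The only difference is in the training data: StarCoderBase was trained on more than 80 programming languages, using a dataset of 1 trillion tokens, while StarCoder starts from that base model and continues training on a further 35 billion Python tokens.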
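
To put the two training stages in proportion, here is a minimal arithmetic sketch using only the token counts quoted above. The constant and function names are hypothetical, chosen for illustration, and are not taken from any BigCode code.

```python
# Token counts as stated in the description above.
BASE_PRETRAINING_TOKENS = 1_000_000_000_000   # 1 trillion tokens, 80+ languages
PYTHON_FINETUNING_TOKENS = 35_000_000_000     # 35 billion Python tokens


def total_tokens_seen(base: int, extra: int) -> int:
    """Total tokens seen after continuing training on top of the base run."""
    return base + extra


def extra_fraction(base: int, extra: int) -> float:
    """Size of the continued-training stage relative to the base run."""
    return extra / base


if __name__ == "__main__":
    total = total_tokens_seen(BASE_PRETRAINING_TOKENS, PYTHON_FINETUNING_TOKENS)
    share = extra_fraction(BASE_PRETRAINING_TOKENS, PYTHON_FINETUNING_TOKENS)
    print(f"Total tokens seen by the Python-tuned model: {total:,}")
    print(f"Python stage relative to base pretraining: {share:.1%}")
```

By these figures, the Python stage is small next to the base run, so the Python-tuned model is best read as a specialization of the base model, not a separately trained one.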
