The goal of the BigCode initiative is to develop large language models (LLMs) that generate code in an open and responsible way.
Code LLMs enable code completion and synthesis from other code and from natural language descriptions, allowing users to work across a wide range of domains, tasks, and programming languages.
The initiative is led by ServiceNow Research, which researches future AI-powered experiences, and Hugging Face, a community and data platform that provides tools for building, training, and deploying ML models based on open-source code and technologies.
BigCode invites AI researchers to collaborate on a representative evaluation suite for code LLMs covering a diverse range of tasks and programming languages, on the responsible development and governance of data sets for code LLMs, and on faster training and inference methods for LLMs.
“BigCode’s first goal is to build and release a data set large enough to train a state-of-the-art language model for code. We will ensure that only files from repositories with permissive licenses end up in the data set,” wrote ServiceNow Research in a blog post.
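The license filter described above can be sketched as a simple allow-list check over repository metadata. This is a hypothetical illustration, not BigCode's actual pipeline; the SPDX identifiers and the `repos` records are assumptions for the example.

```python
# Illustrative allow-list of permissive SPDX license identifiers
# (assumed for this sketch; not BigCode's actual list).
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}

def filter_permissive(repos):
    """Keep only repositories whose declared license is permissive."""
    return [r for r in repos if r.get("license") in PERMISSIVE_LICENSES]

repos = [
    {"name": "example/a", "license": "MIT"},
    {"name": "example/b", "license": "GPL-3.0"},  # copyleft: excluded
    {"name": "example/c", "license": "Apache-2.0"},
]
kept = filter_permissive(repos)
print([r["name"] for r in kept])  # → ['example/a', 'example/c']
```

In practice a pipeline like this would read license metadata from the hosting platform's API and then collect files only from the repositories that pass the check.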
“Using that data set, we will train a 15-billion-parameter language model for code using ServiceNow’s in-house GPU cluster. We will train the LLM on distributed infrastructure using an adapted version of Megatron-LM.”
Additional details about the project are available here.