Hugging Face and ServiceNow launch BigCode, a project to open source code-generating AI systems

Code-generating systems like DeepMind’s AlphaCode, Amazon’s CodeWhisperer and OpenAI’s Codex, which powers GitHub’s Copilot service, provide a tantalizing look at what’s possible with AI today in the field of computer programming. But so far, only a handful of such AI systems have been made freely available to the public and open sourced, reflecting the commercial incentives of the companies building them.

In a bid to change that, AI startup Hugging Face and ServiceNow Research, ServiceNow’s R&D division, today launched BigCode, a new project that aims to build “innovative” AI systems for code in an “open and responsible” way. The goal is to eventually produce a dataset large enough to train a code-generating system, which will then be used to create a prototype on ServiceNow’s in-house GPU cluster: a 15-billion-parameter model, larger than Codex (12 billion parameters) but smaller than AlphaCode (~41.4 billion parameters). In machine learning, parameters are the parts of an AI system that are learned from historical training data and essentially determine the system’s skill on a problem, such as code generation.
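To make the notion of a parameter count concrete, here is a minimal sketch that tallies the learned values (weights and biases) in a toy fully connected network. The function name and the layer sizes are hypothetical and chosen only for illustration; they have nothing to do with the architecture of BigCode, Codex or AlphaCode.

```python
# Hypothetical toy example: layer sizes are illustrative only,
# not taken from BigCode, Codex or AlphaCode.
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output unit
    return total


# A network with 4 inputs, 8 hidden units and 2 outputs.
toy_count = count_parameters([4, 8, 2])
print(toy_count)
```

A 15-billion-parameter model is the same idea at a vastly larger scale: every one of those values is adjusted during training.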

Inspired by Hugging Face’s BigScience effort to open source a highly sophisticated text-generating system, BigCode will be open to anyone who has a professional background in AI research and can commit time to the project, organizers said. The application form went live this afternoon.

“In general, we expect applicants to be affiliated with a research organization (in academia or industry) and to work on technical/ethical/legal aspects of [large language models] for coding applications,” ServiceNow wrote in a blog post. “Once the [code-generating system] is trained, we will evaluate its capabilities … We will try to make the evaluation easier and more extensive so that we can learn more about [the system’s] abilities.”

By co-developing a code-generating system, which will be open sourced under a license that will allow developers to reuse it under certain terms and conditions, BigCode seeks to address some of the controversies that have arisen over the practice of AI-powered code generation, particularly around fair use. The nonprofit Software Freedom Conservancy has criticized GitHub and OpenAI for using public source code, not all of which is under a permissive license, to train and monetize Codex. Codex is available through OpenAI’s paid API, while GitHub recently started charging for access to Copilot. For their part, GitHub and OpenAI continue to insist that Codex and Copilot do not violate any license terms.

The organizers of BigCode say that they will try to ensure that only files from repositories with permissive licenses end up in the aforementioned training dataset. At the same time, they say, they will work to establish “responsible” AI practices for training and sharing code-generating systems of all kinds, soliciting feedback from relevant stakeholders before making policy announcements.
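As a rough illustration of what restricting a dataset to permissively licensed code might involve, the sketch below filters a list of file records against an allowlist of SPDX license identifiers. The allowlist, the record layout and the function name are all assumptions made for this example; the organizers have not published their actual criteria or pipeline here.

```python
# Assumed allowlist of SPDX identifiers for permissive licenses.
# BigCode's actual list is not given in the article.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}


def filter_permissive(files):
    """Keep only files whose repository license is on the allowlist."""
    return [f for f in files if f.get("license") in PERMISSIVE_LICENSES]


# Hypothetical file records; a missing license is treated as not permissive.
sample = [
    {"path": "a/util.py", "license": "MIT"},
    {"path": "b/core.c", "license": "GPL-3.0-only"},
    {"path": "c/main.go", "license": None},
]
kept = filter_permissive(sample)
print([f["path"] for f in kept])
```

Treating unlicensed or unknown-license files as excluded is the conservative choice in this sketch; a real pipeline would also need a reliable way to detect each repository’s license in the first place.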

ServiceNow and Hugging Face have not given a timeline for when the project might be completed. But they expect it to explore some forms of code generation over the next few months, including systems that automatically complete and synthesize code from code snippets and natural language descriptions and that operate across a wide range of domains, tasks and programming languages.

Assuming the ethical, technical and legal issues are someday ironed out, AI-powered coding tools could significantly reduce development costs while allowing coders to focus on more creative work. According to a study from the University of Cambridge, at least half of developers’ efforts are spent debugging rather than actively programming, costing the software industry an estimated $312 billion per year.
