Project CodeNet

In an effort to make code easier to debug, maintain and update, IBM has unveiled Project CodeNet, an open-source dataset for advancing AI’s understanding and translation of code. The project was announced at this week’s Think conference as a part of IBM’s AI for Code initiative, which aims to help developers improve productivity by automating more of their engineering process. 

“We find ourselves in a new age where it’s essential to take advantage of today’s powerful technologies like artificial intelligence (AI) and hybrid cloud to create new solutions that can modernize processes across the information technologies (IT) pipeline,” Ruchir Puri, chief scientist at IBM Research, wrote in a blog post. “Project CodeNet specifically can drive algorithmic innovation to extract this context with sequence-to-sequence models, just like what we have applied in human languages, to make a more significant dent in machine understanding of code as opposed to machine processing of code.”

Project CodeNet includes 14 million code samples and 500 million lines of code in programming languages including C++, Java, Python, Go, COBOL, Pascal and FORTRAN. The project also includes high-quality metadata and annotations, along with sample input and output that help researchers establish program intent when translating from one programming language to another.
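
To illustrate why sample input and output matter for translation, here is a minimal sketch of how such pairs might be used to check that a translated program behaves like the original. The function names, the sample data and the stand-in "programs" are hypothetical and are not taken from the CodeNet dataset or its tooling; each program is modeled as a Python function from input text to output text.

```python
from typing import Callable, Iterable, Tuple

# A "program" here is simplified to a function from stdin text to stdout text.
Program = Callable[[str], str]


def passes_samples(program: Program, samples: Iterable[Tuple[str, str]]) -> bool:
    """Return True if the program reproduces every expected output."""
    return all(
        program(stdin).strip() == expected.strip()
        for stdin, expected in samples
    )


def translation_preserves_behavior(
    original: Program, translated: Program, samples: Iterable[Tuple[str, str]]
) -> bool:
    """Check both versions against the same sample input/output pairs."""
    samples = list(samples)  # allow the pairs to be iterated twice
    return passes_samples(original, samples) and passes_samples(translated, samples)


def sum_original(stdin: str) -> str:
    # Hypothetical stand-in for a solution in the source language.
    return str(sum(int(token) for token in stdin.split()))


def sum_translated(stdin: str) -> str:
    # Hypothetical stand-in for the same solution after translation.
    total = 0
    for token in stdin.split():
        total += int(token)
    return f"{total}\n"


# Hypothetical sample pairs: (input text, expected output text).
SAMPLES = [("1 2 3\n", "6\n"), ("10 -4\n", "6\n"), ("\n", "0\n")]
```

Passing every sample does not prove two programs are equivalent, but a single mismatch is enough to show that a translation changed the program's behavior.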

“Our team is excited to give researchers and developers a dataset and a set of technologies that is easy to use and understand, while simultaneously assisting in the development of algorithms that will advance AI for code. With Project CodeNet, we hope to produce lasting business value as enterprises embark on their IT modernization journeys,” Puri wrote. 

Potential use cases for the project include code search and clone detection, automatic code correction, regression studies and prediction.

The company also announced a new milestone with Cloud Pak for Data; new interactive AI capabilities in Watson to increase personal productivity of business professionals; the easy-to-deploy mobile platform Maximo Mobile; Mono2Micro for optimizing and modernizing apps for the hybrid cloud; and improvements to Qiskit Runtime. 

“We will look back on this year and last as the moment the world entered the digital century in full force,” said IBM Chairman and CEO Arvind Krishna. “In the same way that we electrified factories and machines in the past century, we will use hybrid cloud to infuse AI into software and systems in the 21st century. And one thing is certain: this is a future that must be built on a foundation of deep industry collaboration. No one understands this better than IBM, which is one of the reasons we are boosting investment in our partner ecosystem. Also at Think 2021, we are unveiling our latest hybrid cloud and AI innovations – the very technologies that serve as the building blocks of a new IT architecture for business.”