Researchers from Intel, the Massachusetts Institute of Technology and the Georgia Institute of Technology have announced a new machine programming system designed to detect code similarity. The Machine Inferred Code Similarity (MISIM) system is an automated engine that can determine when two pieces of code, data structures or algorithms perform the same or similar tasks.
According to the researchers, hardware and software systems are becoming increasingly complex. That growing complexity, coupled with a shortage of the programmers needed to develop those systems, has highlighted the need for a new approach to development.
The idea behind machine programming (MP), a term coined by Intel Labs and MIT, is to improve development productivity through the use of automated tools.
“Intel’s ultimate goal for machine programming is to democratize the creation of software. When fully realized, MP will enable everyone to create software by expressing their intention in whatever fashion that’s best for them, whether that’s code, natural language or something else. That’s an audacious goal, and while there’s much more work to be done, MISIM is a solid step toward it,” said Justin Gottschlich, principal scientist and director/founder of machine programming research at Intel.
The researchers explained that MISIM differs from other code similarity systems because it uses a context-aware semantic structure (CASS), which provides more insight into what code does, not just how it does it. Whereas other code similarity systems look for code with similar characteristics or similar goals, MISIM can identify code that performs similar computations. “This is an important step toward the grander vision of machine programming,” Gottschlich said.
Additionally, MISIM does not require a compiler to translate human-readable source code into computer-executable machine code. “This has many benefits over existing systems, including the ability to execute on incomplete snippets of code that a developer may be currently writing – an important practical characteristic for recommendation systems or automated bug fixing,” according to the announcement of the system. “Once the code’s structure is integrated into CASS, neural network systems give similarity scores to pieces of code based on the jobs they are designed to carry out. In other words, if two pieces of code look very different in their structure but perform the same function, the neural networks would rate them as largely similar.”
The researchers also state that MISIM can identify similar pieces of code up to 40 times more accurately than prior state-of-the-art systems.
Going forward, the researchers plan to expand the solution’s feature set, develop a code recommendation engine, and engage with other software groups to see how MISIM can be integrated into day-to-day development. “I imagine most developers would happily let the machine find and fix bugs for them, if it could – I know I would,” Gottschlich added.