Researchers have developed a new system designed to tackle complex objects and workflows on Big Data platforms. Computer science researchers from Rice University’s DARPA-funded Pliny project have announced PlinyCompute.
The project is funded through DARPA’s Mining and Understanding Software Enclaves (MUSE) initiative. The Pliny project aims to create systems that automatically detect and fix errors in programs. PlinyCompute is “a system purely for developing high-performance, Big Data codes.”
“With machine learning, and especially deep learning, people have seen what complex analytics algorithms can do when they’re applied to Big Data,” Chris Jermaine, a Rice computer science professor who is leading the platform’s development, said in the announcement. “Everyone from Fortune 500 executives to neuroscience researchers is clamoring for more and more complex algorithms, but systems programmers have mostly bad options for providing that today. HPC can provide the performance, but it takes years to learn to write code for HPC, and perhaps worse, a tool or library that might take days to create with Spark can take months to program on HPC.”
According to Jermaine, while Spark was developed for Big Data and supports things such as load balancing, fault tolerance and resource allocation, it wasn’t designed for complex computation. “Spark is built on top of the Java Virtual Machine, or JVM, which manages runtimes and abstracts away most of the details regarding memory management,” said Jia Zou, a research scientist at Rice. “Spark’s performance suffers from its reliance on the JVM, especially as computational demands increase for tasks like training deep neural networks for deep learning.”
Zou added that PlinyCompute was designed for high performance and has been found to be at least twice as fast as Spark, and as much as 50 times faster at complex computation. However, PlinyCompute requires programmers to write libraries and models in C++, while Spark code typically targets the JVM through languages such as Java or Scala. Because of this, Jermaine says programmers might find it difficult to write code for PlinyCompute.
“There’s more flexibility with PlinyCompute,” Jermaine said. “That can be a challenge for people who are less experienced and knowledgeable about C++, but we also ran a side-by-side comparison of the number of lines of code that were needed to complete various implementations, and for the most part there was no significant difference between PlinyCompute and Spark.”