Hortonworks, one of the three major Hadoop vendors, announced yesterday that it has been collaborating with HP to improve Apache Spark. The work has already yielded faster sorting and in-memory computation for the project, as well as improvements to performance and usage that help it scale.
Hortonworks also announced the inclusion of Apache Kafka and Storm in its Hortonworks DataFlow product. These inclusions allow Hortonworks’ Hadoop platform to better ingest and process streams of data flowing through enterprise data centers.
Scott Gnau, CTO of Hortonworks, said, “This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions. We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.”
Martin Fink, executive vice president and CTO of Hewlett Packard Enterprise (and a member of Hortonworks’ board of directors), said, “We’re hoping to enable the Spark community to derive insight more rapidly from much larger datasets without having to change a single line of code. We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.”
Hortonworks also plans to solidify its core Hadoop distribution, according to yesterday's announcements. Core elements of the platform, such as HDFS, MapReduce, YARN and Apache ZooKeeper, will see updates from Hortonworks only once a year, ensuring stability between point releases.
The extended services within the Hortonworks Data Platform, such as Apache Spark, Hive, HBase, Ambari and others, will see continuous releases throughout the year, enabling the supporting Hadoop projects to mature and grow while the core remains solid for enterprise use.