The Apache Software Foundation has announced Apache Hadoop 3.0, the third major version of its open source software framework for distributed computing. It is the first major release since Hadoop 2 in 2013.
“Hadoop 3 is a major milestone for the project, and our biggest release ever,” said Andrew Wang, Apache Hadoop 3 release manager. “It represents the combined efforts of hundreds of contributors over the five years since Hadoop 2. I’m looking forward to how our users will benefit from new features in the release that improve the efficiency, scalability, and reliability of the platform.”
Apache Hadoop has become known for its ability to run and manage data applications on large hardware clusters in the Big Data ecosystem. The latest release features HDFS erasure coding, a preview of YARN Timeline Service version 2, YARN resource types, and improved capabilities and performance for cloud storage systems. The project comprises Hadoop Common, which supports the other Hadoop modules, along with the Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce.
“This latest release unlocks several years of development from the Apache community,” said Chris Douglas, vice president of Apache Hadoop. “The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud.”
Apache Hadoop is widely deployed at organizations such as Adobe, AWS, Apple, Cloudera, eBay, Facebook, Google, Hortonworks, IBM, Intel, LinkedIn, Microsoft, Netflix and Teradata. In addition, it has inspired other Hadoop-related projects such as Apache Cassandra, HBase, Hive, Spark and ZooKeeper.