The Apache Software Foundation (ASF) has announced version 1.0 of Apache Spark, its open-source cluster-computing framework for Big Data analysis.
“1.0 is a huge milestone for the fast-growing Spark community,” said Matei Zaharia, vice president of Apache Spark. “Every contributor and user who’s helped bring Spark to this point should feel proud of this release.”
Apache Spark is known for its speed and ease of use, enabling developers to quickly write apps in Java, Python or Scala using more than 80 built-in high-level operators. According to the ASF, Spark programs can run up to 100x faster in memory than Apache Hadoop MapReduce. Spark is well suited for machine learning, stream processing and interactive queries, and is 100% compatible with Hadoop's Distributed File System, HBase, Cassandra, and any other Hadoop data source.
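To give a sense of those high-level operators, here is a minimal word-count sketch in Scala, assuming a local master and a small in-memory dataset (the object name and sample strings are illustrative; in practice the input would typically come from HDFS or another Hadoop source via sc.textFile):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally for this sketch; on a cluster the master would point at Mesos, YARN, etc.
    val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local"))

    // Illustrative in-memory input; a real job would usually read sc.textFile("hdfs://...")
    val lines = sc.parallelize(Seq("spark is fast", "spark is easy"))

    // Three high-level operators chained together: flatMap, map, reduceByKey
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```

The same chain of operators runs unchanged whether the data lives in a local collection, in HDFS, or in any other Hadoop-compatible store.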
The release features strong API stability guarantees; a new Spark SQL component for accessing structured data; a unified submission tool for deploying apps on a local machine, Apache Mesos, YARN, or a standalone Spark cluster; operational and packaging improvements; enhancements to MLlib, its machine-learning library; improvements to GraphX and Spark Streaming; and extended Java and Python support.
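Spark SQL is the most visible of those additions. The following Scala sketch shows the general shape of its API as of the 1.0 alpha, where an RDD of case classes is registered as a table and queried with SQL (the Person class, table name and query are illustrative assumptions, not code from the release itself):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type; Spark SQL infers the schema from the case class fields
case class Person(name: String, age: Int)

object SparkSqlExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlExample").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicitly converts an RDD of case classes to a SchemaRDD

    // Build an RDD of records and register it as a table for SQL queries
    val people = sc.parallelize(Seq(Person("Alice", 34), Person("Bob", 17)))
    people.registerAsTable("people")

    // Query the structured data with plain SQL
    val teens = sqlContext.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
    teens.collect().foreach(println)

    sc.stop()
  }
}
```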
“Across the board, we’ve focused on building tools to empower the data scientists, statisticians and engineers who must grapple with large data sets every day,” said Patrick Wendell, software engineer at Databricks and Apache Spark 1.0’s release manager.
More information about the release is available on the Apache Spark website.