Apache Spark is gaining prominence. The Apache Software Foundation (ASF) announced that the open-source cluster-computing framework for Big Data analysis has graduated from the Apache Incubator to become a top-level project.
Now that Apache Spark is a top-level project, a project management committee will guide the project’s day-to-day operations, and Databricks cofounder Matei Zaharia will be appointed VP of Apache Spark.
“It’s great to see Apache become Spark’s permanent home,” he said. “Spark has quickly become one of the most active projects in the Hadoop ecosystem, with dozens of organizations contributing, and we look forward to working closely with the rest of the Apache community.”
Apache Spark originated in 2009 at the University of California at Berkeley’s Algorithms, Machines and People Lab (AMPLab), and it entered the Apache Incubator in June 2013. It is currently in use at companies including Cloudera, IBM, Intel and Yahoo.
“I’m really proud of the community aspect that has become infectious in Apache Spark, and that really grew out of the energy in the project starting in the AMPLab and through its movement to the ASF,” said Chris Mattmann, Apache Spark Incubator mentor at the ASF.
Spark is suited to interactive queries, stream processing and machine learning, and it integrates with Apache Hadoop: it can read from Cassandra, HBase, HDFS or any Hadoop data source.
According to the ASF, Spark runs programs up to 100x faster than Apache Hadoop MapReduce when working in memory, and it provides APIs that let developers build applications rapidly in Java, Python or Scala.
Spark can run in standalone mode or on Amazon EC2, Apache Mesos or Hadoop YARN.