Typesafe, provider of the world’s leading Reactive platform and the company behind Play Framework, Akka, and Scala, today announced it is offering commercial support for a new distribution of Apache Spark designed to run on the Mesosphere Datacenter Operating System (DCOS). With Spark on DCOS and enterprise-grade support options from Typesafe, it has never been easier to run Spark on any modern version of Linux (CentOS, CoreOS, Red Hat and Ubuntu), on any major cloud (AWS, GCE, Azure, DigitalOcean), private datacenter or hybrid cloud.
Spark is a powerful computation engine for Big Data that is becoming a popular replacement for MapReduce – especially for real-time processing scenarios and enterprises focused on so-called “fast data” where speed and performance for analytics matter. Mesosphere’s Datacenter Operating System (DCOS), which is built atop Apache Mesos, is designed to simplify the operational complexities of running distributed systems like Spark across enterprise-scale datacenter and cloud environments.
“It’s widely understood that big data is moving from batch jobs in MapReduce to a wider range of more real-time data processing scenarios, all made possible with Spark – but there is a skills gap within the typical enterprise trying to make the transition,” said Dean Wampler, Architect for Big Data Products and Services at Typesafe. “Like any distributed system, Spark is designed to run across multiple servers, which increases the operational requirements for installation and scaling. We believe that this new distribution built for the Mesosphere DCOS, combined with the enterprise support options provided by Typesafe, provides an accelerated path to getting maximum value out of Spark.”
Spark and Mesos were created together at UC Berkeley’s AMPLab and became a popular pairing as key components in the Berkeley Data Analytics Stack (BDAS) for big data processing. Mesosphere has extended the capabilities of Mesos into a complete operating system, the Mesosphere DCOS, which makes it easy to run Spark alongside other, complementary distributed systems, traditional Java and web applications, as well as stream processing and datastores, like Kafka, Cassandra, and HDFS. When speed matters in “fast data” processing, moving the data analytics of Spark closer to datastores offers major performance advantages. Also, multitenancy of frameworks on the same cluster improves resource utilization in ways that cannot be achieved by human operators using static partitioning approaches.
“Spark on the Mesosphere DCOS allows you to drive up the efficiency of your big data stack, reducing both operational complexity and resource consumption,” said Benjamin Hindman, one of the original co-creators of Apache Mesos and now Chief Architect and Co-Founder at Mesosphere. “Now you can run your microservices along with your analytics and other big data frameworks on the same machines – which allows you to get the speed benefits of data locality, not to mention better overall utilization and operational efficiency.”
Engineers from Typesafe and from Mesosphere, the company behind Apache Mesos, have become the community stewards of the Spark-Mesos integration. They have contributed to the upstream Apache Spark project to help expand its features and make it run better in distributed production environments using the Mesosphere Datacenter Operating System (DCOS). Databricks has named the new distribution a “Certified Spark Distribution,” which means it is compatible with the 100% open source Apache Spark distribution.
The need to process Big Data faster has fueled intense developer interest in Spark as an alternative to MapReduce. Spark only recently (in 2014) became a top-level Apache project, but has achieved rapid adoption. In a survey earlier this year, Typesafe polled more than 2,100 developers globally and found that 13% were already using Spark in production, with another 20% planning production usage in 2015 and another 31% actively evaluating it.
“The most exciting technical innovation in big data is taking place in open source and distributed frameworks like Apache Spark, Apache Cassandra, Apache Kafka, Akka and many others,” said Patrick Di Loreto, Lead of R&D Engineering at William Hill, the United Kingdom’s highest-revenue online gaming company. “We’re seeing the Scala language and the Mesos distributed systems kernel as the common threads between these ‘new stack’ big data frameworks, and the availability of commercial support from Typesafe and Mesosphere is a great safety net for companies like ours that are aggressively adopting these technologies for our real-time data processing and machine learning efforts.”
With Spark on Mesosphere DCOS supported by the Typesafe Together Project Success Subscription program, Spark users have an enterprise-class commercial support option for the entire project lifecycle, and access to the foremost Scala language experts in the world.