Qubole, the big data-as-a-service company founded by the team that developed Facebook’s data infrastructure, today announced the addition of Apache Spark to the Qubole Data Service (QDS) platform. With the addition of Apache Spark, QDS broadens the types of workloads data scientists and data analysts can run on demand via the cloud, without the hassles, costs and risks of deploying Spark on-premises.
Qubole Data Service (QDS) is a self-service platform for big data analytics that runs on the three major public clouds: Amazon AWS, Google Compute Engine and Microsoft Azure. Among its key advantages, QDS automatically sets up and scales up a cluster to match the needs of the particular job, and then winds down nodes when they’re no longer needed. QDS is a fully managed big data offering that leverages the latest open source technologies, such as Apache Hadoop, Hive, Presto, Pig, Oozie, Sqoop and now Spark, to provide the only comprehensive, “everything as a service” data analytics platform complete with enterprise security features, an easy-to-use UI and built-in data governance.
With the addition of Apache Spark, Qubole customers gain access to the fast in-memory processing capabilities of Spark that make it ideal for machine learning and predictive analytics applications. Data scientists can set up a Spark cluster in QDS in less than 15 minutes directly within the QDS web interface, and like all QDS services, the Spark feature auto-scales based on workload, ensuring the most efficient and cost-effective use of compute resources.
“Many organizations are evaluating Apache Spark for their own big data implementations, but deploying and maintaining Spark clusters can be tricky,” said Joydeep Sen Sarma, co-founder and CTO of Qubole. “By adding Spark to QDS, we’ve completely eliminated the barriers to taking full advantage of Spark for rapid data analytics and we’re giving customers the ability to select the best technology for their big data tasks, on the fly.”
Qubole’s Spark-as-a-Service is truly self-service and makes it easy to set up multiple user accounts and to launch and configure multiple Spark clusters as needed. It uses the familiar Spark notebook-style interface, which facilitates collaboration among data scientists, and accepts commands in the Scala, Python and R programming languages. QDS provides inline results and template visualizations for Spark queries, eliminating the need to open multiple windows or manage multiple applications. It also runs automatic health checks, alerts users to bad nodes and automates their replacement, improving productivity.
Several Qubole customers are already using and testing Spark on QDS, including Pinterest and DataLogix.
“The addition of Apache Spark to QDS makes the platform even more valuable to Pinterest,” said Krishna Gade, engineering manager at Pinterest. “Qubole empowers us to use the latest Big Data tools at petabyte scale without needing to invest in building out, maintaining and updating our own infrastructure. As a result, we can focus on extracting value from our data using the best technologies for the job, and on driving the business forward.”
To learn more about Qubole’s Spark-as-a-Service offering or to get a demo, please visit Qubole in booth #1513 at the Strata + Hadoop World conference, February 17–20, 2015, in San Jose, Calif.