Companies like Amazon, Baidu, IBM and Microsoft were all at Spark Summit this week to discuss how they used the engine. Some of those companies even had new Spark-based products to show off.
IBM, for example, introduced a data-centric development environment that will support machine learning with the R language, thanks to IBM's expanded support for R in SystemML. The company is calling the new environment the Data Science Experience. It bundles the tools and data handlers needed to work with H2O libraries, Python and RStudio. The suite also includes connectors to various data sources, which let developers build with multiple types and streams of data within Spark.
Microsoft revealed its plans to work with Spark last year, when it said HDInsight would be the basis of an Azure-based Spark offering.
Microsoft's offering focuses on Python and R, and specifically on IPython, an interactive command shell for writing code. IPython itself isn't Microsoft's main focus, but it underpins Jupyter, an open-source, notebook-based web application for interactive data science and scientific computing.
Jupyter will serve as the interactive front end for developers using Azure and Spark together. With it, developers can test their analytics code and see results quickly, rather than waiting through lengthy compile-and-run cycles to find out what secrets the data may hold.
Connecting data stores to Spark was another theme at the show. Cloud data warehousing company Snowflake, for instance, announced a connector from its platform to Spark. And Couchbase announced the availability of a new Spark Connector, which brings data from a Couchbase database into Spark for analysis.