The Spark Summit kicked off this week in San Francisco with companies releasing new solutions designed to make it easier to work with Apache Spark.
Databricks, the company founded by the creators of Apache Spark, announced the general availability of the Databricks Community Edition (DCE). DCE is a free version of the company's Spark-based data platform, designed to give any user a way to learn Apache Spark.
“This year we’ve seen explosive growth for the Apache Spark project, and all signs indicate the pace will only accelerate as the community expands even more,” said Matei Zaharia, cofounder and CTO of Databricks. “Databricks Community Edition has created an ideal environment for learning Apache Spark. Developers of all backgrounds can now use Databricks Community Edition to learn Spark and mitigate the acute Spark skills gap.”
The Community Edition features a 6GB micro-cluster, interactive notebooks and dashboards, online learning resources, and a public environment for users to share their work.
Splice Machine now open source
Splice Machine revealed it is moving its relational database-management system to open source. As part of its move, the company is working with contributors and thought leaders to provide best practices and help develop next-generation features for the open-source community.
“We are very excited to make the transition to open source and build a larger community around Splice Machine,” said Monte Zweben, cofounder and CEO, Splice Machine. “Our whole team is eagerly anticipating the contributions that going open source can enable. We also look forward to being more active within the open-source communities beyond our participation around HBase and Spark.”
The company will continue to maintain the database and add new features.
Hortonworks previews Spark-HBase Connector
Hortonworks gave attendees a taste of its new library, which lets Spark access HBase as an external data source. The company announced the technical preview of the Spark-HBase Connector, which was developed in collaboration with Bloomberg.
“The Spark-HBase connector leverages Data Source API (SPARK-3247) introduced in Spark-1.2.0,” the company wrote on its blog. “It bridges the gap between the simple HBase Key Value store and complex relational SQL queries and enables users to perform complex data analytics on top of HBase using Spark. An HBase DataFrame is a standard Spark DataFrame, and is able to interact with any other data sources such as Hive, ORC, Parquet, JSON, etc.”
The company plans to make the connector easier to work with in upcoming versions.
Teradata releases Aster Connector for Spark
Teradata announced it is integrating Apache Spark analytics with Teradata Aster Analytics through the Aster Connector for Spark. The connector executes prebuilt analytics functions from Aster Analytics to provide a multi-genre advanced analytics environment.
“The beauty of the Teradata Aster Connector for Spark is its application for a variety of use cases in just about any industry,” said Raghu Chakravarthi, vice president of engineering at Teradata Aster. “For instance, Aster can be the repository for customer data and finance data. Once Aster pre-processes these, machine learning from Spark can be applied to create automatic credit ratings for each customer. Analysts could then use these credit ratings as one variable in a predictive model that ascertains the likelihood of, say, this customer purchasing a new automobile in the next 12 months.”
With the Aster Connector for Spark, customers can use techniques from both Aster Analytics and Spark; pipe various functions together in one workflow; and run a clustering algorithm in Aster Analytics.
LinkedIn open-sources Photon ML
LinkedIn announced it is releasing its machine learning library based on Apache Spark to the open-source community. According to the company, the library provides analytical abilities that help researchers and data scientists make predictions more easily.
“By combining the ability of Spark to quickly process massive datasets with powerful model training and diagnostic utilities, Photon ML allows research engineers to make more informed decisions about the algorithms they choose for the types of recommendation systems listed above,” wrote Paul Ogilvie, engineering manager for LinkedIn’s machine learning algorithms teams, in a blog post.
Photon ML features support for large-scale regression; linear, logistic and Poisson regression; offsets, weights and bounds for coefficients; and optional generation of model diagnostics.
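For readers unfamiliar with those terms, the following is a minimal sketch of what offsets, weights and coefficient bounds mean in a Poisson regression. It is plain Python written for illustration only; it does not use Photon ML or Spark, and the function name `fit_poisson` and its parameters are hypothetical, not part of any LinkedIn API.

```python
import math


def fit_poisson(xs, ys, offsets=None, weights=None, bounds=None,
                lr=0.02, steps=5000):
    """Illustrative one-feature Poisson regression (not Photon ML's API).

    Model: E[y] = exp(offset + b0 + b1 * x).
    - offsets: fixed per-row terms added to the linear predictor
      (for example, log of exposure time).
    - weights: per-row importance applied to each row's gradient.
    - bounds: (low, high) limits enforced on the slope b1.
    Fitted by projected gradient ascent on the weighted log-likelihood.
    """
    n = len(xs)
    offsets = offsets if offsets is not None else [0.0] * n
    weights = weights if weights is not None else [1.0] * n
    low, high = bounds if bounds is not None else (-math.inf, math.inf)
    total_w = sum(weights)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y, o, w in zip(xs, ys, offsets, weights):
            mu = math.exp(o + b0 + b1 * x)  # predicted mean for this row
            g0 += w * (y - mu)              # gradient w.r.t. intercept
            g1 += w * (y - mu) * x          # gradient w.r.t. slope
        b0 += lr * g0 / total_w
        b1 += lr * g1 / total_w
        b1 = min(max(b1, low), high)        # project slope back into bounds
    return b0, b1


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    exposure = [1.0, 2.0, 1.0, 2.0]
    # Synthetic targets built from known coefficients 0.5 and 0.3.
    ys = [e * math.exp(0.5 + 0.3 * x) for x, e in zip(xs, exposure)]
    offs = [math.log(e) for e in exposure]
    print(fit_poisson(xs, ys, offsets=offs))
    print(fit_poisson(xs, ys, offsets=offs, bounds=(-math.inf, 0.1)))
```

In this toy setup the offset absorbs the known exposure so the coefficients describe a rate rather than a raw count, and the bound simply clips the slope after every update. A production library would use a more robust optimizer and distribute the work across a cluster.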