Apache Spark is reaching more users in new places, according to a recently released report. Databricks announced the results of its second annual Apache Spark survey, which revealed that Spark is increasingly being used in the public cloud and for streaming and machine learning.
“Since inception, Spark’s core mission has been to make Big Data simple and accessible for everyone—for organizations of all sizes and across all industries. And we have not deviated from that mission,” said Matei Zaharia, creator of Apache Spark and chief technologist for Databricks. “I’m excited to see more Apache Spark deployments in the cloud and interest from users to build real-time applications using Spark Streaming, machine learning libraries, and other components, tackling complex problems across a broad range of industries.”
In addition, the report revealed that 61% of Spark users deploy it in the public cloud; a majority of developers who use Spark employ two or more Spark components simultaneously; and Spark usage with R, SQL and Windows increased.
MapR tackles microservices
MapR Technologies announced new features designed to support microservices and leverage continuous analytics, automated actions, and rapid response. The MapR Platform has been updated to provide microservices application monitoring and management. The new microservices capabilities include microservices-specific volumes for app versioning, microservices for A/B and multivariate testing, and monitoring of cluster-wide operations and resource usage.
In addition, MapR will provide unified security; support for agile and containerized app development; converged analytics; support for hybrid cloud microservice architectures; logical and functional isolation of services; and continuous multi-master mission-critical disaster-recovery capabilities.
Confluent adds new features to Confluent Enterprise
Streaming platform provider Confluent announced new features designed to give enterprises real-time capabilities for their solutions. The Confluent Enterprise platform uses Apache Kafka to simplify stream-processing app development.
The latest update features multi-data-center replication, automatic data balancing, and cloud-migration capabilities. “We’re building a streaming platform to make it easy to put stream processing in practice for organizations of any size, and will continue to release features that help our customers along this journey,” said Neha Narkhede, cofounder and CTO of Confluent.
Cask announces integration with Microsoft Azure HDInsight
Cask has announced a new integration designed to cut Big Data’s time to value by 80%. The Cask Data Application Platform (CDAP) will be integrated with Microsoft Azure HDInsight.
“With CDAP certified to run on Microsoft Azure and available on Microsoft Azure HDInsight, enterprises can rapidly enable data lakes on Azure and the running of advanced data applications in the cloud, drastically simplifying and accelerating time to value from their data,” said Jonathan Gray, founder and CEO of Cask.
CDAP is completely open source and features standardized APIs, pre-built templates and visual interfaces. The latest version features Cask Market: a Big Data app store with pre-built Hadoop solutions, reusable templates, and ready-to-go data pipelines.
Alation releases version 4.0 of its Alation Data Catalog
Alation is giving businesses the ability to catalog queries from IBM Watson DataWorks, Presto and Spark SQL with the new release of Alation Data Catalog. Version 4.0 uses machine learning algorithms to automatically catalog queries and track patterns in order to help users understand data. It features access to technical metadata, the ability to parse and normalize query logs, and extended support for major databases and Hadoop query engines.
“With the introduction of Alation Connect, we catalog queries alongside reports, dashboards and data,” said Satyen Sangani, CEO of Alation. “Most people access data through views, queries, reports and dashboards, so it’s critical for a data catalog to move beyond an inventory of only physical data assets like tables and files. Queries contain critical context about an analyst’s assumptions, calculations and methods. Cataloging those queries provides exponentially more knowledge than cataloging data alone.”
Splice Machine supports native PL/SQL
In an effort to speed up the migration from Oracle to Hadoop, Splice Machine has announced support for native PL/SQL. This addition is designed to reduce the time and cost of offloading Big Data workloads from Oracle. Users can take advantage of the support through the compiler, which converts PL/SQL, or an interpreter that executes the runtime representation.