Databricks announces DataFrames for Spark
Apache Spark platform provider Databricks has announced DataFrames, a new API for Apache Spark 1.3 designed to simplify distributed data processing and make Spark quicker to put to use.
The DataFrames API was built to resemble R and Python data frames, giving data scientists a familiar interface while relying on the Spark SQL query optimizer to execute code on machine clusters. DataFrames also reads from a variety of data sources, including third-party sources such as NoSQL stores, and optimizes computations automatically.
DataFrames will be incorporated into Spark 1.3 and released in early March.
Pentaho 5.3 released with Amazon Redshift and Cloudera Impala integrations
Big Data analytics company Pentaho, set for acquisition by Hitachi, has announced the release of Pentaho Business Analytics 5.3, featuring integrations with Amazon Redshift and Cloudera Impala.
Pentaho Business Analytics 5.3 adds new governed data analytics through Pentaho’s data refinery and improved embedding with interactive reporting updates. The integrations with Amazon’s Redshift data warehousing solution for AWS and Cloudera’s Impala parallel-processing query engine widen the Big Data deployment options of Pentaho’s analytics service.
Pivotal open-sources its entire Big Data Suite: Read more here.
HP unveils Haven Predictive Analytics leveraging R
HP rolled out Haven Predictive Analytics, an analytics service built on its Vertica analytics database. It uses a distributed implementation of the R statistical programming language as an open-source analytics engine for running code against data sets. Haven Predictive Analytics is fully open source and also includes native SQL support and out-of-the-box parallel algorithms.
Tableau Software launches Apache Spark SQL connector
Data visualization company Tableau Software announced support for both Apache Hadoop and Apache Spark via a direct connector. Tableau support for in-memory Spark SQL data will allow direct Big Data visualizations through the Tableau interface.
Dato updates machine learning program with GraphLab Create
Dato, the machine learning startup formerly known as GraphLab, has updated its platform with an open-source version of the GraphLab Create distributed C++ computational framework. The release also adds predictive service deployment enhancements and a new Data Matching Toolkit for task automation to the platform.
Latest release of the MapR distribution including Hadoop enables the real-time, data-centric enterprise: Read more here.
IBM accelerates data science success for the enterprise: Read more here.
Altiscale announces updated, enterprise-class authentication security; Apache Spark on the Altiscale Data Cloud: Read more here.
RapidMiner makes self-service advanced analytics available for Hadoop; announces US$15 million in funding: Read more here.
Qubole adds Apache Spark to its Big Data-as-a-Service platform: Read more here.
BlueTalon announces breakthrough data access and security for Hadoop and $5 million in funding: Read more here.
Tamr unveils enterprise platform for scalable, end-to-end data unification; announces two packaged data-unification solutions for business analysts: Read more here.