MapR Technologies Inc. today introduced MapR Sandbox for Hadoop, a virtualized environment in which developers can experiment with Hadoop, as one of several announcements at the Strata Conference in Santa Clara, Calif.
The company also announced the latest MapR distribution, which includes Hadoop 2.2 with YARN, and an integration that brings the HP Vertica Analytics Platform to MapR.
The new HP Vertica Analytics Platform on MapR is an interactive SQL-on-Hadoop solution that runs HP’s analytics platform directly on MapR’s enterprise-grade distribution for Hadoop. The platform offers 100% ANSI SQL compliance, advanced interactive analytics capabilities, and deep support for business intelligence (BI) and ETL tools.
“This combination of industry-leading platforms provides organizations with an integrated solution that increases performance and reliability with a smaller data center footprint, eliminating technology limits that often force businesses to make compromises,” said Colin Mahony, VP and general manager of HP Vertica, in a statement released at the Strata Conference today.
Features include:
• Lower total cost of ownership
• Faster performance across a broader range of data types than other SQL-on-Hadoop solutions
• Complete support for open industry standards, including ANSI SQL, POSIX and NFS
• Exploratory analytics on semi-structured data, with the ability to operationalize insights
• Built-in analytic functions directly on Hadoop
• Integrated solution with no connectors required
• Full high availability for Hadoop
• Unique, native, consistent point-in-time snapshots and mirrors for data recovery and reliability
The latest MapR distribution, which includes Hadoop 2.2 with YARN, combines YARN’s resource management with the real-time capabilities of MapR’s next-generation data platform, according to the company.
With YARN’s resource management and scheduling capabilities, Hadoop applications are able to share a cluster’s compute resources, increasing the overall efficiency and utilization of the cluster. The combination of YARN and MapR’s read-write POSIX data platform enables YARN-based applications to run on a Hadoop cluster, share compute resources, and read, write and update data in the underlying distributed file system and database tables. This gives organizations the ability to develop and deploy a broader set of Big Data Hadoop applications.
MapR also announced the ability for organizations to run the Hadoop MapReduce 1.x and YARN schedulers simultaneously on the same nodes in a cluster. According to the company, this provides a risk-free path for MapReduce 1.x users to upgrade to the new Hadoop scheduler.
YARN-based applications on MapR inherit the high availability, data protection, disaster recovery, security, and performance of the MapR Distribution.
The MapR Distribution includes more than a dozen open source projects, including the Apache projects Hive, Pig, Solr, Oozie, Flume, Sqoop, HBase, and ZooKeeper, as well as Apache-licensed open source projects such as Multitool, Hue, Impala, and Cascading.
MapR Sandbox for Hadoop lets users explore and experiment with Hadoop by providing a virtual machine installation of the MapR Distribution for Apache Hadoop, along with several point-and-click tutorials for developers, analysts, and administrators.
“Hadoop is widely considered the ideal platform for handling Big Data, and the MapR Sandbox is about addressing the common challenge of Hadoop adoption,” said Tomer Shiran, vice president of product management, MapR Technologies. “Organizations face a shortage of Hadoop developers and data scientists, and without useful and easily-accessible training tools, productive Hadoop developers will continue to be in short supply.”