MapR Technologies, Inc., provider of the top-ranked distribution for Apache Hadoop, today announced an initiative to integrate Apache Drill, which provides instant, self-service data exploration across multiple data sources, with Apache Spark, the in-memory processing framework that provides speed, programming ease and real-time processing advantages.
“The MapR initiative to integrate Apache Drill with Apache Spark’s high-performance, in-memory data processing will provide a powerful combination,” commented John Webster, senior partner and analyst, Evaluator Group. “MapR support for the complete Spark stack provides Drill users the ability to create advanced data pipelines that leverage Drill’s data agility and Spark’s batch processing capabilities.”
“As the driving force behind Spark, Databricks is pleased to see continued and expanded innovation around Spark to help users derive value from big data faster,” said Ion Stoica, CEO of Databricks. “We are looking forward to MapR integrating Drill with Spark to enable enterprises to expand processing options and unlock deeper insights from their data faster.”
“Integrating Apache Drill and Spark simplifies the development of data pipelines and opens up Drill SQL-based ad-hoc queries on in-memory data,” said M.C. Srivas, CTO and cofounder, MapR Technologies. “Joining forces with Databricks to leverage our combined breadth and depth of technical resources to accelerate innovation is a huge win for customers.”
Apache Drill provides the flexibility to immediately query complex data in native formats, such as schema-less data, nested data, and data with rapidly evolving schemas, with minimal IT involvement. Because SQL queries can run directly on various file formats, live data can be explored as it arrives, rather than after weeks spent preparing and managing schemas and setting up ETL tasks. Additionally, Apache Drill supports ANSI SQL, so users can easily leverage their SQL skills and existing investments in business intelligence (BI) tools.