Concurrent has updated its flagship open-source project, Cascading. This Hadoop development library lets developers separate their business logic from the rest of their Hadoop code. Building on that separation, Cascading 2.5, released today, can now interface with multiple versions of Hadoop and export data from a Hadoop cluster using a SQL query.
Chris Wensel, creator of Cascading and CTO of Concurrent, said that Cascading brings a more familiar development model to the Hadoop world. “Cascading is a Java library that adds two key core components. It allows you to isolate your business logic and do tests in a model you’re familiar with. And it has an alternative API of MapReduce, though it uses MapReduce under the hood. You can focus on business logic by simply reading and writing files,” he said.
To expand the capabilities of Cascading, version 2.5 adds Lingual. Lingual executes ANSI SQL as Cascading applications across a Hadoop cluster. While these queries don’t return as fast as a SQL query against a relational database, they do allow developers to use SQL to pull data out of Hadoop.
Wensel is clear that this SQL support for Hadoop is not intended to compete with Greenplum or other Big Data analysis systems that run large-scale SQL queries across very large data sets. Instead, he said, “We’re being honest and saying Hadoop is great for migrating workloads without low-latency SLAs. Hadoop is glue: It’s good at integrating systems and working reliably.”
Lingual also comes with a JDBC driver, allowing developers to treat Hadoop as a standard, SQL-addressable data source that any Java application can reach.
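As a rough illustration of what a JDBC driver makes possible, the sketch below queries a Lingual data source with nothing but the standard java.sql classes. The URL form (jdbc:lingual: followed by a platform name and a schemas property) is an assumption based on Lingual's published examples and should be checked against the Lingual documentation. The class name, helper methods, paths, and table names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Lingual or Cascading.
final class LingualJdbcSketch {

    // Assumed URL form: "jdbc:lingual:<platform>;schemas=<path>",
    // where <platform> would be "local" or "hadoop".
    static String jdbcUrl(String platform, String schemaPath) {
        return "jdbc:lingual:" + platform + ";schemas=" + schemaPath;
    }

    // Runs a query through plain JDBC and returns each row as a
    // tab-separated string. Throws SQLException if no driver on the
    // classpath accepts the URL.
    static List<String> query(String url, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery(sql)) {
            int columns = results.getMetaData().getColumnCount();
            while (results.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        row.append('\t');
                    }
                    row.append(results.getString(i));
                }
                rows.add(row.toString());
            }
        }
        return rows;
    }
}
```

With the Lingual JDBC driver on the classpath, a call such as LingualJdbcSketch.query(LingualJdbcSketch.jdbcUrl("hadoop", "/data/sales"), "select * from \"sales\".\"orders\"") would be the intended usage; the path and the table name here are placeholders. The point of the sketch is that the calling code contains nothing Hadoop-specific: only the URL identifies the back end.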
Cascading 2.5 also further decouples the Hadoop-specific code from the core Cascading functionality. In version 2.0, this separation made it possible to run Cascading code in memory, without Hadoop. In version 2.5, it means that Cascading code can run on Hadoop 1.x and Hadoop 2.x without modification.
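The decoupling described above can be pictured with a small, self-contained sketch. It does not use Cascading's API, and every name in it (Platform, InMemoryPlatform, DecouplingSketch) is hypothetical. It only shows the general pattern: business logic is written once against a narrow interface, and the execution back end is swapped in behind it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical abstraction over "where the work runs".
// This is an illustration only, not Cascading's API.
interface Platform {
    List<String> run(List<String> input, Function<String, String> logic);
}

// In-memory back end: applies the logic record by record, no cluster needed.
final class InMemoryPlatform implements Platform {
    @Override
    public List<String> run(List<String> input, Function<String, String> logic) {
        List<String> output = new ArrayList<>();
        for (String record : input) {
            output.add(logic.apply(record));
        }
        return output;
    }
}

final class DecouplingSketch {

    // Business logic: a pure function that knows nothing about the platform.
    static String normalize(String record) {
        return record.trim().toLowerCase(Locale.ROOT);
    }

    // The same call would work with any other Platform implementation,
    // for example one that submits the work to a cluster.
    static List<String> process(Platform platform, List<String> input) {
        return platform.run(input, DecouplingSketch::normalize);
    }
}
```

Because normalize depends on nothing but its argument, it can be exercised in an ordinary unit test through InMemoryPlatform, and a cluster-backed implementation of the same interface could be substituted without touching the business logic.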