Building applications that run map/reduce jobs shouldn’t be rocket science, says the founder of a startup company that has developed an application development framework for Apache Hadoop.
Hadoop is a Java-based framework that brings computation to large volumes of data stored on disk, running map/reduce analysis jobs where the data resides. It is used by Internet search providers, social networking sites such as Facebook and Twitter, and financial service providers to process large data sets, and it also runs on Amazon’s Elastic Compute Cloud.
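To make the map/reduce model concrete, here is a minimal single-process sketch in plain Java. It does not use Hadoop's API. The class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical and only mimic the map, shuffle and reduce phases that Hadoop distributes across the machines of a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the map/reduce pattern; not Hadoop's API.
public class WordCountSketch {

    // Map phase: turn one line of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map, shuffle (group values by key) and reduce in a single process.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data on disk", "map and reduce big data");
        // prints {and=1, big=2, data=2, disk=1, map=1, on=1, reduce=1}
        System.out.println(run(lines));
    }
}
```

In a real cluster the map calls would run on the nodes holding each block of input, and the framework would perform the grouping step over the network before the reducers run.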
Karmasphere, a company founded by alumni of Aventail and Raytheon Systems, today released three new products based on its own framework. The products are Karmasphere Studio Analyst, Karmasphere Studio Professional Edition and Karmasphere Client.
The client sits at the lowest level of the Karmasphere framework and handles provider interfaces and the clusters created with Hadoop, the open-source big data computing fabric. Above that layer, infrastructure services manage clusters and file systems and handle application debugging and reporting.
The framework also provides language support for Cascading, HiveQL, Java and Pig, in addition to an integrated graphical user environment.
Hadoop, an open-source map/reduce framework, was originally used by computer scientists and other “rocket scientist” developers, said Karmasphere founder and CEO Martin Hall. “Hadoop is great but not easy to use.”
Hadoop requires developers to understand its file system, and there is no API compatibility between versions and distributions, Hall explained. Business users require an abstraction from that complexity, and analysts want something that looks like SQL, he said.
On the other hand, developers want to be able to see what is going on inside a cluster, he said.
Karmasphere Studio Professional Edition gives developers debugging and visualization tools for determining what happened during a job. It can also export Hadoop jobs as binary applications that run on any supported operating system without third-party software; the products run on Linux, Mac OS and Windows. Alternatively, a developer can create a Java API package for use in other applications, said CTO and cofounder Ben Mankin.
Karmasphere Studio Analyst enables technical analysts to perform ad hoc analysis of big data stored in Hive, the data warehouse system built on Hadoop, Mankin said. “It’s SQL on map/reduce…the Holy Grail.”
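“SQL on map/reduce” means compiling a declarative query into map/reduce work. As a rough illustration only (this is neither Hive's nor Karmasphere's query engine), a GROUP BY aggregate fits the two phases naturally: the map step emits the grouping column as the key, and the reduce step applies the aggregate. The `Sale` class, its `region` and `amount` fields, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how a query such as
//   SELECT region, SUM(amount) FROM sales GROUP BY region
// could be executed as a map/reduce job. Not Hive's or Karmasphere's engine.
public class GroupBySketch {

    // A hypothetical input row: a grouping column and a numeric column.
    static final class Sale {
        final String region;
        final long amount;

        Sale(String region, long amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    // Map phase: emit (GROUP BY column, value to aggregate) for each row.
    static Map.Entry<String, Long> map(Sale row) {
        return Map.entry(row.region, row.amount);
    }

    // Reduce phase: apply the SUM aggregate to all values sharing a key.
    static long reduce(List<Long> amounts) {
        long total = 0;
        for (long a : amounts) {
            total += a;
        }
        return total;
    }

    static Map<String, Long> sumByRegion(List<Sale> table) {
        // Shuffle: group mapped values by key, as the framework does between phases.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Sale row : table) {
            Map.Entry<String, Long> kv = map(row);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Long> result = new TreeMap<>();
        grouped.forEach((region, amounts) -> result.put(region, reduce(amounts)));
        return result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("east", 100),
                new Sale("west", 40),
                new Sale("east", 25));
        System.out.println(sumByRegion(sales)); // prints {east=125, west=40}
    }
}
```

The analyst writes only the one-line query; turning it into the map, shuffle and reduce steps shown here is the job a SQL-on-Hadoop engine takes over.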
Differences in distributions and versions are handled by the SQL engine, said Mankin. “There’s no infrastructure, no network tunnels, third-party servers, or funny error messages.”
A standalone edition of Karmasphere Client is also available. It is a Java library that makes Hadoop applications built on top of it more fault-tolerant and reduces bandwidth use, Hall said. It also provides event monitoring.
The company also ships a community edition of Karmasphere Studio that runs in Eclipse and NetBeans. It does not include all of the visualization and debugging tools that are found in the professional edition.