When Hadoop first appeared as an open-source framework for scalable distributed computing with large data sets, back in 2009, the project was a lone player in a seemingly empty marketplace. But two years later, a dozen startups are all vying for the Hadoop crown. A big player has just joined them: At this year’s Hadoop Summit, Yahoo entered the fray by spinning off its own internal Hadoop group as Hortonworks.
Another sign of growth: This year’s event saw 27 sponsors, all of whom are eager to cash in on this popular open-source ecosystem. By contrast, the 2010 summit had only seven sponsors.
Among last year’s sponsors were Hadoop-specific companies such as Karmasphere and Datameer. This year, however, big names like Dell, IBM, NetApp and Supermicro were all sponsoring the event.
Matt Aslett, senior analyst at The 451 Group, said, “As interest in Hadoop expands from early adopters to mainstream enterprise and government users, we are increasingly seeing the focus shift from development and testing to understanding potential use cases for the core distribution to the value-added tools and services that will enable and accelerate enterprise adoption.”
Hortonworks is now just another in a chorus line of Hadoop consulting and services firms. A recent article about Hadoop published on technology news site GigaOM estimated Cloudera’s revenues at a few million dollars, and it pointed out that despite high interest from enterprises, the Hadoop market remains almost exclusively a consultancy-based market, not a product-based market.
And because consulting services don’t scale and rarely bring in the big profits that products can, Hortonworks and other Hadoop firms are facing an uphill battle.
Still, as the Hadoop ecosystem continues to expand and new solutions pop up almost daily, it is the developers who benefit from all of this innovation, even if firms aren’t yet buying Hadoop packages instead of free versions.
And still other firms are spending their time and money on integrating Hadoop into existing process flows, which can often call for packaged software. For these folks, traditional integration and data management firms have stepped up to the plate.
Firms like Pervasive. Joe Dubin, product manager for Pervasive DataRush, said that his company is preparing a new accelerator for Hadoop users, one that will process batch jobs faster than map/reduce.
“We’re releasing at the end of June in early access form,” he said. “It’s a way to make Hive queries run faster on less hardware without changing Hive scripts. It’s the first in a series of big data accelerators that we will be releasing.
“At a high level, normally when you put a Hive query into Hive, it turns that into map/reduce jobs. We now have it produce an alternative. It can produce DataRush jobs. You access the DataRush back end, construct a DataRush data flow, and execute that query.”
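For readers unfamiliar with the translation Dubin describes, the following is a minimal sketch in plain Python of how a simple grouped count could break down into map, shuffle and reduce phases. It is not Hive or DataRush code: the sample rows, the column names and every function name here are hypothetical, and a real query planner is far more involved.

```python
from collections import defaultdict

# Toy rows standing in for a table; the column names are hypothetical.
ROWS = [
    {"page": "/home", "user": "a"},
    {"page": "/docs", "user": "b"},
    {"page": "/home", "user": "c"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, as a mapper might for COUNT(*)."""
    for row in rows:
        yield row["page"], 1


def shuffle_phase(pairs):
    """Group the emitted values by key, mimicking the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, as a reducer might."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    # Roughly equivalent to: SELECT page, COUNT(*) FROM rows GROUP BY page
    return reduce_phase(shuffle_phase(map_phase(rows)))


result = run_query(ROWS)
print(result)
```

An alternative back end of the kind Dubin mentions would accept the same query but execute it with a different plan, which is why the Hive scripts themselves would not need to change.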
Pervasive’s approach speaks to the Wild-West nature of Hadoop. Enterprises may have fallen in love with the software, but they’re all using it in their own way. Some use Hadoop as a big data store, with HDFS as a way to store petabytes of information cheaply. Others are using Hadoop as a way to pull chunks of data out of cold storage, where they can be moved into a relational database and analyzed with traditional methods. Still others are using Hadoop as the front-end database by hosting their live information in HBase, the non-relational, column-oriented data store that runs on top of HDFS.
And thus it all comes back to the central point that Hadoop, as packaged commercial software, isn’t quite ready yet. Cloudera hopes to change this fact with the release of Cloudera Enterprise 3.5. With this release, Cloudera has added support for full life-cycle management of Hadoop jobs, as well as a streamlined management console. The suite is a direct response to what Cloudera sees as the pain points for Hadoop users.
Charles Zedlewski, vice president of product management at Cloudera, said that release 3.5 should push Hadoop from the early adopters to mainstream adoption. “This system was designed by engineers for engineers, and that’s not a tenable way for a typical Fortune 500 company to run Hadoop,” he said.
“The new management suite is a big advance, functionally. With the new and old enhancements, we’ve brought it to a stage where you’re able to manage the full life cycle of the Hadoop system and to diagnose the root cause of problems.”
Cloudera Enterprise 3.5 is not Cloudera’s only product. The company releases free distributions of Hadoop to the public with every passing version of Hadoop and its ecosystem of sub-projects. And while Cloudera’s distribution of Hadoop can be found within Amazon Web Services and across other cloud providers, its competitors are hoping to get in on the distribution action as well.
Karmasphere, for example, announced today at the Hadoop Summit the release of its virtual Hadoop appliance for developers. The company’s distribution is targeted at developers instead of administrators, and thus the company hopes to skirt Cloudera’s popular version of Hadoop by offering one targeted at the development process of batch jobs.
Abe Taha, vice president of engineering at Karmasphere, said, “We know that more and more companies are attracted to the power of Hadoop but just don’t know how to get started. With our appliance, developers can now jumpstart their Hadoop projects with or without a cluster installed and immediately prepare to support the needs of the data analyst professionals across the company who are looking to unlock the intelligence inside unstructured data.”
Combine this new offering with Datameer’s Excel-like data manipulation tools, and Hadoop is expanding into an end-to-end ecosystem of data manipulation, analysis and storage.