SAN FRANCISCO — For a project that has grown as quickly as Apache Hadoop, it’s not unusual to see many products and services crop up around it. But Merv Adrian, research vice president for information management at Gartner, said in the opening keynote of this year’s Hadoop Summit that all of this growth has produced a confusing and nebulous space in which the term “Hadoop” increasingly means different things to different people.
Adrian likened the current IT situation to Waiting for Godot, with Hadoop taking the place of Godot. He also stated that, while most enterprises have Hadoop in a pilot stage internally, few have pushed it out to production just yet. He said that organizations can unlock a great deal of potentially useful information and cost savings by using Hadoop, but that the marketplace for the platform has become quite crowded.
Despite his optimism at the Summit, Adrian had also published a blog post in which he criticized vendors for pushing into uncharted Hadoop territory.
“To a greater or lesser degree, all of these vendors call their products Hadoop—some are clearly attempting to move ‘beyond’ that message,” he wrote. “Some vendors are trying to break free from the Hadoop baggage by introducing new, but just as awful, names. We have data lakes, hubs, and no doubt more to come…
“The vague names indicate the vendors don’t know what to call these things either. If they don’t know what they’re selling, do you know what you’re buying? If the popular definition of Hadoop has shifted from a small conglomeration of components to a larger, increasingly vendor-specific conglomeration, does the name ‘Hadoop’ really mean anything anymore?”
In his talk, Adrian presented statistics, including Gartner research that points to interactive analytics as the feature Hadoop users want most. Stream processing and database-management systems ranked second and third. He also showed that graph processing with Hadoop had not yet caught on, though he predicted that it soon would. These figures did not include batch processing, which remains the most popular way to use Hadoop.
Adrian also noted that “Hadoop” and “Big Data” are the most-searched terms on Gartner’s site after “Magic Quadrant.”
Herb Cunitz, president of Hortonworks, however, considers the platform’s increasingly ambiguous nature to be one of its strongest selling points.
“We’ve started to see an inflection point in the market. People are asking, ‘What else can I do with Hadoop?’ ” he said.
“Now I can do a whole lot more and leverage YARN as a data operating system, and move Hadoop to be a core piece of my environment. Since 2013, over the course of the past year, the community has introduced many releases of Hadoop. We’ve seen the advent of security, governance, and the release of Hadoop 2.0 and 2.1. We now have Tez or Hive for interactive queries, and in-memory processing through Apache Spark.”
Thus, said Cunitz, Hadoop is no longer just a platform for running batch operations on static data. It is a platform for real-time processing, stream processing and batch processing, with all of that data stored in the same place. He added that enterprise requirements for security, governance and monitoring will steer the direction of the platform, and of its vendors, in the coming years.
To that end, both Cloudera and Hortonworks have made security acquisitions to strengthen their Hadoop platforms. On May 15, Hortonworks acquired XA Secure, and today Cloudera announced its acquisition of Gazzang.