The keynote presentation at this week’s Hadoop Summit in San Jose was headlined by Merv Adrian, research vice president at Gartner. And if there was one central message to his talk, it was this: Hadoop and Big Data have a long way to grow.
And that growth isn’t confined to any one direction, vertical or geographic region. According to the data Adrian and Gartner have put together on Big Data usage in enterprises, many companies have yet to even formulate a Big Data strategy, let alone install Hadoop.
This comes out of Gartner’s interviews with more than 600 enterprises. Of them, only 30% had a Big Data strategy currently in place. Nineteen percent said they were planning to implement a strategy within the next year. That left 31% with no plan at all, 15% planning to start in two years, and a remaining 5% that didn’t even know whether they had a strategy.
But despite many enterprises not yet working with Big Data, Adrian said Big Data analysis has the potential to help businesses across all verticals. As he put it, Big Data is about reclaiming the data you’ve chosen to ignore for years.
He called existing data-collection methods “lossy,” a term that may be familiar to audiophiles out there. A lossy codec, such as the one used for MP3s, sacrifices some of the audio data in your music in exchange for a smaller file, while a lossless codec, such as FLAC (Free Lossless Audio Codec), preserves every nuance of the sound, even cues that are hard for the human ear to detect.
To this end, Hadoop allows businesses to do lossless data capture for the first time. Before, said Adrian, a company would record a point-of-sale purchase with data such as store location, item ID, date and buyer ID. With a Big Data analysis pipeline firmly in place, there’s no end to the additional data that can be saved with each transaction: whether or not the store was having a sale, the ambient temperature of the store, and information on the buyer, such as whether he or she looked up the item online first.
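To make the “lossy versus lossless” capture idea concrete, here is a minimal illustrative sketch. The field names are hypothetical, not from Adrian’s talk; the point is only that the enriched record is a strict superset of the traditional one, so nothing previously captured is lost:

```python
# Illustrative only: hypothetical field names showing a traditional
# ("lossy") point-of-sale record versus an enriched ("lossless") one.

# The traditional record keeps only the fields needed to book the sale.
lossy_record = {
    "store_id": "SJ-042",
    "item_id": "SKU-1138",
    "date": "2013-06-26",
    "buyer_id": "C-90210",
}

# The enriched record keeps the same core fields plus contextual data
# that used to be discarded at capture time.
lossless_record = {
    **lossy_record,
    "store_on_sale": True,        # was the store running a promotion?
    "ambient_temp_f": 72.5,       # in-store temperature at purchase
    "viewed_online_first": True,  # did the buyer browse the item online?
}

# Every key/value pair in the lossy record survives in the lossless one;
# capture is "lossless" in the sense that data is only added, never dropped.
assert lossy_record.items() <= lossless_record.items()
```

The superset check at the end is the whole idea in one line: a lossless pipeline never throws away a field the old system would have recorded.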
In the past, this data has been left lying on the floor due to the impracticality of capture and analysis. But the promise of Big Data is to bring all of that information back to the forefront of business intelligence. Now that systems like Hadoop exist, there’s no longer a roadblock between the data being generated and the data that is captured.
Adrian is definitely correct in stating that there is room for a great deal of growth under the Hadoop banner. And the Hadoop Summit supports him on that statement, with twice as many corporate exhibitors at this year’s show as last year’s.
What’s particularly different this year is that numerous companies are addressing the traditional corporate problems that Hadoop comes up against. Governance and security are chief among these, and companies like Sqrrl, Zettaset and WANdisco are all trying to cash in on the relative lack of enterprise-grade Hadoop controls.
That’s not to say that the existing Hadoop players aren’t doing their jobs. Cloudera can secure your enterprise Hadoop instance, MapR can address the stability problems a traditional Hadoop cluster may experience, and LucidWorks is offering enterprise search on top of Hadoop, something companies have been desperate for. Behind all of this, Hortonworks continues to plug away on the Hadoop 2.0 release.
But all of these Hadoop companies seem to be running in open fields. Despite the fact that most of them—or at least Hortonworks, Cloudera and MapR—compete with each other directly, I don’t get any sense of them throwing elbows or racing to get to killer sales meetings with that one client they just have to land before the other guys do.
Despite an exhibit hall filled with companies like IBM, Microsoft and Red Hat, I get the genuine sense that there are still not enough vendors or products to go around. Every booth was mobbed today. Even companies that really aren’t selling anything, like Netflix and Yahoo, had folks lined up to ask questions and learn about their open-source tools.
What does all this mean? Go back to your college economics course. Remember supply and demand? Right now, there’s incredible demand and not enough supply. There’s so much demand that something nonsensical is happening: hardware-based Hadoop companies are coming to market.
That’s right, Hadoop appliances are here. Oracle introduced its appliance last year; IBM has been offering its own since Watson won Jeopardy; and companies like WANdisco and DataDirect Networks are moving into this space as well. Why is a Hadoop appliance nonsensical? Because Hadoop was designed specifically to run on commodity hardware.
But things are so difficult out there right now that appliances are beginning to look quite attractive. If you’re running an IT shop, and you’ve now dedicated five or six workers to keeping Hadoop up and running, that’s a lot of additional work that isn’t being done elsewhere in IT. And even worse, as Big Data analytics are only just now beginning to be practical in the enterprise, those IT workers you’ve allocated to Hadoop probably haven’t even brought you an ROI on their time yet. It’s as if you’ve dedicated a whole IT operations group to R&D.
Adrian also said there is a significant lack of talent out there in the Big Data space. He anticipates around 4.4 million workers being needed before all is said and done. And, because of supply and demand, he said current Hadoop workers are worth a great deal of money. He said he’s even heard of bidding wars over employees who’d only worked with Hadoop once or twice.