Arun Murthy is a busy fellow. When he’s not acting as architect at Hortonworks, the Hadoop company he co-founded, he’s flying around the world giving keynote addresses. This is quite a long way from where he was 10 years ago, working on Hadoop inside Yahoo.
But then, the future is typically uncertain. That’s why we sat down with Murthy to talk about the future of Hadoop and of Big Data processing as a whole.
What is the next big focus for Apache Hadoop as a whole?
I think if you look at the big picture, Hadoop started off as MapReduce and HDFS. Things have obviously changed a lot. We’ve had things like YARN for a while now, so MapReduce is no longer the be-all and end-all. We also have Spark and Flink, and a better Hive, and on and on. The infrastructure side of the Hadoop space is alive and kicking.
The idea always was to let a thousand flowers bloom, and that has happened. It’s not just the open-source communities that have done this, either. It’s also other vendors, like IBM, EMC and SAS. These guys are taking their product lines and making them Hadoop-compatible.
That’s really great. If you look at Hadoop, we’re now coming to the end of its first big wave. That first wave has been about establishing the technologies and making sure enough of the gaps are filled well enough that you can build apps on top of pure data.
As companies start to build newer and newer applications, we start going from post-transaction to pre-transaction. Predictive analytics has been around for a long time, but with Hadoop, you can do analytics at a very fine granularity. You can make every customer feel special.
What people have realized, as they build more and more of these apps, is that a lot of these new-generation apps are primarily driven by data. You can build apps that delight and inform the end customer, but every model app we’re going to build is also a data app.