At Big Data TechCon in Chicago this week, attendees were treated to a glimpse of the future of Hadoop and large-scale data processing. With new solutions such as Apache Apex and Snowflake, enterprise options were shown to be expanding fast.
Keynote speaker Owen O’Malley, cofounder of Hortonworks and a 10-year veteran of the Hadoop project, gazed into the future of the platform. He detailed the origins of Hadoop and compared them to the plans Hortonworks and the Apache community have for the project.
“Customers are building these very complicated flows that use a lot of different technologies,” he said. “Apache Eagle is an incubator project that does security analytics over Hadoop audit logs. They capture the audit logs out of the servers, put them in Kafka, put that into Spark, use some machine-learning libraries, process it, generate models, throw those into Storm, and run the model versus the incoming data and shove it all into a server and provide those notifications out to the user. That’s great, but it’s a pain in the ass to set up.”
O’Malley then indicated that what the Hadoop ecosystem needs is a packaging system, similar to Debian’s apt-get or RubyGems.
“You’d like to make it like the iTunes App Store, where it comes down, it installs it, and it runs, so it doesn’t take a bunch of Ph.D.s in computer science to get the thing up and running,” he said.
Elsewhere at the show, DataTorrent demonstrated Apache Apex, a new incubator project aimed at building an enterprise-grade stream processing service on top of Hadoop.
John Fanelli, vice president of product and marketing at DataTorrent, said that Apex was created by a team of experienced Hadoop developers from the original Yahoo team. He added that Apex is designed to handle much of the reliability work that must now be hand-coded into applications for Storm and Spark.