The big news out of the Strata conference on Big Data is Tresata, a new company that, as far as I can tell, is the only one building commercial products on top of the new Apache Spark project. And that is precisely why they’re the belle of this year’s Big Data ball.
Spark is a large-scale data-processing engine. Tresata offers a Spark-based network-mapping application called NET 1.0. While the application itself isn’t nearly as exciting as what most enterprises are looking for (end-to-end customer metrics and analytics for decision-making), NET 1.0 is emblematic of the future of Big Data applications.
Until now, we’ve all spent most of our time on the Big Data beat harping on Hadoop and its applications to business analysis. Essentially, the past five years of Hadoop hype have been almost exclusively focused on data warehousing-type problems: put the logs and data and databases in one place, then run simple analytics to see where the business holes and the unknown successes are.
But Strata shows that the future of Big Data may be fairly similar to the past of small data. That is to say: Big Data applications are the new packaged apps for enterprises.
Tresata’s NET 1.0 isn’t about pouring tons of data into a slow-moving cement-mixer of analysis. Rather, it’s about delivering the results of network asset analysis in real time. The resulting application, though it is backed by a cluster and runs on top of Hadoop-like infrastructure, remains responsive and offers real-time information on network activities.
This is a game-changer. It signals a future where developers will no longer be the only gatekeepers to Big Data.
While today we have two major Hadoop companies in Hortonworks and Cloudera, plus a smattering of Hadoop product companies, it’s easy to see a future where dozens of ISVs are selling full products based on Hadoop, rather than merely filling in Hadoop’s gaps, as they do now.
Koert Kuipers, CTO of Tresata, said, “We have always believed the Hadoop ecosystem will offer a complete set of capabilities to be an all-encompassing data analytics platform. By incorporating the in-memory capabilities of the Spark framework into our predictive machine-learning algorithms, we have now brought to market real-time analytics applications completely built in Hadoop, delivering unmatched accuracy, speed and scalability.”
Clearly, Tresata has greater ambitions than offering a network scanner. You can probably imagine the kinds of simpler applications they could build with Spark as the back end and HDFS as the data storage mechanism. I can envision a future where certain industries get specific packaged analytics apps built on Hadoop.
But no matter how that future takes hold, there is one thing I think we can all be sure of: When it comes to ISVs selling packaged applications based on Hadoop infrastructure, we’re right at the beginning. In 10 years’ time, I could see a world where every major enterprise software package is either based on Hadoop or has Hadoop connectors. We’re already knee-deep in the connectors phase of growth. Now, it’s time for those Hadoop-based applications to start making their way to market.