While many in the Big Data space are talking about stream processing, MapR today announced the availability of Streams, a new product in its Hadoop stack that can be used to stream events across clusters distributed around the world. The new product offers a publish-and-subscribe model for event-driven data access and decision-making.
While MapR Streams sounds like a tool that might replace Storm or Spark, it can also be used in conjunction with both platforms, said Jack Norris, CMO of MapR. “[Streams] is complementary to streaming analytics. When you look at Streams, it’s not just those analytics that are interesting. Typically, they’re augmenting this analysis with real-time database access and deeper analytics to do real-time pattern recognition. Being able to do that whole gamut on that platform has huge advantages,” he said.
To that end, Streams can handle the event-driven needs of an entire data architecture. That means it can run event checks against data in Spark, or prepare that data for a system like Apex or Kafka. Norris said this can help bring a single view to all of those separate methods of data analysis by hooking them into the worldwide event stream across multiple Hadoop clusters.
“Our starting point was volume, but the reality is we got there one event at a time,” he said. “These events can be generated from sensors, from biometrics, from log details, or customer interactions with traditional systems. All those in aggregate create this Big Data. It’s natural for us to focus on streams and managing the flow of that info from the minute it’s produced so you can better analyze it.”
Will Ochandarena, director of product management at MapR, said that Streams ensures Big Data applications can handle continuous streams of information. “Not all analytics apps understand data as a never-ending stream of events. MapReduce understands the data to be a starting record, an ending record, and everything in between. Because we want this to be useful, we have this ability to put those batch-oriented apps on top of streaming data, which simplifies a huge piece of stream data architecture, which is the movement of the distributed file store just to do the analytics. You can do a MapReduce directly on data that came in as streams,” he said.
Streams is currently in limited preview with select clients, but will be generally available to all customers as a part of the MapR platform sometime in January. It will also be a part of the Community Edition of MapR.