Uri Cohen, vice president of product management at GigaSpaces, said there are two major reasons for the growth of in-memory data grids. “The first is that the market for such solutions has definitely grown, with drivers such as the explosion of user-generated data and Web-scale deployments,” he said. “Whereas in the past most of the demand for these technologies came from high-end financial services and telecom applications, today it’s prevalent in almost every vertical.
“We’re seeing this demand in e-commerce, travel, fraud detection, homeland security, and SaaS implementations, to name a few. Some apps need to process tens or even hundreds of thousands of events per second, which is only feasible if you’re using a distributed architecture. The in-memory aspect of things is what allows you to do it at real-time latencies, meaning you can do this as the events are flowing into your system and not have to wait for a batch Map/Reduce job to get the processed data and insights.
“The second trend is that with the advent of NoSQL data stores and cloud technologies, which drive people toward distributed architectures, the market is much better educated about such technologies and understands the trade-offs associated with them, with terms like CAP and BASE being widely known and reasonably well-comprehended. This saves us a lot of work in explaining our technology and how to implement your applications on top of it.”
Indeed, analytics are a major new draw to in-memory data grids, said Pezzini. “In the context of big data applications, what happens is the customers store data in memory to an in-memory data grid. In some cases, they are storing terabytes of data, and they want to run analytical types of applications on top of that. [That means supporting] query languages and Map/Reduce APIs. I believe this will be the next battleground.”
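To make the idea of running Map/Reduce-style analytics over data already held in memory more concrete, here is a minimal single-process sketch. The `TinyGrid` class, its partitioning scheme and the order records are all hypothetical illustrations, not the API of any product mentioned in this article; a real grid would spread the partitions across machines and run the map step where the data lives.

```python
import zlib
from functools import reduce


class TinyGrid:
    """Hypothetical stand-in for a partitioned in-memory data grid."""

    def __init__(self, n_partitions=4):
        # Each dict plays the role of one node's in-memory partition.
        self.partitions = [dict() for _ in range(n_partitions)]

    def _partition_for(self, key):
        # A stable hash keeps a key on the same partition across runs.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition_for(key)][key]

    def map_reduce(self, map_fn, reduce_fn, initial):
        # Map and reduce inside each partition first, then combine the
        # per-partition results. reduce_fn must be associative and
        # `initial` must be its identity value for this to be valid.
        partials = [
            reduce(reduce_fn, (map_fn(k, v) for k, v in part.items()), initial)
            for part in self.partitions
        ]
        return reduce(reduce_fn, partials, initial)


grid = TinyGrid()
grid.put("order-1", {"region": "EU", "amount": 120})
grid.put("order-2", {"region": "US", "amount": 80})
grid.put("order-3", {"region": "EU", "amount": 50})

# Total order value, computed without copying the records out of the grid.
total = grid.map_reduce(
    map_fn=lambda key, order: order["amount"],
    reduce_fn=lambda a, b: a + b,
    initial=0,
)
```

Because each partition is reduced independently before the partial results are combined, the same pattern would parallelize naturally if the partitions lived on separate nodes.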
Craig Blitz, senior group product manager of Oracle Coherence, the in-memory data grid cited as the most popular by Pezzini, said that Coherence is seeing increasing use in analytics, as well.
“Data grids play a key role in handling these new demands,” said Blitz. “Their distributed caching capabilities offload shared services by reducing the number of repeated reads across all application instances and, in the case of Oracle Coherence, by batching and coalescing writes. Application objects are stored in-memory in the application tier or a separate data grid tier (or both, using Oracle Coherence’s near caching capabilities). Oracle Coherence provides a rich set of query, Map/Reduce aggregation and eventing capabilities to provide for a scalable compute platform as well. Companies are using data grids to drive real-time event-based calculations as application objects are updated, then to provide fast access to those calculations.”
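The two offloading techniques Blitz describes, serving repeated reads from memory and batching and coalescing writes, can be illustrated with a small sketch. This is a toy under stated assumptions, not Oracle Coherence's API: the `WriteBehindCache` class, its counters and the dictionary standing in for a database are hypothetical names chosen for illustration.

```python
class WriteBehindCache:
    """Toy read-through cache with coalesced, batched write-behind."""

    def __init__(self, backing_store):
        self.store = backing_store  # dict standing in for a shared database
        self.cache = {}             # in-memory copies of application objects
        self.dirty = {}             # pending writes, keyed by object key
        self.store_reads = 0        # round trips to the store for reads
        self.store_writes = 0       # round trips to the store for writes

    def get(self, key):
        # Read-through: only the first read of a key reaches the store.
        if key not in self.cache:
            self.store_reads += 1
            self.cache[key] = self.store[key]
        return self.cache[key]

    def put(self, key, value):
        # Updates land in memory. A later write to the same key replaces
        # the earlier pending one, which is the coalescing step.
        self.cache[key] = value
        self.dirty[key] = value

    def flush(self):
        # Batching: all pending keys go to the store in one round trip.
        batch = dict(self.dirty)
        self.dirty.clear()
        if batch:
            self.store.update(batch)
            self.store_writes += 1
        return len(batch)


database = {"sku-1": 10}
cache = WriteBehindCache(database)

for _ in range(3):
    cache.get("sku-1")       # three reads, one trip to the store

cache.put("sku-1", 9)
cache.put("sku-1", 8)
cache.put("sku-1", 7)        # three updates coalesce into one pending write
cache.put("sku-2", 5)
flushed = cache.flush()      # two keys written in a single batch
```

In this sketch the store sees one read and one batched write instead of three reads and four writes. A production grid would flush on a timer or queue threshold and would have to handle store failures, which this example leaves out.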
Ted Kenney, director of marketing at McObject, has taken a more vertical approach to in-memory data grids. McObject now offers the eXtremeDB Financial Edition specifically targeted at stock-trading platforms and other high-volume, high-speed trading systems.
“Data in the financial space is fairly specialized market data—with things like trades and quotes—and a lot of it is time-series data,” said Kenney. “It’s the same data point with different values that change at regular intervals over time. That requires a somewhat specialized approach to data management. Column-oriented databases are very good at that for fairly technical reasons. It’s much more efficient to manage it in a column database than in a row. That is the heart of the new features we’ve added in eXtremeDB Financial Edition.”
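One way to see why a column layout suits time-series market data is to compare how a single field is scanned in each layout. The sketch below is a simplified illustration, not eXtremeDB's API: the `QuoteColumns` class, the sample ticks and the volume-weighted average price calculation are assumptions made for the example.

```python
from array import array

# Row layout: each tick is a self-contained record, so scanning one
# field still means visiting every record object.
rows = [
    {"ts": 1, "price": 10.0, "size": 100},
    {"ts": 2, "price": 10.5, "size": 200},
    {"ts": 3, "price": 11.0, "size": 100},
]


class QuoteColumns:
    """Column layout: one contiguous typed array per field."""

    def __init__(self):
        self.ts = array("q")      # timestamps as 64-bit integers
        self.price = array("d")   # prices as 64-bit floats
        self.size = array("q")    # trade sizes as 64-bit integers

    def append(self, ts, price, size):
        self.ts.append(ts)
        self.price.append(price)
        self.size.append(size)

    def vwap(self):
        # Volume-weighted average price touches only two columns;
        # the timestamp column is never read.
        notional = sum(p * s for p, s in zip(self.price, self.size))
        return notional / sum(self.size)


columns = QuoteColumns()
for tick in rows:
    columns.append(tick["ts"], tick["price"], tick["size"])

row_vwap = (
    sum(t["price"] * t["size"] for t in rows) / sum(t["size"] for t in rows)
)
col_vwap = columns.vwap()
```

Both layouts give the same answer here. The difference is in memory access: the columnar version reads densely packed values of a single type, which tends to be friendlier to CPU caches when a calculation needs only a few fields from a long series.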
That points to one of the key differentiators between in-memory data grids and other Big Data solutions, said Kenney: In-memory data grids don’t have to be huge. An Apache Hadoop cluster is almost worthless unless it’s hosting at least a dozen terabytes of data; anything smaller can be analyzed faster on a local system than on a cluster designed for petabytes. In-memory data grids such as eXtremeDB Financial Edition, by contrast, can still yield performance benefits when hosting smaller data sets (in the terabyte range and below).
“In the financial space, it doesn’t feel particularly crowded because of the unique needs of that space,” said Kenney. “The new entrants are large-scale big software applications. We find one of the most appealing things to developers in the financial space is something that comes from our embedded background: It has a short execution path, and doesn’t consume as many CPU cycles as these NoSQLs will.”