RAM is the hip place to be. For modern applications built to scale out to thousands or even millions of users, scaling the data behind those applications has long been a difficult task. But a host of in-memory data grids from companies like McObject, Oracle and Terracotta are solving the scalable data-store problem, and they all expect 2013 to be a banner year for such software.
Massimo Pezzini, vice president and fellow at Gartner Research, said he expects explosive growth in the in-memory data store market in 2013. While the research he cited is not yet published, he did have figures on the market and its potential for growth.
“We think in 2011 the market for in-memory databases was approximately US$250 million in terms of license and maintenance revenue… We’ve spoken with vendors, and some are projecting high double-digit, if not triple-digit growth in 2013. Terracotta is expecting to triple its revenues this year. We think this is going to be a $1 billion market by 2016. In software, $1 billion is a big market,” he said.
There are many reasons for this growth, but Pezzini said a few use cases are most common. “I would say the most obvious use case is really caching: caching a database, caching a session in a website, etc.,” he said. “Quite a lot of customers have started using an in-memory data grid in that way: their own layer to speed up the performance of Web applications.” But new use cases are cropping up thanks to the proliferation of clouds and the need to operate at scale.
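The caching layer Pezzini describes typically follows a cache-aside pattern: the application checks the in-memory grid first and only falls back to the database on a miss. Here is a minimal sketch of that pattern in plain Java; the ConcurrentHashMap stands in for a distributed grid, and names like UserProfile and loadFromDatabase are illustrative, not any vendor’s API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal cache-aside sketch. A real in-memory data grid would replace
// the ConcurrentHashMap with a distributed, replicated cache; the names
// here (UserProfile, loadFromDatabase) are illustrative only.
public class CacheAside {
    private final Map<String, UserProfile> cache = new ConcurrentHashMap<>();

    public UserProfile getProfile(String userId) {
        // 1. Check the in-memory cache first.
        UserProfile cached = cache.get(userId);
        if (cached != null) {
            return cached;           // cache hit: no database round trip
        }
        // 2. On a miss, load from the system of record and populate the cache.
        UserProfile fresh = loadFromDatabase(userId);
        cache.put(userId, fresh);
        return fresh;
    }

    private UserProfile loadFromDatabase(String userId) {
        // Stand-in for a JDBC or ORM lookup against the backing database.
        return new UserProfile(userId);
    }

    static class UserProfile {
        final String userId;
        UserProfile(String userId) { this.userId = userId; }
    }
}
```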
“Lately, we have seen examples of customers using an in-memory data grid as a data-management platform, as a platform to host the database of record,” said Pezzini. “In business practice, that is not relational, because in-memory data grids are based on an object-oriented NoSQL paradigm. This is one of the reasons customers are looking into in-memory data grids.”
Mike Allen, vice president of product management at Terracotta, sees several reasons behind the growth of in-memory data grids, but he cited one large factor above the rest. “One is data volume, and now you can suddenly get machines with a lot of memory very cheaply,” he said. “You can stack up six servers with a half-terabyte of RAM each, and then keep all your data in memory, which was never really possible before. We scale that grid predictably and scale it to that capacity.”
Another reason for the growth of in-memory data grids, said Allen, is the new focus on analytics in business. With Terracotta, or any other in-memory data grid, analytics can be run in real time because all the information is stored in RAM.
“If I’m doing transactions on an e-commerce site, I don’t typically have a view into those transactions until after,” he said. “But I can now look at that data in-flight, and do real-time promotions or modify pricing. I can offer people incentives, or correlate current actual behavior with profile info about historical access.”
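The in-flight analysis Allen describes amounts to updating aggregates as each event arrives, instead of querying the store after the fact. A hypothetical sketch of the idea, with an illustrative threshold and method names (not Terracotta’s API), keeping a running spend total per customer and triggering a promotion in real time:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-flight analytics: update a running aggregate as each
// transaction arrives, rather than batch-querying the store afterwards.
public class InFlightPromotions {
    private final Map<String, Double> spendByCustomer = new ConcurrentHashMap<>();
    private static final double PROMO_THRESHOLD = 500.0; // illustrative value

    // Called for every new transaction event as it enters the grid.
    public void onTransaction(String customerId, double amount) {
        double total = spendByCustomer.merge(customerId, amount, Double::sum);
        if (total >= PROMO_THRESHOLD) {
            offerIncentive(customerId); // e.g. a real-time discount or promotion
        }
    }

    private void offerIncentive(String customerId) {
        System.out.println("Offering promotion to " + customerId);
    }
}
```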
Uri Cohen, vice president of product management at GigaSpaces, said there are two major reasons for the growth of in-memory data grids. “The first is that the market for such solutions has definitely grown, with drivers such as the explosion of user-generated data and Web-scale deployments,” he said. “Whereas in the past most of the demand for these technologies came from high-end financial services and telecom applications, today it’s prevalent in almost every vertical.
“We’re seeing this demand in e-commerce, travel, fraud detection, homeland security, and SaaS implementations, to name a few. Some apps need to process tens or even hundreds of thousands of events per second, which is only feasible if you’re using a distributed architecture. The in-memory aspect of things is what allows you to do it at real-time latencies, meaning you can do this as the events are flowing into your system and not have to wait for a batch Map/Reduce job to get the processed data and insights.
“The second trend is that with the advent of NoSQL data stores and cloud technologies, which drive people toward distributed architectures, the market is much better educated about such technologies and understands the trade-offs associated with them, with terms like CAP and BASE being widely known and reasonably well-comprehended. This saves us a lot of work in explaining our technology and how to implement your applications on top of it.”
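The event rates Cohen cites are typically reached by hash-partitioning the stream so that each partition can be processed independently, in parallel, entirely in memory. A simplified, single-process sketch of that routing step follows; the partition count and names are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative partitioned event routing: each event is hashed to a
// partition so independent threads (or, in a real grid, nodes) can
// process events in parallel without waiting on a batch job.
public class PartitionedEventRouter {
    private static final int PARTITIONS = 8; // one per core or node; illustrative
    private final ExecutorService[] workers = new ExecutorService[PARTITIONS];

    public PartitionedEventRouter() {
        for (int i = 0; i < PARTITIONS; i++) {
            // A single thread per partition preserves per-key ordering.
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void route(String key, Runnable handler) {
        int partition = Math.floorMod(key.hashCode(), PARTITIONS);
        workers[partition].submit(handler); // handled in-memory, at arrival time
    }
}
```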
Analyze that
Indeed, analytics are a major new draw to in-memory data grids, said Pezzini. “In the context of big data applications, what happens is the customers store data in memory to an in-memory data grid. In some cases, they are storing terabytes of data, and they want to run analytical types of applications on top of that. [That means supporting] query languages and Map/Reduce APIs. I believe this will be the next battleground.”
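The Map/Reduce-style queries Pezzini mentions generally run the “map” step on each node against its local slice of the data, then reduce the partial results at the caller. A toy, single-process version of that scatter-gather shape, using Java parallel streams over in-memory partitions (the data and names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Toy scatter-gather: the "map" step runs over each in-memory partition
// in parallel, then the partial sums are "reduced" into a single result.
public class InMemoryMapReduce {
    public static void main(String[] args) {
        // Each inner list stands in for the data held by one grid node.
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(10.0, 20.0),
                Arrays.asList(5.0, 15.0),
                Arrays.asList(7.5, 2.5));

        double total = partitions.parallelStream()           // scatter
                .mapToDouble(p -> p.stream()                  // map per partition
                        .mapToDouble(Double::doubleValue)
                        .sum())
                .sum();                                       // reduce / gather

        System.out.println("Total across partitions: " + total); // 60.0
    }
}
```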
Craig Blitz, senior group product manager of Oracle Coherence, the in-memory data grid cited as the most popular by Pezzini, said that Coherence is seeing increasing use in analytics, as well.
“Data grids play a key role in handling these new demands,” said Blitz. “Their distributed caching capabilities offload shared services by reducing the number of repeated reads across all application instances and, in the case of Oracle Coherence, by batching and coalescing writes. Application objects are stored in-memory in the application tier or a separate data grid tier (or both, using Oracle Coherence’s near caching capabilities). Oracle Coherence provides a rich set of query, Map/Reduce aggregation and eventing capabilities to provide for a scalable compute platform as well. Companies are using data grids to drive real-time event-based calculations as application objects are updated, then to provide fast access to those calculations.”
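As a concrete illustration, a grid-side aggregation against the classic Coherence 3.x Java API might look like the sketch below; the cache name “orders” and the Order accessors are assumptions, not Oracle’s examples. The point is that the sum executes in parallel on the nodes that own the data, and only the result crosses the network:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.aggregator.DoubleSum;
import com.tangosol.util.filter.EqualsFilter;

// Sketch of a grid-side aggregation: the sum is computed in parallel on
// the storage nodes that own the entries, and only the final result is
// returned to the caller. Cache and accessor names are assumptions.
public class CoherenceAggregation {
    public static void main(String[] args) {
        NamedCache orders = CacheFactory.getCache("orders");

        // Sum getAmount() over all entries whose getStatus() equals "OPEN".
        Object openTotal = orders.aggregate(
                new EqualsFilter("getStatus", "OPEN"),
                new DoubleSum("getAmount"));

        System.out.println("Open order total: " + openTotal);
    }
}
```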
Making money
McObject has taken a more vertical approach to in-memory data grids, according to Ted Kenney, the company’s director of marketing. It now offers eXtremeDB Financial Edition, targeted specifically at stock-trading platforms and other high-volume, high-speed trading systems.
“Data in the financial space is fairly specialized market data—with things like trades and quotes—and a lot of it is time-series data,” said Kenney. “It’s the same data point with different values that change at regular intervals over time. That requires a somewhat specialized approach to data management. Column-oriented databases are very good at that for fairly technical reasons. It’s much more efficient to manage it in a column database than in a row. That is the heart of the new features we’ve added in eXtremeDB Financial Edition.”
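The column-store advantage Kenney alludes to is largely about memory layout: scanning one field across millions of ticks reads contiguous memory in a column layout, while a row layout drags every other field through the CPU cache along with it. A schematic contrast in Java, with illustrative Tick fields:

```java
// Row layout: one object per tick; scanning prices touches whole objects,
// pulling times and volumes through the cache even when they aren't needed.
class Tick {
    long time;
    double price;
    long volume;
}

// Column layout: one contiguous array per field; scanning prices reads
// sequential memory, which is far friendlier to CPU caches and prefetching.
class TickColumns {
    long[] times;
    double[] prices;
    long[] volumes;

    double averagePrice() {
        double sum = 0;
        for (double p : prices) {   // sequential scan of a single column
            sum += p;
        }
        return sum / prices.length;
    }
}
```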
And that shows one of the key differentiators between in-memory data grids and other Big Data solutions, said Kenney: In-memory data grids don’t have to be huge. An Apache Hadoop cluster offers little benefit unless it’s hosting at least a dozen terabytes of data; smaller data sets can often be analyzed faster on a single local system than on a cluster designed for petabytes. In-memory data grids like eXtremeDB Financial Edition, by contrast, can still yield performance benefits on smaller data sets (in the terabyte range and below).
“In the financial space, it doesn’t feel particularly crowded because of the unique needs of that space,” said Kenney. “The new entrants are large-scale big software applications. We find one of the most appealing things to developers in the financial space is something that comes from our embedded background: It has a short execution path, and doesn’t consume as many CPU cycles as these NoSQLs will.”
But analytics remains a major draw for new users of in-memory data grids, said Jon Webster, vice president of business development at GridGain. “We take an in-memory data grid and we also have a processing data grid. We give you a cohesive API. That’s where you get the processing plus storage. If you look at some of the older caching technologies, you’re still dealing with the pattern of ‘The data is stored somewhere; even if it’s in Coherence, take it out of there and put it back into an application server and process it somewhere else.’ The real problem is that moving that data is exceptionally expensive,” he said.
To that end, GridGain uses that cohesive API to allow developers to analyze data stored in their in-memory data grid without having to move it to another platform. When dealing with fresh terabytes of data every day, that can save developers a great deal of time.
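What Webster describes is often called function shipping: the computation travels to the partition that owns the data, rather than the data traveling out to an application server. A single-process sketch of the idea follows; this is not GridGain’s actual API, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative function shipping: the computation is routed to the thread
// that owns a key's partition, so it runs next to the data instead of the
// data being pulled out and processed somewhere else.
public class ColocatedCompute {
    private static final int PARTITIONS = 4; // illustrative
    private final List<ExecutorService> owners = new ArrayList<>();
    private final List<Map<String, Double>> shards = new ArrayList<>();

    public ColocatedCompute() {
        for (int i = 0; i < PARTITIONS; i++) {
            owners.add(Executors.newSingleThreadExecutor()); // one "node" each
            shards.add(new ConcurrentHashMap<>());
        }
    }

    // Ship fn to the partition that owns the key; only the small result moves.
    public <R> Future<R> computeAt(String key, Function<Map<String, Double>, R> fn) {
        int p = Math.floorMod(key.hashCode(), PARTITIONS);
        Map<String, Double> shard = shards.get(p);
        return owners.get(p).submit(() -> fn.apply(shard));
    }
}
```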
You host it for me
The popularity of in-memory data grids has even spawned a new cloud-based service. Ofer Bengal, cofounder and CEO of Garantia Data, said that his company has taken the popular NoSQL data store Redis and turned it into a scalable data grid platform hosted within the company’s own cloud service.
While Redis is an open-source key-value store, Garantia Data turns the software into an automatically scalable in-memory data grid for applications that need terabytes of quickly accessible information.
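From the application side, Redis speaks a simple key-value protocol. Using the widely adopted Jedis client from Java, basic access looks roughly like this; the host, port, and keys are placeholders for whatever endpoint and credentials a hosted service such as Garantia Data’s would provide:

```java
import redis.clients.jedis.Jedis;

// Minimal Jedis sketch: Redis as a network-attached key-value store.
// Host, port, and key names below are placeholders.
public class RedisExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("session:42", "{\"user\":\"alice\"}"); // store a value
            String session = jedis.get("session:42");        // read it back
            System.out.println(session);
        }
    }
}
```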
“When it comes to replicating Redis onto disk, or into persistent storage, you’ll normally see performance degradation,” said Bengal. “We have overcome this problem, and we offer real-time replication to persistent storage without any degradation of performance. We also offer instant fail-over.”
And, at the end of the in-memory day, the usefulness always comes back to the application being tied into the grid. “More and more, we see in-memory data grid technology being embedded into other software products,” said Gartner’s Pezzini. “Enterprise servers, VPN tools, packaged applications. Of course this is going to help a lot with the establishment of this market. We are seeing very interesting dynamics at play here in this market.”
And while the market is growing quickly, Gartner does not yet have a Magic Quadrant for in-memory data grids. “I think TIBCO could be the rising star here, and could become an important player in this market,” said Pezzini. “Software AG with Terracotta will be growing primarily because of [Software AG’s] marketing organization. [Software AG] is going to leverage Terracotta inside many webMethods products, and a bunch of new offerings they have there in the making around Big Data.”