October has been a busy month for the emerging NoSQL market. Oracle kicked it off with the announcement at its annual conference of its own NoSQL database, a scalable cluster-based version of the Berkeley DB key/value store. Then, the Apache Foundation and DataStax announced the release of Cassandra 1.0, which adds multi-threaded compaction and performance improvements to the popular NoSQL database.
The heady pace of innovation in the NoSQL space shows no sign of slowing, with new open-source projects still emerging almost every day. Even as Oracle is entering a market that until now it has maintained was filled with snake oil, smaller startups and groups of developers are still putting together new and novel solutions for quickly managing and storing large amounts of data.
And that’s probably because some NoSQL companies are already seeing big gains inside enterprises and government organizations. MarkLogic, the company behind the unstructured data store of the same name, has been growing quickly with the rise of the NoSQL movement, said David Gorbet, vice president of product strategy at MarkLogic.
Gorbet said MarkLogic was founded in 2003, and that its database product increasingly fills a hole left by traditional relational databases. “The principle of the company is that there is some data that’s difficult to fit into a relational data store that needs a different paradigm for managing it, but it still needs the benefits of a database behind that,” he said.
“Our first customers were in intelligence and publishing, and more recently we’ve been making inroads to financial services and healthcare firms that have big data problems. We’re a private company with over 250 employees. We’ve had great growth. We grew by 45% in 2010, and we’re on track to grow faster than that in 2011.”
That growth is fueled by the increasing need within enterprises for solutions to their big data problems. While solutions like Hadoop are handling the aftereffects of the big-data explosion and allowing enterprises to get their arms around the data slowly, NoSQL solutions deal with the opposite end of the problem: storing that unstructured data and making it available for immediate use in scalable Web and mobile applications.
And while 2010 was the year that the NoSQL movement kicked off, 10gen CEO Dwight Merriman said that it was 2011 in which enterprises began to actually use NoSQL databases.
“I think they wanted a certain level of maturity, and only this year has it gotten to that point where they’re comfortable,” said Merriman. “I think what’s happened in 2010…was it became a popular product. That was the biggest change from our point of view in 2010. In 2011, enterprises are using the stuff now. Starting in January of this year, we’ve seen that really ramp up. We see Fortune 500 companies using MongoDB, or other projects in the space.”
So great is the demand for these solutions that even Oracle was hearing about it from its customers. Marie-Anne Neimat, vice president of database development at Oracle, said that customers had been asking about NoSQL for some time.
“Our customers have been asking us, ‘When is Oracle going to have an offering around NoSQL?’ We felt like we have a lot of expertise in data management, and we felt we could offer a more complete solution for our customers by having a NoSQL that is going to, over time, integrate with all of our products and have a migration path,” she said.
Indeed, Oracle once criticized NoSQL databases as incomplete and immature solutions to problems that the company felt didn’t exist. Gorbet said that Oracle’s entry into the market is more of a validation of the need for these solutions than it is a threat to existing players.
“It’s an interesting position given their previous position, but it does validate that relationals are not the only way to do things,” he said. “For Oracle to be getting into the NoSQL business is just a recognition of reality, really.”
But Oracle has touched on the real secret of the NoSQL revolution: There’s a solution for every customer’s problem. From key/value stores to object stores to graph databases, there are new projects cropping up every month to fill every type of data-store hole that has opened in the past four years.
Ben Werther, vice president of products at Apache Cassandra-focused DataStax, said that many new additions to Cassandra are coming from the demands of existing customers, and this is why so many NoSQL companies are now focusing on building out GUI-based management tools.
“I think, obviously, we’re driven by what our customers are asking for,” he said. “[Administration has] been identified as a gap in many of these systems. We’re fundamentally about focusing on customer progress.”
And in the end, it’s the customers’ needs that really shaped the NoSQL revolution. Said Gorbet: “The bottom line is speed: speed with which you can develop applications on the platform, and the speed with which the platform responds to queries.”
It’s unsurprising that Oracle decided to finally enter the NoSQL market. And it’s also not surprising to see that Oracle’s first entry into the market is a reworking of a database the company purchased in 2006.
Berkeley DB (also known as Sleepycat, after the company Oracle purchased to acquire the project) was an embeddable key/value store designed to run inside single-instance applications. With the move to NoSQL, Oracle spread Berkeley DB across a cluster, and added capabilities that make it easier to manage and maintain the database.
“Berkeley DB is a single-node database, not a distributed node database,” said Oracle’s Neimat. “This is now a distributed key/value store. It can scale out over many, many nodes. We’ve measured performance to over 300 nodes. It still has the built-in high availability of Berkeley DB, and in addition it has a concept of a sub-key. You can not only identify a record by its binary key, but also by sub-keys. We will cluster all the records with all the same primary key on the same node, so records that share the same primary key but not the same sub-keys will be colocated on the same node, and we provide an API for doing atomic operations on many records that have the same primary key.”
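The sub-key scheme Neimat describes can be illustrated with a small sketch. This is a toy, single-process model in Python, not Oracle's API: the class `ToyShardedStore`, its methods, and the choice of SHA-256 hashing are all assumptions made for illustration. It shows only the core idea from the quote, that hashing the primary key alone keeps every sub-key record on one node, which is what makes a multi-record update under one primary key possible without cross-node coordination.

```python
import hashlib
from collections import defaultdict


class ToyShardedStore:
    """Hypothetical in-process model of a store partitioned by primary key."""

    def __init__(self, num_nodes):
        # Each "node" maps primary_key -> {sub_key: value}.
        self.nodes = [defaultdict(dict) for _ in range(num_nodes)]

    def node_for(self, primary_key):
        # Only the primary key is hashed, so every sub-key stored under it
        # lands on the same node.
        digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, primary_key, sub_key, value):
        self.nodes[self.node_for(primary_key)][primary_key][sub_key] = value

    def get(self, primary_key, sub_key):
        node = self.nodes[self.node_for(primary_key)]
        return node.get(primary_key, {}).get(sub_key)

    def put_all(self, primary_key, updates):
        # All affected records live on one node, so the batch can be applied
        # as a unit: stage the changes on a copy, then swap the copy in.
        node = self.nodes[self.node_for(primary_key)]
        staged = dict(node.get(primary_key, {}))
        staged.update(updates)
        node[primary_key] = staged


store = ToyShardedStore(num_nodes=4)
store.put("user:42", "profile", {"name": "Ada"})
store.put_all("user:42", {"email": "ada@example.com", "visits": 1})
print(store.get("user:42", "email"))
```

A real distributed store would also need replication, durable storage, and failure handling, none of which this sketch attempts.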
Neimat said that the Oracle NoSQL Database, as it is now called, includes many of the features now being implemented by other NoSQL projects. “It’s easy to install, to set up, and has a nice management console,” she said. “A lot of the components used to manage consistency and eventual consistency offerings out there leave quite a bit of responsibility to the users. We feel we provide an easier means of providing consistency. We think we have a clean model with the sub-keys, as it provides an equivalent to additional columns, but offers a clean way of doing that.”
But no matter what companies offer in the NoSQL market, the biggest constant is that there will always be more data. And the volume of that data isn’t the only problem, said MarkLogic’s Gorbet.
“The story about big data is about more than just volume; it’s really about volume and complexity,” he said. “I think there is going to be a sea change in terms of how people want to process that.
“Spending time and building data models for each new step in a business process was difficult before big data, but it’s almost impossible now. The question now is how do we create a repository that’s flexible and bring our processes to the data rather than running lots of business process?”