New waves of application development technology are often incompatible with old ways of thinking. Typically, when a brave new world opens to programmers, a healthy portion of them will cast aside the old ways in favor of the new. But the NoSQL movement is not about throwing out your SQL databases to be replaced by key-value stores. NoSQL, ironically, has nothing to do with avoiding SQL, and everything to do with the judicious use of relational databases.
NoSQL databases encompass a large swath of new databases. The category includes the Apache Cassandra project, an array of key-value stores such as Tokyo Cabinet, and document databases such as CouchDB and MongoDB. NoSQL is a broad term that says more about what a database isn’t than about what it is.
Ping Li, general partner at Accel Partners, a venture capital firm in Silicon Valley, said his firm is watching the NoSQL movement closely but does not yet see a clear leader in which to invest. “It will be some time before one of [the NoSQL databases] becomes truly horizontal,” he said.
“I think it’s a fragmented market. I think if you believe there’s a whole set of applications that are going to be cloud-like, they’re going to be built on this type of database.”
Why to use
Mike Gualtieri, senior analyst at Forrester Research, said that NoSQL doesn’t have anything to do with throwing out a relational database. He said that NoSQL really stands for “Not Only SQL.” He added that a NoSQL database can make a great alternative to spending enterprise funds on a new Oracle rack of database servers.
Gualtieri said NoSQL is “not a substitute for a database; it can augment a database. For transaction types of processing, you still need a database. You need integrity for those transactions. For storing other data, we don’t need that consistency. NoSQL is a great way to store all that extra data.”
He said that saving actual customer purchasing information is better suited to a relational database, while storing more ephemeral information, such as customer product ratings and comments, is more appropriate for a NoSQL database.
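A minimal sketch of that division of labor follows. It assumes Python’s built-in `sqlite3` module as a stand-in for the relational database and a plain dictionary as a stand-in for a key-value store; the table, the `ratings:<product>` key format, and the function names are all hypothetical, chosen only for illustration.

```python
import json
import sqlite3
from collections import defaultdict

# Relational side: purchases need transactional integrity.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE purchases ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " product TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)

# Key-value side: a plain dict standing in for a NoSQL store (hypothetical).
# Each key maps to a list of JSON-encoded strings.
kv_store = defaultdict(list)


def record_purchase(customer, product, amount_cents):
    # Using the connection as a context manager commits on success
    # and rolls back if the INSERT raises.
    with db:
        db.execute(
            "INSERT INTO purchases (customer, product, amount_cents) VALUES (?, ?, ?)",
            (customer, product, amount_cents),
        )


def add_rating(product, customer, stars, comment):
    # No table or column changes are needed to store this new kind of data.
    record = {"customer": customer, "stars": stars, "comment": comment}
    kv_store[f"ratings:{product}"].append(json.dumps(record))


def ratings_for(product):
    # .get() avoids creating an empty entry for products with no ratings.
    return [json.loads(v) for v in kv_store.get(f"ratings:{product}", [])]


record_purchase("alice", "widget", 1999)
add_rating("widget", "alice", 5, "Works as advertised")
```

The purchase goes through a transaction, while the rating is simply appended under a key; adding a new field to ratings later would not require touching any schema.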
That means developers working with large Oracle installations can justify adding a NoSQL layer to their application stack. Gualtieri said that doing so adds complexity to operations and architecture, but developers come out ahead in the end because most NoSQL databases expose extremely simple APIs that make building applications a breeze.
“From a management standpoint, it does complicate things,” said Gualtieri. “Now you basically have to manage your database and NoSQL. And the NoSQL stuff is a bear to manage right now because it’s new.
“A lot of complexity is in the configuration. The way NoSQL provides high availability is by replicating data. When you save to one node, it syncs to another node. The way you do that is complicated. But, I don’t think it needs to increase the development complexity. From a developer standpoint, it’s very easy to access that data. Most NoSQLs have a simple API that says, ‘Get me this data,’ and it returns the data.”
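The two ideas in that quote, replication for availability and a simple “get me this data” API, can be sketched in a few lines. This is a hypothetical in-memory model, not any particular product’s client: each “node” is a dictionary, `put()` copies a write to every live node, and `get()` answers from whichever live node has the key. The class and method names are assumptions made for illustration.

```python
class ReplicatedStore:
    """Hypothetical key-value client that copies every write to all live nodes."""

    def __init__(self, node_names):
        # One dict per node; a real system would keep these on separate machines.
        self.nodes = {name: {} for name in node_names}
        self.failed = set()

    def put(self, key, value):
        # "When you save to one node, it syncs to another node":
        # here the copy is made synchronously to every node that is up.
        for name, data in self.nodes.items():
            if name not in self.failed:
                data[key] = value

    def get(self, key):
        # "Get me this data": return the value from the first live node holding it.
        for name, data in self.nodes.items():
            if name not in self.failed and key in data:
                return data[key]
        return None

    def fail(self, name):
        # Simulate a node going offline.
        self.failed.add(name)


store = ReplicatedStore(["node-a", "node-b"])
store.put("user:42", "Ada")
store.fail("node-a")
print(store.get("user:42"))  # prints Ada, served by node-b
```

The application code only ever calls `put` and `get`; the hard part Gualtieri describes, deciding how and when copies are synchronized, stays hidden behind those two calls.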
For developers, NoSQL databases may provide a simple path to adding features to applications. “Say you’re adding some new features. The first thing people think of is their database,” said Gualtieri. “How am I going to add this to my SQL database? You have to create tables and fields. Why not look at how you could store some of that data in a NoSQL database instead?
“The reason you’d want to do that is that it’s cheaper. When your data scales, it’s much more expensive to scale a database than a NoSQL. Maybe you can defer, infinitely, the need to go to an expensive rack solution.”
When to use
Jonathan Ellis is the chair of the project management committee for Apache Cassandra, a NoSQL database. He said he was initially drawn to the NoSQL movement because he knew that distributed databases were essential for successful cloud applications.
“I looked at the open-source projects, like Apache Hadoop’s HBase and Project Voldemort,” he said. “I saw that Facebook had open-sourced Cassandra and dropped it on the floor. It had no community at all. I felt the tech was the strongest. I thought if I start with best technology, I can build a community around it. I started contributing to Cassandra and built it to where it is now.”
Ellis said that the developers at Digg invented a rule of thumb for deciding whether or not an environment necessitates a NoSQL database like Cassandra: “If you’re layering memcached on top of MySQL, you’re inventing an ad hoc NoSQL database by doing that,” said Ellis.
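Digg’s rule of thumb refers to the cache-aside pattern. The sketch below assumes an in-memory SQLite table in place of MySQL and a dictionary in place of memcached; the table and function names are hypothetical. It shows why the combination behaves like a home-grown key-value store: reads are served by key from the cache, and the application itself must keep the cache consistent with the database.

```python
import sqlite3

# Stand-ins: an in-memory SQLite table for MySQL, a dict for memcached.
db = sqlite3.connect(":memory:")
with db:
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users (id, name) VALUES (1, 'ada')")

cache = {}
stats = {"db_reads": 0}  # how often a lookup fell through to the database


def get_user_name(user_id):
    key = f"user:{user_id}"
    if key in cache:                 # cache hit: the database is never touched
        return cache[key]
    stats["db_reads"] += 1           # cache miss: read from the database
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache[key] = name            # populate the cache for later reads
    return name


def rename_user(user_id, new_name):
    with db:
        db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    # The application, not the database, must keep the cache consistent.
    cache.pop(f"user:{user_id}", None)
```

Every invalidation rule like the one in `rename_user` is logic a NoSQL store would otherwise handle internally, which is the point of Ellis’s remark.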
Essentially, that rule of thumb matches Gualtieri’s view of what NoSQL databases are for: jobs that traditionally went to a relational database but require only a fraction of its functionality. Spreading copies of a row or column of data across a hundred machines, which keeps the data highly available and redundant, is a task at which NoSQLs excel.
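One simple way to decide which of a hundred machines hold copies of a given row is to hash the row’s key to a starting node and store replicas on the next few nodes. The function below is a hypothetical sketch of that idea using plain modulo hashing; it is not Cassandra’s actual partitioner, and the node names and replication factor of 3 are assumptions.

```python
import hashlib


def replicas_for(key, nodes, replication_factor=3):
    """Return the nodes that should hold copies of key.

    Simple hash placement (hypothetical, not Cassandra's actual partitioner):
    hash the key to a starting node, then take the next nodes in order.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    # The hash only spreads keys evenly across nodes; it has no security role here.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = [f"node-{i:03d}" for i in range(100)]
placement = replicas_for("user:42", nodes)
```

Any client can compute the same placement from the key alone, so no central directory is needed. The weakness of plain modulo hashing is that adding or removing a node reshuffles most keys; Dynamo-style systems such as Cassandra use consistent hashing to limit that movement.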
And yet the definition still covers a broad swath of databases, some of which are not suited to this task. The NoSQL databases themselves spread across all sides of the CAP triangle. Under the CAP theorem, a distributed data store can guarantee at most two of three properties: consistency, availability, and partition tolerance. Each NoSQL database favors two of the three, which places it along one side of the triangle, and that choice dictates how the database should be used.
Google’s BigTable, MongoDB and Redis sit on the consistency and partition-tolerance side of the CAP triangle. The data in these databases is strongly consistent, and their servers can be physically distributed across the globe. But when the network between nodes is partitioned, some nodes cannot read or write data, which limits availability.
Cassandra, CouchDB and Tokyo Cabinet sit on the availability and partition-tolerance side of the triangle. That means Cassandra is a highly available data store that can run on physically distributed servers. But it is also an eventually consistent system, as are CouchDB and Amazon’s SimpleDB.
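Eventual consistency is easiest to see with two replicas. In the hypothetical model below, a write is applied immediately to the replica that receives it and queued for the others; until `sync()` delivers the queue, a read from another replica returns stale data. Conflicts are resolved with last-write-wins timestamps, one common approach (Cassandra resolves conflicting writes by timestamp), though other systems handle conflicts differently. The single shared logical clock is a simplifying assumption; real systems rely on client timestamps or vector clocks.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def apply(self, key, value, ts):
        current = self.data.get(key)
        # Last-write-wins: keep whichever version carries the higher timestamp.
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    """Hypothetical store: a write lands on one replica now, the others later."""

    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value, timestamp) not yet delivered
        self.clock = 0     # simplified: one shared logical clock

    def write(self, key, value, via):
        self.clock += 1
        self.replicas[via].apply(key, value, self.clock)
        # Queue asynchronous replication to every other replica.
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, value, self.clock))

    def read(self, key, via):
        return self.replicas[via].read(key)

    def sync(self):
        # Deliver queued updates; afterward all replicas agree.
        for i, key, value, ts in self.pending:
            self.replicas[i].apply(key, value, ts)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:7", "1 item", via=0)
print(store.read("cart:7", via=1))  # None: replica 1 has not caught up yet
store.sync()
print(store.read("cart:7", via=1))  # 1 item
```

Both replicas stay writable the whole time, which is the availability half of the trade; the cost is the window in which they disagree.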
Nathan Hurst, founder of hirelite.com, plotted the major NoSQL databases on a single chart, placing each one along the CAP triangle to show its strengths and weaknesses.
Hurst said that he learned about NoSQLs while developing hirelite.com as a Chatroulette clone for introducing developers to employers. He used MongoDB to build the site, and, he said, “I was using an API for an event site that does a lot of my ticketing. [The event site] has an API to access its data, and you communicate with it in JSON. It was going to be a quick thing to integrate, and I didn’t want to spend a whole lot of time on it.
“Because it uses JSON, I used MongoDB. The thing about MongoDB and JSON is that you insert objects and do all your querying with JSON, so I was able to quickly shovel in anything I got from the API into MongoDB without having to set up all the tables.”
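The workflow Hurst describes, parsing JSON from an API and storing the objects as-is, can be sketched without a database server. The `DocumentCollection` class below is a hypothetical in-memory stand-in for a MongoDB collection, and the event payload is invented for illustration. Note that the two documents have different fields, and nothing had to be declared in advance.

```python
import json


class DocumentCollection:
    """Hypothetical in-memory stand-in for a document collection (no schema)."""

    def __init__(self):
        self.docs = []

    def insert(self, doc):
        # Store a shallow copy of whatever dict arrives; fields may vary per document.
        self.docs.append(dict(doc))

    def find(self, query):
        # Query by example: keep documents whose top-level fields equal every
        # field in query. (A missing field compares as None in this sketch.)
        return [d for d in self.docs if all(d.get(k) == v for k, v in query.items())]


# Invented payload standing in for a third-party event API's JSON response.
payload = (
    '[{"event": "Dev Meetup", "city": "Boston", "tickets": 40},'
    ' {"event": "Hack Night", "city": "NYC", "online": true}]'
)

events = DocumentCollection()
for doc in json.loads(payload):
    events.insert(doc)

boston = events.find({"city": "Boston"})
```

With a real MongoDB deployment and the PyMongo driver, the equivalent calls would be `collection.insert_many(docs)` and `collection.find({"city": "Boston"})`, where the filter is likewise a plain dictionary.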
Database shake-up
Jay Jarrell, president and CEO of Objectivity, said that the NoSQL movement is the “biggest paradigm shift in the data management world over the last seven or eight years.
“The social networks of the world had a specific need, and they rolled their own. They couldn’t use a central server relational database because relational databases don’t do relations very well. It’s been good for the NoSQL guys. We have specific problems; it’s OK to have a specific tool in the tool box. There’s going to be many tools in the next generation of data management.”
But despite the movement among the Web 2.0 and social network crowds, NoSQL is still in its infancy. Most enterprises, said Gualtieri, have yet to even hear about these new databases.
“Wherever I go, one of the things I am always asking enterprise customers is, ‘Have you heard of NoSQL?’ Most have not,” he said.
“They don’t know the term, they don’t know what it means. I think it’s very early in terms of enterprises understanding how to use this.”
Accel Partners’ Li said that the databases in the NoSQL movement are still evolving and finding their niches. That means the future direction of the movement itself is uncertain.
“Do these things all end up looking like relational databases over time? Do they all evolve into that? I don’t know if that’s the case,” he said.
“I am a believer that not all applications are going to end up looking like SAP. The applications don’t need that heavyweight overhead. I think there’s not going to be one database that rules them all. Oracle did that for many years. I don’t think there’s going to be 10 of them. Maybe two or three.”
While Li remains convinced that the NoSQL space will be winnowed over time, he also said that the developers behind each NoSQL project have become attractive hires for companies looking to assert their dominance in the cloud. In March, for example, VMware hired Salvatore Sanfilippo of the Redis project.
But no matter what the future of each NoSQL project is, Li said a common thread runs through all the NoSQL databases. He said that the use of NoSQLs will “evolve people into this new cloud model by making existing applications become more cloud-like.”
Jarrell is confident that NoSQL provides opportunities for many database companies, old and new. “It’s exciting. It’s only going to keep emerging,” he said.