In the wilds of the NoSQL jungle, the Apache Software Foundation is carving out a path for Cassandra. Version 0.6 of this distributed NoSQL database was released to the public yesterday, and with it comes support for running Apache Hadoop map/reduce jobs on data stored in Cassandra, as well as experimental authentication support.
The NoSQL movement began two years ago, as cloud developers and administrators began seeing how difficult it was to scale databases into the cloud. Jonathan Ellis, Apache Cassandra project management committee chair, said a good rule for deciding whether or not to use NoSQL instead of a relational database is, “If you’re layering memcached on top of MySQL, you’re inventing an ad hoc NoSQL database.”
NoSQL databases are all about scaling to multiple systems, rather than about holding one large database of truth. Ellis said that version 0.6 of Cassandra adds the ability to replace memcached entirely, and to function as its own caching layer.
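For readers unfamiliar with the pattern Ellis describes, the following is a minimal Python sketch of a cache layered in front of a database. It is an illustration only: the class name `CacheAsideStore` and its methods are hypothetical, and plain dictionaries stand in for memcached and MySQL. It does not use Cassandra's API.

```python
class CacheAsideStore:
    """Toy cache-aside layer: a dict cache in front of a dict 'database'."""

    def __init__(self, database):
        self.database = database  # stands in for MySQL (assumption)
        self.cache = {}           # stands in for memcached (assumption)
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database, then invalidate the stale cache entry.
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAsideStore({"user:1": "alice"})
first = store.get("user:1")    # miss: goes to the database
second = store.get("user:1")   # hit: served from the cache
store.put("user:1", "bob")     # write invalidates the cached value
third = store.get("user:1")    # miss again: re-read from the database
```

Every application that pairs the two systems ends up writing some version of this bookkeeping itself, which is the ad hoc layer Ellis is referring to.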
Cassandra 0.6 is also playing nice with Hadoop. Performing batch actions on data in Hadoop has traditionally required that the data live in the Hadoop Distributed File System (HDFS). But the new version of Cassandra supports Hadoop map/reduce, allowing developers and Hadoop users to run their batch jobs directly against data stored in Cassandra.
“This is not saying Cassandra is replacing HDFS, but more that to run queries against Cassandra data, you don’t need it to be in HDFS anymore,” said Ellis.
He added that version 0.6 will likely be the last version to maintain backward compatibility with previous releases. For version 0.7, the Cassandra team plans to open the hood and futz with the underpinnings of the architecture in a manner that will likely break compatibility with past versions.
“Next, we want to do a release that breaks things a little,” said Ellis. “We did 0.4, 0.5 and 0.6, and we were really good at maintaining backward compatibility. But for 0.7, we want to make a few of those low-level technical changes so we can continue to be backward compatible to version 1.0. The row keys have to be strings now, and we need to make those byte arrays. We know we needed to make this change, and now we need to bite the bullet.”
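The difference between string keys and byte-array keys can be shown with a short Python sketch. This is a generic illustration, not Cassandra's client API; the variable names are hypothetical. It shows that any string key can be expressed as bytes, while some byte sequences cannot be expressed as text.

```python
import struct
import uuid

# Any existing string key can be carried as a byte array by encoding it.
old_style_key = "user:1"
as_bytes = old_style_key.encode("utf-8")
round_tripped = as_bytes.decode("utf-8")

# A byte-array key can also hold raw binary, such as a packed
# 64-bit integer or the 16 raw bytes of a UUID.
packed_int_key = struct.pack(">Q", 2**64 - 1)  # eight 0xFF bytes
uuid_key = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes

# Not every byte sequence is valid text, so the reverse does not hold.
try:
    packed_int_key.decode("utf-8")
    is_valid_text = True
except UnicodeDecodeError:
    is_valid_text = False
```

Because strings are a subset of what a byte array can hold, moving to byte arrays widens the set of usable keys, though it changes the type that client code has to pass.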
Ping Li, a partner at venture capital firm Accel Partners, said that the NoSQL world is quite interesting and crowded at the moment, and that the next few years should winnow the field down to a few winners.
“It will be some time before one of them becomes truly horizontal,” he said in evaluating the state of the more than two dozen NoSQL projects out there right now. “I think it’s a fragmented market, but I think if you believe there’s a whole set of applications that are going to be cloud-like, they’re going to be built on this type of database.”