Basho, the company behind the Riak key-value store, today announced that the enterprise edition of Riak will now offer cross-data-center replication capabilities.
These new capabilities build on the existing Riak Cloud Storage product. This enterprise product wraps additional capabilities around Riak, allowing it to be used through an API modeled after Amazon’s S3 storage service. Riak Cloud Storage is also multi-tenant, and can store objects by splitting them into smaller chunks and replicating those pieces across the database cluster.
Riak is designed to function with multiple nodes right from the start. Typically, a starter Riak cluster consists of at least five nodes. All data on those nodes is replicated at least three times, though this can be configured. Because of Riak’s architecture and use of multiple nodes, a Riak cluster is always available for reads and writes, with no locking or blocking taking place.
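To make the replication arithmetic concrete, here is a minimal Python sketch of how a key can be mapped to three replicas on a five-node cluster. It is an illustration under stated assumptions, not Riak's actual code: the partition count of 64, the round-robin assignment of partitions to nodes, and the names build_ring and preference_list are all hypothetical, and the way the bucket and key are hashed is simplified.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed partition count for this sketch
N_VAL = 3                 # replicas per object, matching the article's "three times"


def build_ring(nodes):
    """Assign partitions to nodes round-robin (a simplification)."""
    return [nodes[i % len(nodes)] for i in range(NUM_PARTITIONS)]


def preference_list(ring, bucket, key, n=N_VAL):
    """Return the owners of the n consecutive partitions starting at the key's hash."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    first = position * NUM_PARTITIONS // RING_SIZE
    return [ring[(first + i) % NUM_PARTITIONS] for i in range(n)]


nodes = [f"node{i}" for i in range(1, 6)]   # a five-node starter cluster
ring = build_ring(nodes)
replicas = preference_list(ring, "users", "alice")
print(replicas)
```

With five nodes and this layout, any three consecutive partitions belong to three different machines, so losing one node still leaves two copies of every object reachable.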
But because Riak is a cluster-based database, replication across locations can be tricky. That’s why Basho today added cross-site replication for entire clusters.
Andy Gross, chief architect of Basho, said, “Large enterprise wants to consolidate storage. We have telcos that want to use Riak as the basis of a public cloud storage system. We can make it act as an S3 lookalike. For the enterprise customers, a big global company can have data close to different continents, and for large service providers, they can use the multi-service capabilities of Riak to build regional zones.”
Not just another NoSQL
While the NoSQL market is still on fire, distinct lines have formed among the database providers. Some, such as Couchbase and MongoDB, are gaining steam from developers while encountering difficulties on the IT side of the fence. Others, like Cassandra and Riak, are being brought in from the IT side, and it’s the developers who have to adjust.
Shanley Kane, director of product management at Basho, said that developers coming to Riak are generally struggling with the shift in concept that comes from moving from a relational database to a key-value store. “It depends on how your application is modeled and structured,” she said. “The biggest barrier to entry is to get people to think purely about keys and values. An application does well when you can model your data with a unique key attached to a value, which seems easy on the surface, but when you have people used to relational data models, it can be a challenge.”
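The shift Kane describes can be sketched in a few lines of Python. The example below uses a plain dictionary as a stand-in for a key-value store, so put, get, and orders_for are hypothetical helpers rather than any Riak client API. Where a relational schema would join a users table to an orders table, the key-value version stores each record under a unique key and keeps the related keys inside the value.

```python
import json

store = {}  # stand-in for a key-value store: (bucket, key) -> serialized value


def put(bucket, key, value):
    """Store a JSON-serializable value under a unique bucket/key pair."""
    store[(bucket, key)] = json.dumps(value)


def get(bucket, key):
    """Fetch and decode a value, or return None if the key is absent."""
    raw = store.get((bucket, key))
    return None if raw is None else json.loads(raw)


# Instead of a join, the user record carries the keys of its orders.
put("users", "alice", {"name": "Alice", "order_ids": ["1001", "1002"]})
put("orders", "1001", {"user": "alice", "total": 25.0})
put("orders", "1002", {"user": "alice", "total": 40.0})


def orders_for(user_key):
    """Follow the keys stored in the user record, one lookup per order."""
    user = get("users", user_key)
    return [get("orders", order_id) for order_id in user["order_ids"]]


totals = [order["total"] for order in orders_for("alice")]
print(totals)
```

The trade-off is that every access path has to be planned as a key lookup up front, which is the adjustment developers used to ad hoc SQL queries tend to find hardest.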
Gross added that, while Riak is a pure key-value store, it also offers some features to make development easier. “One of them is search. You can make a Riak cluster look like an Apache Foundation Solr cluster from the client point of view,” he said. “You can write map/reduce jobs in Erlang or JavaScript, and run a distributed query across that. We also have secondary indexes, which allow you to tag objects with secondary keys to look up later.”
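The secondary-index idea Gross mentions, tagging an object with extra keys so it can be found later, can be illustrated with a small in-memory model. This is a sketch of the concept only: the TaggedStore class and its methods are hypothetical and do not mirror Riak's client API or its on-disk index format.

```python
from collections import defaultdict


class TaggedStore:
    """Toy key-value store whose objects can carry secondary index tags."""

    def __init__(self):
        self._values = {}
        self._tags = {}                  # key -> {index_name: index_value}
        self._index = defaultdict(set)   # (index_name, index_value) -> keys

    def put(self, key, value, tags=None):
        # Remove stale index entries if the key is being overwritten.
        for entry in self._tags.get(key, {}).items():
            self._index[entry].discard(key)
        self._values[key] = value
        self._tags[key] = dict(tags or {})
        for entry in self._tags[key].items():
            self._index[entry].add(key)

    def get(self, key):
        return self._values.get(key)

    def lookup(self, index_name, index_value):
        """Return the primary keys tagged with index_name == index_value."""
        return sorted(self._index.get((index_name, index_value), set()))


store = TaggedStore()
store.put("order:1001", {"total": 25.0}, tags={"customer": "alice"})
store.put("order:1002", {"total": 40.0}, tags={"customer": "alice"})
store.put("order:1003", {"total": 12.5}, tags={"customer": "bob"})

alice_orders = store.lookup("customer", "alice")
print(alice_orders)
```

The primary key remains the only way to fetch a value directly; the tag lookup just returns primary keys, which the application then fetches one by one.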
But there is one area that does cause trouble for developers: test environments. Because Riak is designed to work with at least three nodes (and preferably at least five nodes), it’s tough to fire up a test cluster on a local workstation. Developers who do fire up one instance of Riak will be disappointed to see that it’s not any faster than a single instance of MySQL. But to fire up a single instance of Riak is to miss the point entirely.
Erlang is among the map/reduce languages because Riak itself is written in it. That’s one of its strong points, said Gross. And because of these technological underpinnings, Kane said that Basho has been an enterprise company for some time now. The deals Basho is closing are too expensive for startups, but she added that telcos and large businesses have been eager to embrace Riak as a solution to their high-speed data-hosting woes.
Of course, no discussion of NoSQL can be complete without a comparison to the two leaders in the space, Apache Cassandra and MongoDB. Cassandra is the closer comparison, as it too is heavily focused on larger enterprise installations. One differentiator, said Gross, is that a Riak cluster can be expanded a single node at a time, while best practices for Cassandra dictate a doubling of size in each expansion.
The comparison to MongoDB, however, is one that Kane was excited to make. She said the developers who go to Riak tend to have already used MongoDB. “Usually by the time they reach us, they’ve already failed with MongoDB,” she said.
“From an architectural standpoint, MongoDB uses replica sets as their mode of replication. It’s just the same word for master/slave. For read-and-write scale, they don’t solve that problem. They say they do automatic sharding in practice, but the operational complexity of scaling MongoDB tends to be as complex if not more complex than MySQL.”
These sentiments would seem to be upheld by the development community, which has been blogging extensively about scaling problems with MongoDB for some time. However, 10gen has long insisted that MongoDB scales, and its main line of business is helping customers scale MongoDB.
Ivan Voras, for example, took the time to port MongoDB to FreeBSD 8. He was turned off, as he wrote in his blog, by MongoDB’s default use of lazy writes, which can cause data to be lost if a server goes sideways.
And earlier this year, mobile startup Kiip published a blog post about its year with MongoDB. Of its complaints, the global write lock was cited as the chief problem.
Basho has taken a different approach, said Gross. With Riak existing as an open-source project on its own, Basho has constructed enterprise additions for the key-value store, such as security and governance tooling. The actual scaling of the key-value store is trivial, he said, as the database was initially designed for ease of administration.