Consistency and ACID: A new path for NoSQL

Published: October 7th, 2013

Relational databases have been commercially dominant, and the go-to solution, for quite some time. But change is brewing. Large Web applications have created requirements poorly matched to relational databases, and leading companies like Google and Amazon have built new database infrastructures, such as BigTable and Dynamo, from the ground up. Understandably, these new systems have attracted a lot of attention.

The term NoSQL was popularized in 2009 as a label for a broad class of non-relational, distributed databases designed to tackle this new class of problems, and it spread quickly as the number of open-source NoSQL projects, such as HBase, Cassandra and Voldemort, grew. Though a precise definition of NoSQL remains elusive, the broad goals usually include fault tolerance, horizontal scalability, good price/performance, and a relatively simple data model.

With the explosion of interest, there are now likely more than 200 NoSQL databases available. Though performance and robustness vary dramatically, many provide scalability and fault tolerance by running on a cluster of commodity hardware.

However, data-model flexibility tends to be very limited in these first-generation NoSQL systems. Though NoSQL databases as a group implement a wide variety of data models, each one usually provides just one basic model (graph, document, key-value, etc.). As a result, developers and ops teams who want to take advantage of multiple data models are forced to adopt multiple databases.
Enter the CAP Theorem
To understand NoSQL systems and their data-model inflexibility, we need to dig into the CAP Theorem.

First-generation NoSQL databases were designed in the shadow of Eric Brewer’s “CAP Theorem,” named for three properties: (C)onsistency, (A)vailability and (P)artition tolerance. Its popular (though misleading) summary was that developers had to “pick two out of three.” The tradeoff applies to distributed systems in which the communication channels between nodes can fail, and it has dramatic consequences for how such systems are designed.

Seeing availability as essential, first-generation NoSQL systems used the CAP Theorem as justification for abandoning consistency in order to maximize availability. In its place they adopted the much weaker model of “eventual consistency.” Eventual consistency means, simply, that writes to the database “eventually” get done and are “eventually” seen by other clients; in the meantime, a client may read stale data.
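To make “eventually” concrete, here is a minimal single-process toy sketch, assuming two replicas and a write path that acknowledges a write after updating only one replica and defers copying it to the others until a background step runs. The class and method names are hypothetical and do not correspond to any particular NoSQL product.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data: key -> value."""
    name: str
    data: dict = field(default_factory=dict)


class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the others later."""

    def __init__(self, replica_names):
        self.replicas = {n: Replica(n) for n in replica_names}
        self.pending = []  # (key, value) writes not yet propagated

    def write(self, replica_name, key, value):
        # Acknowledge after updating a single replica; replication is deferred.
        self.replicas[replica_name].data[key] = value
        self.pending.append((key, value))

    def read(self, replica_name, key):
        # Reads are served locally and may return stale data (or nothing).
        return self.replicas[replica_name].data.get(key)

    def propagate(self):
        # "Eventually": a background step copies every pending write everywhere.
        for key, value in self.pending:
            for replica in self.replicas.values():
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["a", "b"])
store.write("a", "balance", 100)
print(store.read("b", "balance"))  # None: replica b has not seen the write yet
store.propagate()
print(store.read("b", "balance"))  # 100
```

Between the write and `propagate()`, a client reading from replica `b` sees no value at all; this window is what application code must be written to tolerate.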

Eventual consistency represents a dramatic weakening of the guarantees that traditional databases provide, and it places a huge burden on software developers. Designing applications that behave correctly even when the data returned by the database cannot be relied upon is quite a challenge!

From a recent Google paper:

“We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.”

In the same paper, Google described F1, a scalable, fault-tolerant SQL database whose existence seems to contradict some people’s earlier understanding of the CAP Theorem.

In 2012, Brewer explained that the CAP Theorem had been widely misunderstood. He noted that “the ‘two of three’ formulation was always misleading,” and that “CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare.” This point is fundamental because the CAP notion of perfect availability (i.e., even disconnected nodes can accept writes) is very different from the availability of the database as a whole to a client. Therefore, a distributed database can be designed to be fault-tolerant and highly available without supporting perfect “availability” in the CAP sense.
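Brewer’s distinction can be sketched with a toy cluster, assuming a simple rule that a write commits only when the coordinating node can reach a majority of all nodes. This is a common quorum rule chosen for illustration; the class and method names are hypothetical, and it is not how F1 or Spanner are actually implemented. During a partition, nodes on the minority side refuse writes, so the system is not “available” in the strict CAP sense, yet clients that reach the majority side keep working.

```python
class QuorumCluster:
    """Toy cluster that commits a write only if a majority of nodes can take it."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.data = {n: {} for n in self.nodes}
        # reachable[n]: nodes that n can currently talk to (including itself)
        self.reachable = {n: set(self.nodes) for n in self.nodes}

    def partition(self, *groups):
        # Split the network: each node can only reach members of its own group.
        for group in groups:
            for n in group:
                self.reachable[n] = set(group)

    def write(self, coordinator, key, value):
        peers = self.reachable[coordinator]
        if len(peers) <= len(self.nodes) // 2:
            # Minority side: refuse rather than accept a write that could diverge.
            return False
        for n in peers:
            self.data[n][key] = value
        return True


cluster = QuorumCluster(["n1", "n2", "n3", "n4", "n5"])
cluster.partition({"n1", "n2", "n3"}, {"n4", "n5"})
print(cluster.write("n1", "k", "v"))  # True: n1 reaches 3 of 5 nodes
print(cluster.write("n4", "k", "w"))  # False: n4 reaches only 2 of 5
```

The disconnected nodes `n4` and `n5` give up CAP-style availability, but the database as a whole keeps serving any client that can reach `n1`, `n2` or `n3`, and the two sides never accept conflicting writes.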

Google has perhaps gone furthest in reconsidering CAP with its Spanner database, which is intended to replace BigTable across a wide range of Google applications. Spanner is a globally distributed database that provides not just strong data consistency but also true multi-row ACID transactions, like its SQL cousins.
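What multi-row transactions buy an application can be sketched with a toy in-memory store. This is a minimal illustration under stated assumptions: it uses optimistic concurrency control (buffer writes, record the version of every key read, validate at commit), a technique swapped in for illustration that is not Spanner’s actual protocol, which is distributed and considerably more involved. It shows atomicity and isolation for a two-row transfer only; there is no durability, and all names are hypothetical.

```python
import threading


class VersionedStore:
    """Toy key-value store; each key maps to (version, value)."""

    def __init__(self):
        self.data = {}
        self.commit_lock = threading.Lock()

    def begin(self):
        return Transaction(self)


class Transaction:
    """Buffers writes and records the version of every key it reads."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # key -> new value, applied only on commit

    def get(self, key, default=None):
        if key in self.writes:
            return self.writes[key]  # read-your-own-writes
        version, value = self.store.data.get(key, (0, default))
        self.read_versions.setdefault(key, version)
        return value

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        with self.store.commit_lock:
            # Validation: abort if another commit changed anything we read.
            for key, seen in self.read_versions.items():
                if self.store.data.get(key, (0, None))[0] != seen:
                    return False
            # Apply all buffered writes under the lock; any concurrent transaction
            # that read an older version of these keys will fail its own validation.
            for key, value in self.writes.items():
                version = self.store.data.get(key, (0, None))[0]
                self.store.data[key] = (version + 1, value)
            return True


store = VersionedStore()
setup = store.begin()
setup.set("alice", 100)
setup.set("bob", 0)
setup.commit()

# Two concurrent transfers read the same starting balances.
t1, t2 = store.begin(), store.begin()
t1.set("alice", t1.get("alice") - 30)
t1.set("bob", t1.get("bob") + 30)
t2.set("alice", t2.get("alice") - 50)
t2.set("bob", t2.get("bob") + 50)
print(t1.commit())  # True
print(t2.commit())  # False: alice and bob changed since t2 read them
```

The second transfer is rejected instead of silently overwriting the first, so the total balance stays at 100 and the application can simply retry. Because reads are validated only at commit time, even a read-only transaction must call `commit()` to confirm that what it read was consistent.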
The future of NoSQL
Databases are undergoing a sort of Cambrian explosion, with many new approaches being explored after decades of relative stability. The experimentation with new data models and query languages has been particularly broad and exciting.

Though no standards have emerged, key-value seems to be becoming a ubiquitous model, as is the “document” concept of hierarchical data. It’s likely that, as applications and languages evolve, this experimentation will continue and there will be a broad array of data models represented in the next generation of NoSQL databases.

Sacrificing strong consistency (and with it, ACID transactions) seems to have been too hasty. Transactions, which have been part of SQL databases for decades, let applications both handle concurrent client access and build robust abstractions. Evidence of this is that several NoSQL companies are starting to add transactional features to their products. For some time, NoSQL vendor MarkLogic has offered multi-statement transactions, in which data is locked and written using two-phase commit. And DataStax, the company behind Cassandra, recently added compare-and-swap operations (calling them “lightweight transactions”) as part of the Cassandra 2.0 release.
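Compare-and-swap is the primitive behind this kind of “lightweight transaction”: a write is applied only if the current value matches what the client expects. The following is a single-process toy sketch of those semantics only, with hypothetical names; in Cassandra the condition is written in CQL (for example `INSERT ... IF NOT EXISTS` or `UPDATE ... IF column = value`) and is coordinated across replicas, which this sketch does not attempt.

```python
import threading


class CasStore:
    """Toy key-value store with a compare-and-swap (conditional) write."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Set key to `new` only if its current value equals `expected`.

        A missing key is treated as None, so expected=None means "if not exists".
        Returns True if the write was applied, False otherwise.
        """
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


names = CasStore()
print(names.compare_and_swap("user:dave", None, "client-1"))  # True: key was absent
print(names.compare_and_swap("user:dave", None, "client-2"))  # False: already claimed
```

Without the conditional write, two clients registering the same user name could both “succeed,” with the later write silently overwriting the earlier one; with it, exactly one wins and the other learns immediately that it lost.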

Google was seen as leading the way in NoSQL when it introduced BigTable as an alternative to relational systems that were hard to scale and required exotic, expensive hardware. But as Google has since observed, building systems without transactional guarantees is difficult, even for the most experienced engineers.

It seems as though Google is yet again redefining NoSQL with F1, a layer on top of the Spanner NoSQL database that adds a SQL data model and query language. This points to an exciting future for NoSQL databases. Next-gen NoSQL systems will continue to employ shared-nothing, distributed architectures with fault tolerance and scalability, but they may follow in Google’s footsteps by aggressively exploring strong consistency with transactional guarantees.

Dave Rosenthal is cofounder of FoundationDB, a NoSQL database company.
