If you've given up on relational databases for the cloud... You're nuts!

Published: July 13th, 2011

- Alex Handy

The move to the cloud has brought many changes to software development, but few shifts have been as radical as those occurring in the database market right now. So different is the cloud for software architects that the first response from developers was to build entirely new databases to solve these new problems.

Thus, 2010 was the year of the NoSQL database. But as time has moved on and NoSQLs have become more mature, developers are figuring out that the old relational ways of doing things shouldn’t be thrown out with the metaphorical bathwater.

While relational databases were considered old world just a year ago, a new crop of options for in-cloud development has brought them racing back to the forefront. A combination of new relational databases aimed at the cloud and more mature in-cloud relational offerings, such as Microsoft’s SQL Azure database and Amazon’s SimpleDB, has presented some compelling reasons to ditch the new-fangled NoSQLs.

Amazon’s SimpleDB, for example, is a cloud-based relational data storage system focused on simplicity, as the name implies. Rather than cram caching, transformations and compromise solutions to the CAP problem (Consistency, Availability, Partition tolerance: you can only choose two) into a new-world database, SimpleDB eschews futuristic ideas in favor of a clean, easy-to-use data store that can form the backbone of scalable applications while providing the 20% of functionality needed by 80% of users.

Adam Selipsky, vice president of product management and developer relations for Amazon Web Services, said that SimpleDB is about choice and ease of use. “Running a relational database, irrespective of where you do it, takes a certain amount of work and administration,” he said.

“There are a lot of use cases where people don’t need that full functionality of a relational database. SimpleDB is really meant to be the Swiss Army knife of databases. You’re not going to do joins, you’re not going to do complex math procedures. If you want to do data indexing and querying, then it can take all the scaling hassles away from you, and you don’t have to worry about schemas.”
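To make the "indexing and querying without schemas" idea concrete, here is a minimal Python sketch of a schemaless store in which every attribute is indexed automatically. It does not use the SimpleDB API; the class `SchemalessDomain` and its `put` and `where` methods are hypothetical names invented for illustration, and real services add persistence, replication and richer queries.

```python
from collections import defaultdict


class SchemalessDomain:
    """Items are bags of attributes; every attribute is indexed (illustrative only)."""

    def __init__(self):
        self.items = {}  # item name -> dict of attributes
        # attribute name -> attribute value -> set of item names
        self.index = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, **attributes):
        # Drop old index entries so replacing an item leaves no stale matches.
        for attr, value in self.items.get(item_name, {}).items():
            self.index[attr][value].discard(item_name)
        self.items[item_name] = dict(attributes)
        for attr, value in attributes.items():
            self.index[attr][value].add(item_name)

    def where(self, attr, value):
        """Return the sorted names of items whose attribute equals the value."""
        return sorted(self.index.get(attr, {}).get(value, set()))


domain = SchemalessDomain()
# No table definition: each item simply carries whatever attributes it has.
domain.put("item1", color="red", size="L")
domain.put("item2", color="red", weight="2kg")
domain.put("item3", color="blue")
red_items = domain.where("color", "red")
```

There are no joins here, only lookups by attribute, which is the trade-off Selipsky describes.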

Selipsky said that SimpleDB evolved out of the needs Amazon saw in its users. Databases were a hassle to maintain within the cloud, and yet most developers were only using a fraction of the functionality of those databases, he said.

“Where we started was to provide all the building blocks,” he said. “There’s an incredible variety of needs in our customer base. We have all the separable fundamental Web services with fundamental calls. That’s been one of the principal reasons these services have been so popular. We do think it’s important to make our services easier to use. There are a lot of examples where we will start to make the services easier to use.”

Look before you leap
Stephen O’Grady, analyst with research firm RedMonk, said that every in-cloud database offering is different, and that developers need to know what they’re getting into before they design an architecture around those data stores. He pointed out that Microsoft’s initial offering of SQL Server for Azure did not meet developer expectations, and thus caused some strife for both users and Microsoft itself.

“When Azure came out, the data store was basically a hierarchical database, not a relational one,” he said. “It was not your typical SQL Server. A lot of the initial users chafed at that requirement, and as a result the subsequent iteration [of the software from Microsoft was a] re-badged SQL Server. That’s one of the data stores available on Azure now and it looks a lot like your regular relational now.”

Microsoft decided to move closer to the old-world model of a relational database for its cloud-based offerings. Google, on the other hand, has made some very specific choices in its AppEngine offering, choices that dramatically impact the developer.

As O’Grady explained, “With Google AppEngine, their implementation of Bigtable is a unique design. If you’re in the cloud, it really depends on what data store you’re using and what the properties of that data store are. If you’re dealing with a relational, it will look and act like a relational. If it’s not, you have to adjust your application design to take that into account.

“That’s one of the potential throttles, because a lot of applications depend on a relational database, so moving to AppEngine would require a lot of work and porting. You have to be very aware of what you’re designing and developing to. If you’re designing for AppEngine, you can’t take that application and natively deploy it because you don’t have access to the database.”

EnterpriseDB is the company behind Postgres, an alternative open-source database. Robin Schumacher, director of product strategy at EnterpriseDB, said there’s really one great promise of a cloud database for developers. “The whole idea is to not really have to make changes to your application for it to benefit from a cloud database. That’s what you’re gunning for. There’s definitely a difference between putting up an Oracle instance in the cloud, and having a cloud database that meets the definition of what a cloud database, like SQL Azure or SimpleDB, is supposed to do,” he said.

DBs combined
It is in this capacity that many cloud database systems can be used to complement each other. The common model right now is to augment a relational database such as MySQL or SimpleDB with a NoSQL or caching layer to speed up the access to information. Such system designs began to pop up with the rise of the cloud, and Memcached was a frequently used caching layer for all manner of databases.
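The relational-plus-cache arrangement is often implemented as a "cache-aside" read path. The Python sketch below is one way it might look, under loud assumptions: an in-memory SQLite database stands in for the relational back end, a plain dict stands in for a cache such as Memcached, and `CachedUserStore` with its methods is a hypothetical name, not part of any product mentioned here.

```python
import sqlite3


class CachedUserStore:
    """Cache-aside reads over a relational table (illustrative only)."""

    def __init__(self):
        # In-memory SQLite stands in for the relational back end.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        # A plain dict stands in for a cache such as Memcached.
        self.cache = {}
        self.db_reads = 0  # counts how often the database is actually queried

    def add_user(self, user_id, name):
        self.db.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
            (user_id, name),
        )
        self.db.commit()
        # Invalidate so the next read does not serve a stale name.
        self.cache.pop(user_id, None)

    def get_name(self, user_id):
        # 1. Try the cache first.
        if user_id in self.cache:
            return self.cache[user_id]
        # 2. On a miss, fall back to the database.
        self.db_reads += 1
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        # 3. Populate the cache for subsequent reads.
        if name is not None:
            self.cache[user_id] = name
        return name


store = CachedUserStore()
store.add_user(1, "Ada")
first = store.get_name(1)   # miss: goes to the database
second = store.get_name(1)  # hit: served from the cache
```

The relational store remains the source of truth; the cache only absorbs repeated reads, which is why writes invalidate rather than update it in this sketch.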

Today, however, there are entire classes of databases that include their own caching systems. Damien Katz began writing CouchDB as a way to solve data storage scalability issues, but earlier this year the company he formed to shepherd that database merged with Membase, an advanced form of Memcached.

The two projects merged to form Couchbase, a company focused on both the scalable back end and the RAM-stored front end of its database. “It was after I had the initial versions of CouchDB written in C++ that I really started thinking about scalability,” said Katz.

“I had a lot of experience with conventional concurrency, with threads and locks. I heard about Erlang. I decided to check it out, play with it for a week, and after a week I knew I could write everything in it. So I threw away my old code and rewrote everything in Erlang. It took me a month and a half to rewrite what I had written in C++ in six months.”

After CouchDB reached version 1.0, however, Katz and company decided there was something missing from their setup, and thus the merger with Membase. The merger effectively divided the database into a two-part system: hot storage and cold storage. Live data is kept at the edges of the cluster, quickly available in RAM when needed, while cold, long-term storage of this data is left to the back-end CouchDB, a system that can store not only data but also other databases.
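The hot/cold split can be sketched in a few lines of Python. This is not Couchbase's implementation; it is a toy under stated assumptions: a bounded `OrderedDict` plays the RAM tier with least-recently-used eviction, an ordinary dict plays the durable back end, and the name `TieredStore` is hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """A bounded hot tier in front of a complete cold tier (illustrative only)."""

    def __init__(self, hot_capacity=2):
        self.hot = OrderedDict()  # RAM tier, least recently used entry first
        self.cold = {}            # stand-in for the long-term back end
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.cold[key] = value    # write through, so the cold tier has everything
        self._promote(key, value)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as recently used
            return self.hot[key]
        value = self.cold[key]         # raises KeyError for unknown keys
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Evict the least recently used entries once the hot tier is full.
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)


tiers = TieredStore(hot_capacity=2)
tiers.put("a", 1)
tiers.put("b", 2)
tiers.put("c", 3)          # "a" falls out of the hot tier but stays in cold
value_a = tiers.get("a")   # served from cold, then promoted back to hot
```

Unlike the cache in front of a relational database, both tiers here belong to one system, so eviction never loses data.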

Elsewhere in the cloud, the Apache Cassandra project has been getting closer and closer to the Apache Hadoop project. In a manner similar to Couchbase, Cassandra can be used as a NoSQL database at the edges of a network, and thanks to recently released updates to the database, Hadoop can read information directly out of Cassandra.

O’Grady said this multi-tier approach has been a best practice for some time now. What’s changed, he said, is that relational databases are not the solution to all problems anymore.

“That’s been a common pattern for a long time, at least in the sense of caching. A lot of high-scale properties have a relational database at the back end, supplemented by a caching mechanism, most commonly Memcached, as a front-end caching solution,” said O’Grady.

“That’s been a design pattern and a best practice for a while now. As far as the overall diversity, there’s no question that we’re in an era now where heterogeneity is the norm. As recently as three or four years ago, if you had a persistence problem, the solution was a relational database.

“That’s not true anymore. What is still true is that relational is a solution to a lot of problems, it’s just not the solution to all problems. Developers are beginning to realize that if I only want to store a key and a value, maybe I need a key value store, or if I am traversing graphs, maybe I need a graph database.”

It’s a bit of a thought shift, of course. Databases used to be singular towers of the truth, pillars of information consumed by all from a uniform and unique source. Today, however, the cloud offers many ways for developers to bring together various types of data stores. This is a major contrast to another trend in the market: master data management.

Faster than MDM
Master data management (MDM) practices stipulate that a single source be designated as the one true data source. In an MDM shop, one central database takes on all changes from the day’s (or hour’s) work and keeps the canonical record. It’s the one-server-to-rule-them-all approach.

But the move to the cloud has largely pushed this notion aside, said Brian Hopkins, principal analyst at Forrester Research. MDM, despite being something of a buzzword for the past few years, is essentially the opposite of a cloud database strategy. Instead of simple servers spreading data across nodes, MDM is about policies, constant ingress and ubiquitous validation.

Thus, Hopkins espouses a lighter-weight approach to multiple databases, one that eschews MDM practices in favor of simpler strategies.

“MDM is a piece of it, but I hesitate to use that word,” he said. “We’ve had statistics where we see an average MDM implementation period takes 24 months, with 30 months to payback ROI. What I’m talking about is more along the lines of the data virtualization space.”

And indeed, “data virtualization” has become a new buzzword in and of itself. While cloud databases can offer simple data stores that can quickly scale, data virtualization in-cloud can bring the relevant information into the same RAM space as a running application. It’s about bringing the data to the application rather than the application to the data.

Starcounter is a new relational database company out of Sweden that will be launching later this summer. The company’s self-titled relational database offers data virtualization capabilities as well.

Peter Idestam-Almquist, CTO of Starcounter, said that his company plans to expand these data virtualization capabilities beyond C# and .NET, though this is the only environment currently supported by the database’s virtualization capabilities.

“Even if you run a database and you have your application code running and you want to access the data in the database, although they are running on the same machine, you transfer that data in RAM,” he said. “You transfer it from one part of RAM to the part of RAM that belongs to the application. You also transform the data from one format used by the DB system to what is used by the application system. By integrating these, you neither move nor transform the data. Instead of having a copy of the data, the application code has direct access to the data.”
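The difference between copying data out of a database and reading it in place can be illustrated, very loosely, with Python's built-in `memoryview`. This says nothing about how Starcounter itself works; the buffer layout, `RECORD_SIZE`, `read_copy` and `read_view` are all assumptions made up for the example, and it covers only the "move" half of the quote, not the format transformation.

```python
# The "database" owns one buffer of fixed-width records.
RECORD_SIZE = 4
db_buffer = bytearray(b"AAAABBBBCCCC")


def read_copy(index):
    """Conventional path: copy the record into application-owned memory."""
    start = index * RECORD_SIZE
    return bytes(db_buffer[start:start + RECORD_SIZE])


def read_view(index):
    """Integrated path: return a window onto the database's own memory."""
    start = index * RECORD_SIZE
    return memoryview(db_buffer)[start:start + RECORD_SIZE]


copied = read_copy(1)
viewed = read_view(1)

# An update made inside the "database" shows up through the view,
# while the copy still holds the bytes as they were when it was taken.
db_buffer[4:8] = b"ZZZZ"
```

The view costs no extra memory for the record and never goes stale, which is the benefit Idestam-Almquist is pointing at; the price is that application and database must agree on one in-memory format.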

It may seem odd that a startup would be focused on creating a new relational database, but there is something of a bumper crop of said companies thanks to the opportunities posed by the cloud. Traditional relational databases like MySQL and Microsoft’s SQL Server do work in the cloud, but they weren’t built for the cloud. Microsoft plans its SQL Azure as a next-generation, cloud-based SQL Server, but there are many others pushing new relational database software packages.

One such company is Citrusleaf, a startup focused on building a real-time transactional and relational database of the same name. Srini Srinivasan, CTO of Citrusleaf, said that his company’s database offers fast transactions and immediate consistency. Along the development path, he said he learned some interesting things about the CAP problem.

“Here’s the insight we’ve had in terms of CAP: When there is no failure, you can have consistency and availability,” he said. “When the partition happens, we can continue to let the two partitions work, and when they come back together, there would be a consistency issue. We have all the code available to detect the conflict, so most of the time our customers have chosen the ability to simply do the conflict resolution themselves.”
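One common way to detect the kind of conflict Srinivasan describes is a version vector: each replica counts the updates it has accepted per node, and two versions conflict when each has seen an update the other has not. The sketch below uses that technique purely as an illustration. The article does not say how Citrusleaf detects conflicts, and `compare`, `reconcile` and the `resolve` callback are hypothetical names.

```python
def compare(vv_a, vv_b):
    """Compare two version vectors (dicts of node name -> update counter).

    Returns "equal", "a_newer", "b_newer" or "conflict".
    """
    nodes = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(n, 0) > vv_b.get(n, 0) for n in nodes)
    b_ahead = any(vv_b.get(n, 0) > vv_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"  # both sides accepted writes the other never saw
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def reconcile(replica_a, replica_b, resolve):
    """Merge two (value, version_vector) replicas after a partition heals.

    `resolve` is supplied by the application and is called only on a conflict.
    """
    (val_a, vv_a), (val_b, vv_b) = replica_a, replica_b
    outcome = compare(vv_a, vv_b)
    # The merged vector records everything either side has seen.
    merged = {n: max(vv_a.get(n, 0), vv_b.get(n, 0)) for n in set(vv_a) | set(vv_b)}
    if outcome == "conflict":
        return resolve(val_a, val_b), merged
    if outcome == "b_newer":
        return val_b, merged
    return val_a, merged


# During a partition, node n1 and node n2 each accept a write to the same cart.
side_a = ({"book"}, {"n1": 2})
side_b = ({"pen"}, {"n1": 1, "n2": 1})
# The application chooses to resolve cart conflicts by taking the union.
value, vector = reconcile(side_a, side_b, lambda a, b: a | b)
```

Leaving `resolve` to the caller mirrors the choice the quote attributes to customers: the database detects the conflict, and the application decides what the merged value should be.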

It’s about location
Srinivasan said that, beyond the database itself, the most important thing about cloud hosting is location, location, location. “The key thing to look for is collocation. If you’re a developer developing with a database, your access to the database has to happen really fast. Some of our customers have had this problem: They have an application in cloud, and it has to access databases outside cloud, so they found data centers where they were collocated,” he said.

EnterpriseDB’s Schumacher said that most of the appeal of a cloud database lies in its ability to alleviate pain for DBAs. “From a DBA perspective, you’re hoping a cloud takes away a lot of the pain you have to deal with,” he said.

“You want transparent database expansion and contraction, so you can automatically add nodes when demanded, and remove those nodes when demand decreases. Also important is load balancing across those nodes. From a developer perspective, you want to avoid those sharding situations where the application has to be aimed at a specific shard. The load balancers take care of those things, and you hopefully would not have downtime thanks to auto fail-over.”
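To see what "the application has to be aimed at a specific shard" means in practice, here is a minimal Python sketch of the routing logic an application ends up carrying when sharding is not handled for it. Everything in it is an assumption for illustration: the dicts stand in for separate database servers, and `shard_for` and `ShardRouter` are hypothetical names.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process, so it is not used here.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardRouter:
    """Application-side routing across shards (illustrative only)."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


router = ShardRouter(4)
for i in range(100):
    router.put(f"user:{i}", i)
found = router.get("user:42")
```

With simple modulo placement like this, changing the shard count sends most keys to a different shard, so adding or removing a node means moving data. That is a large part of why transparent expansion and contraction is something developers would rather the database handle.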

Schumacher said that these and a few other necessities form the basic requirements for any cloud database. “Those are the key things. There are some minor things, like provisioning. Can you have a rolling upgrade? Do you have multi-tenancy capabilities? Is there a billing interface, so you can easily determine cost and usage?” he asked.

Beyond these capabilities, however, there are still major hurdles to overcome for all cloud databases. “A lot of the cloud databases can scale for reads, but can they scale for writes? Sometimes that’s difficult to pull off,” said Schumacher.

Karen Padir, vice president of products and marketing at EnterpriseDB, said that the company will be offering its own cloud database later this year, which may, perhaps, tackle the read/write disparity. “We’re building a cloud database for release in the late summer or early fall,” she said.

“We will host it in your private cloud, or in the public cloud. Right now, we’re looking at supporting Amazon Web Services, and we’re talking with Eucalyptus and Red Hat.”

Article Tags

cloud, databases

About Alex Handy

Alex Handy is the Senior Editor of Software Development Times.


