Springtime for databases

Published: March 1st, 2013

- Alex Handy

Oracle. SQL Server. DB2. MySQL. PostgreSQL. Although these are the five most popular enterprise databases, they’ve become a bit dull over the past three years. Since the NoSQL movement began in 2010, new data stores have offered such a diverse array of use cases that almost any traditional database, it would seem, could now be replaced by some specialized data store.

But despite the NoSQL revolution being the cause of this new springtime for databases, not all the green shoots are NoSQL. There are new databases cropping up, or just now maturing, in all manner of technical areas. There are new graph databases, new time-series databases, highly expandable key-value stores, and even new takes on the relational model.

As with most tools in any type of job, using the right database in the right place can make the difference between success and failure. That’s why choosing a database has gone from being one of the easiest decisions your team has to make to one of the hardest.

So, then, we set out on a trip through this verdant and growing meadow of data stores. Which one is right for you? That depends entirely on your use case.

SQL revolution
NoSQL can mean two things: No SQL, or Not Only SQL. It is the latter that many NoSQL companies tout when offering their data stores as a supplement to existing relational databases. But just because you need fast response times and highly scalable transactions doesn’t mean you have to throw SQL out entirely.

Still, the challenge for new and old relational database players alike is to focus on the strengths of their data stores, and to make sure developers understand the best use case for their software.

Scott Jarr, cofounder and chief strategy officer of VoltDB, said that his company has found the sweet spot for its SQL-based relational data store that focuses on Java-based stored procedures. “I think that we are in a state of incredible noise and confusion in the market, and part of that is a natural stage of a market that is in its early stage and growing fast,” he said. “People are no longer looking at the individual products. Instead, they’re saying ‘I’ve got a particular problem,’ and then they’re starting to look at the databases that are options to them.

“Our challenge has been identifying what that use case is, and figuring out how we talk about it. It’s been quite clear to us as we’ve accelerated that our use case is very simple: It’s high-velocity in-bound transactions, like stocks, or the Web. It’s about making decisions on them in real time, and they’re looking at real-time analytics on it. That was news to us two or three years ago. The real-time analytic component became a very unique third leg to that stool. Being able to look at the analytics in real time is very important.”
Naturally, SQL databases don’t have to live in a vacuum. The moniker “Not Only SQL” is aptly applied in the enterprise, where massive databases aren’t going to be replaced overnight by some hot new NoSQL store. But having giant data stores already in place doesn’t limit one’s access to NoSQL technologies. Using a NoSQL store as a front end for existing databases is a great way to gain the benefits of NoSQL without losing the relational benefits on the back end.

Even Oracle is getting into the spirit of the NoSQL revolution by including the Memcached API in its latest release, MySQL 5.6.

Tomas Ulin, vice president of MySQL engineering at Oracle, said the addition of the Memcached API will enable developers to more easily manage and fill their caching layer directly from the database.

“We announced a year ago that we were adding the Memcached API to access and update data in InnoDB,” he said. “We can join the best of both worlds in the SQL/NoSQL discussion. We think we can make it much easier to gain the benefits of NoSQL-type access.”
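To make “NoSQL-type access” concrete, here is a minimal sketch of what a Memcached client actually sends over the wire, written in Python. It only formats the standard Memcached text-protocol commands `set` and `get`; it does not connect to anything. The key `user:42` and its value are hypothetical, and how keys map to InnoDB rows is configured on the server side and is not shown here.

```python
def build_set(key, value, flags=0, exptime=0):
    """Format a memcached text-protocol 'set' command as bytes."""
    payload = value.encode("utf-8")
    # Header is: set <key> <flags> <exptime> <byte count>, then the data block.
    header = f"set {key} {flags} {exptime} {len(payload)}\r\n".encode("ascii")
    return header + payload + b"\r\n"


def build_get(key):
    """Format a memcached text-protocol 'get' command as bytes."""
    return f"get {key}\r\n".encode("ascii")


# Hypothetical key and value, for illustration only.
print(build_set("user:42", "Alex"))
print(build_get("user:42"))
```

Sending these bytes to a server would be a matter of opening a socket to wherever the Memcached interface listens (11211 is Memcached’s conventional default port, assumed here rather than taken from the article). No SQL statement is parsed along the way, which is the appeal for key-based lookups.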

Time for time series
The cloudy term “NoSQL” has gobbled up a number of database types over the past few years, but when you get right down to it, there are a few types that, while truly not using SQL, don’t really belong under the umbrella term either. Graph databases and time series databases, for example, have both existed for some time, but it’s been the NoSQL revolution that’s brought them out into the open for discussion in the mainstream software development world.

They’re also relevant because of the many new use cases cropping up every day, thanks to social networks, mobile devices, and the need for scalable cloud-based data stores. In the cloud, many existing data-store ideas just don’t work out properly, and so the past few years have been a period of rebirth for these extant database types.

The newest among these new databases is Saturnalia DB. This time-series database was crafted by Jonathan Moore and Leif Ryge, both developers at Web-crawling company Spinn3r. The pair, while remaining at Spinn3r, have launched a new company, called StatMover, around a hosted version of Saturnalia.

“Different problems are better served by different data models,” said Moore. “If you have a particular problem in mind, you can get more performance and lower cost by having a database tailored to your data.”

Thus, as Moore and Ryge were building Spinn3r’s high-speed Web-crawling and storage system, they quickly found that they needed a solution to replace the traditional open-source logging data store RRDtool.

“RRDtool was written for a different age,” said Moore. “RRDtool does not scale in terms of I/O, and you have other limiting factors like how much data you want to keep when you create the database, and it starts dropping data quickly.

“We realized from our work at Spinn3r running a high-performance crawler, we needed an order of magnitude more info on the stack. The tools fell over eventually, and we had to decide what we could monitor. Not ‘What can we monitor?’ but rather ‘What do we want to monitor?’”
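Moore’s complaint about fixed retention can be illustrated with a minimal sketch of a round-robin store, the general technique RRDtool is named for. This is not RRDtool’s code or file format, and it says nothing about how Saturnalia works; the class `RoundRobinSeries` and its methods are hypothetical, written in Python only to show why a store sized at creation time must discard its oldest samples.

```python
from collections import deque


class RoundRobinSeries:
    """Hypothetical fixed-size time series; capacity is chosen at creation."""

    def __init__(self, capacity):
        # deque(maxlen=...) discards the oldest item once full,
        # mimicking a round-robin archive wrapping around.
        self.samples = deque(maxlen=capacity)

    def record(self, timestamp, value):
        self.samples.append((timestamp, value))

    def oldest_timestamp(self):
        return self.samples[0][0] if self.samples else None


series = RoundRobinSeries(capacity=3)
for t in range(5):  # five samples written into three slots
    series.record(t, t * 10)

print(list(series.samples))  # only the three newest samples remain
```

The trade-off is predictable disk usage in exchange for history: once the capacity is set, collecting more metrics or keeping them longer means recreating the store.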

StatMover thus offers a hosted version of Saturnalia and gives developers a place to forward all of their server and application stats. When problems arise, they can typically be spotted through graphical analysis of the statistics and logs gathered. And that’s just what StatMover offers as a SaaS product.
Graphic wave
Another type of database that’s riding the NoSQL wave to success is the graph database. While graph databases have found an immediate niche inside social networking-related applications, Emil Eifrem, CEO of Neo Technology, said that their usefulness will be seen far beyond Facebook. He takes a different view of how databases are differentiated.

“I always tend to take the data-model view to this explosion of databases,” he said. “What are the abstractions, the building blocks exposed to programmers? Graph has three core abstractions: nodes, typed relations between nodes, and key-value pairs attached to both nodes and [relations]. That final point is incredibly important and very powerful. Graph is the first model that fundamentally embraces how relations are modeled as a first-class citizen.”

That means that not only can individual objects or data items be referenced by a key-value pair, but all of that item’s connections to other items and objects can be called out by key-value pairs as well. In other words, the relationships between data are themselves stored in a quickly accessible form.
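The three abstractions Eifrem lists can be sketched in a few lines of Python. This is an illustrative in-memory model under stated assumptions, not Neo4j’s API or storage engine: the `Node`, `Relationship` and `Graph` classes, the `KNOWS` relationship type, and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)  # key-value pairs on a node


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str  # the "typed" part of a typed relation
    properties: dict = field(default_factory=dict)  # key-value pairs on the relation


class Graph:
    def __init__(self):
        self.nodes = []
        self.relationships = []

    def add_node(self, **properties):
        node = Node(properties)
        self.nodes.append(node)
        return node

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbors(self, node, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


g = Graph()
alice = g.add_node(name="Alice")
bob = g.add_node(name="Bob")
g.relate(alice, bob, "KNOWS", since=2010)

print([n.properties["name"] for n in g.neighbors(alice, "KNOWS")])
```

The linear scan in `neighbors` is a simplification for readability; a real graph database would keep relationships directly reachable from each node rather than searching a list.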

Eifrem is also bullish on graph database performance due to this first-class citizenship of relations. Rather than performing multiple sub-queries into the data and waiting for relations to pan out in the query, graph databases allow developers to quickly sort data on the fly without being forced to lay out the database schema according to unknown future data-organization requirements.

That means info stored in a graph database can quickly be laid out by date, size, type, owner and any other sub-category, without those individual items being called out into their own tables or indexed in their own schema.

“We talk about whiteboard friendliness. If you are able to sketch out your domain on a whiteboard, translating that into a data model in the database is typically taking that whiteboard, and everything you’ve drawn as a hub is a node, and every arrow is a relationship,” said Eifrem.
NoSQL, three years on
It seems odd to call a handful of three-year-old database projects and software companies the old guard, but in this rapidly changing database landscape, tools like Apache Cassandra, Apache CouchDB, MongoDB and Riak are all now the entrenched players.

That’s not to say that these existing NoSQL databases are standing still. The past 12 months have seen extensive maturation for all four of these popular data stores.

Robin Schumacher, vice president of products at DataStax (the company behind Apache Cassandra), said that much of the company’s recent work on the platform has been focused on filling in the gaps for enterprise users.

The company released Enterprise Edition 3.0 of Cassandra and its supporting tools in late January. This version adds a number of enterprise features, such as internal authentication, in-transit SSL encryption, and the ability to do more granular data restores from backup.

Schumacher said that “Cassandra was faulted for being more complex to configure and use” in the past, and that most of the updates DataStax is working on are targeted at remedying this critique.

Elsewhere, the defenestration of Apache CouchDB through the corporate windows of companies like Cloudant and Membase has yielded some interesting results for it.

First, the database’s creator, Damien Katz, left the project in order to focus on rewriting many of the underlying ideas in CouchDB in C for Membase. Then, the Cloudant team decided to build a new solution for its hosted database product, one that only used a portion of the CouchDB code.

Mike Miller, cofounder and chief scientist of Cloudant, doesn’t think these two decisions will hurt Apache CouchDB in the long run, and he pointed out that Cloudant still contributes about a third of the Apache CouchDB code overall.

“We’ve chosen to layer on the CouchDB API [on top of our service],” said Miller. “I think Apache CouchDB is going to have a long and fruitful history. I think it’s the PostgreSQL of NoSQL. In contrast to everything, it doesn’t have a vendor behind it. It was a pure open-source project.

“The thing we love about it is the API. The API is very clean. The basic things a database does, like a key-value store, those map perfectly onto a clean REST endpoint: You can get, put, post, delete. That’s something Apache CouchDB got right. We also like their model of not focusing. You don’t do anything except pure HTTP. That has advantages to developers, to give them pure HTTP, especially for mobile developers, because you don’t need any middleware to talk to the database itself, which allows you to do incredible things,” said Miller. Incredible things like simplifying architectures, and keeping applications in direct contact with databases, he said.
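Miller’s “get, put, post, delete” maps onto CouchDB’s document endpoints roughly as in the following Python sketch. It only builds the HTTP requests with the standard library and does not send them, so it runs without a server. The base URL assumes a local instance on CouchDB’s default port 5984, and the database name `articles` and document ID `springtime` are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: a local CouchDB instance
DATABASE = "articles"               # hypothetical database name
JSON_HEADERS = {"Content-Type": "application/json"}


def put_document(doc_id, body):
    """Create or update a document: PUT /{db}/{docid} with a JSON body."""
    return Request(f"{BASE_URL}/{DATABASE}/{doc_id}",
                   data=json.dumps(body).encode("utf-8"),
                   headers=JSON_HEADERS, method="PUT")


def post_document(body):
    """Create a document with a server-assigned ID: POST /{db}."""
    return Request(f"{BASE_URL}/{DATABASE}",
                   data=json.dumps(body).encode("utf-8"),
                   headers=JSON_HEADERS, method="POST")


def get_document(doc_id):
    """Read a document: GET /{db}/{docid}."""
    return Request(f"{BASE_URL}/{DATABASE}/{doc_id}", method="GET")


def delete_document(doc_id, rev):
    """Delete a document: DELETE /{db}/{docid}?rev=... (revision required)."""
    return Request(f"{BASE_URL}/{DATABASE}/{doc_id}?rev={rev}", method="DELETE")


req = put_document("springtime", {"title": "Springtime for databases"})
print(req.get_method(), req.full_url)
# To actually send a request: urllib.request.urlopen(req)
```

Because every operation is a plain HTTP request, any client that can speak HTTP, including a mobile app, can talk to the database without a driver or middleware layer in between.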

Elliott Cordo, principal consultant at Caserta Concepts, has to do incredible things every day. His consulting company works with enterprises, typically on big data-warehouse analytics projects. He said that he’s seen a lot of great innovation in databases over the past few years, but what he wants in the future is more flexibility from those data stores.

“I think we’ll see stores like Apache Hadoop’s HBase mature and become more of a front-end system,” he said. “We’ll see more high-level analytics languages evolve. These databases will also have built-in functionality of aggregation, rather than just being architected for queries.

“In the analytic world, you need them to be a little more general-purpose. We’re going to see more in-memory databases, like Memcached. And we’ll see memory-based OLAP solutions, integrating and working with these platforms.”

Precognition
While data stores are becoming more diverse and interesting to developers, the primary driver of this whole kerfuffle over data is the need for analytics, in real time or otherwise. One company that is taking on the analytics side of the problem is Precog, a SaaS-based solution designed to give developers and analysts easy access to predictive analytic tools and techniques.

John De Goes, founder, CEO and CTO of Precog, said that Precog is able to perform deep analytics, and that developers can create these analytics quickly from within their browser.

“We focus on persistent data, interactions and state changes,” he said. “We provide APIs and client libraries and database synchronizers, but primarily APIs to allow developers to capture this data on a mobile device, on an application, on some sort of sensor-based device. We let them trivially capture it with a few lines of code. They can store any kind of semi-structured data, store nested JSON they got off of Twitter, as well as modify schema over time instead of modifying the database. Then we provide integration. We take that stream of persistent data coming in and augment it, cross-reference it with your database of customers, and add info to it. Our technology allows us to do a pre-join on that to accelerate the process of analytics on the fly.”

Precog is built on new technology constructed by De Goes and his team. “You can think of it as a time-series database, but I would characterize it as a statistical database,” he said. “It also includes measured data. It enables you to do in-database statistics without having to stream all that back to the client. That doesn’t work with Big Data. We enable you to bring those computations into the database and execute those really fast.”

In-memory influx
With all the furor over Apache Hadoop in the enterprise, running analytics on Big Data is a hot topic. ScaleOut Software is addressing this market with its own scalable in-memory data store that can run analytics at a much faster pace than Hadoop.

William Bain, CEO of ScaleOut Software, said that developers can find great time savings with the workflow enabled by an in-memory data grid for use in analysis. These savings come not only from the real-time nature of the software, but also from the fact that ScaleOut does not require data ingress steps between query attempts.

“The workflow for our customers is that they’re using the in-memory data grid to hold data,” said Bain. “The data is naturally changing rapidly on a daily basis. This is different from Hadoop. The data is changing while the analytics is going on. For a financial trading system or an airline reservation system, they cannot wait for data to arrive. ScaleOut allows you to perform MapReduce while the data is changing.

“The data being stored in the in-memory data grid is naturally object-oriented, and it fits into the object collection model. It can be selected for analysis based on query properties. Instead of record readers, you do a parallel read through the grid.”
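The pattern Bain describes, selecting objects by a property and running MapReduce across the grid in parallel, can be sketched in Python as follows. This is a generic illustration, not ScaleOut’s API: the `Trade` class, the sample data, and the list-of-lists standing in for grid partitions are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Trade:  # hypothetical object stored in the grid
    symbol: str
    shares: int
    region: str


# Stand-in for grid partitions; a real grid spreads these across servers.
partitions = [
    [Trade("AAA", 100, "US"), Trade("BBB", 50, "EU")],
    [Trade("AAA", 25, "US"), Trade("CCC", 10, "US")],
]


def map_partition(trades, region):
    """Map step: select objects by a property, then total shares per symbol."""
    totals = Counter()
    for trade in trades:
        if trade.region == region:
            totals[trade.symbol] += trade.shares
    return totals


def reduce_totals(partials):
    """Reduce step: merge the per-partition totals."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


with ThreadPoolExecutor() as pool:
    # Each partition is read in place, in parallel, with no load step first.
    partials = pool.map(lambda p: map_partition(p, "US"), partitions)
    result = reduce_totals(partials)

print(dict(result))
```

The thread pool here only mirrors the structure of a parallel read; in a distributed grid each map step would run on the server that already holds that partition’s objects, which is what removes the data-loading step from the workflow.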

Article Tags

Big Data, databases, MySQL

About Alex Handy

Alex Handy is the Senior Editor of Software Development Times.
