Number of NoSQL options grows

Published: March 25th, 2014

NoSQL databases continue to proliferate as the demand for Big Data solutions grows. While relational databases aren’t going away anytime soon, different data models require different types of solutions. As a result, several types of NoSQL databases have emerged, each with its own pros and cons.

“It’s important that you’re not just going with a traditional database because that’s what everyone else is using,” said Evaldo de Oliveira, business development director at FairCom. “Pay attention to what’s going on in the NoSQL world because there are some problems that SQL cannot handle.”

NoSQL databases have been gaining momentum because organizations want the ability to query unstructured and semi-structured data, and they want to take advantage of database technologies that were designed for the Web and Big Data. NoSQL solutions are generally open source, provide linear scalability across commodity hardware, and ensure high availability through distribution and replication. Many of them also store data in a schemaless manner.

The four major types of NoSQL databases are key-value stores, document stores, wide column stores, and graph stores. Some of them, particularly key-value stores, may be broken down further into subtypes depending on who is classifying them. It is also worth noting that some NoSQL databases span more than one category and some of them also support SQL queries.

For example, HPCC is an open-source computing platform originally developed and contributed to by LexisNexis Risk Solutions. It is programmable and supports different data models. LexisNexis Risk Solutions typically uses a row-oriented layout for its own data services, although HPCC also supports columnar and adjacency table (graph) layouts as well as multi-key retrieval, according to Flavio Villanustre, vice president of information security at LexisNexis Risk Solutions. HPCC also supports SQL queries.

Couchbase Server combines a key-value store with a JSON document store and caching. It is also a mobile JSON document store, said Rahim Yaseen, senior vice president of products at Couchbase. Couchbase Server combines elements of Memcached, the open-source, high-performance, distributed memory object caching system, with elements of the Apache CouchDB document store, along with proprietary enhancements.

Meanwhile, FairCom c-treeACE is a key-value store, but unlike most NoSQL databases, which are non-transactional, c-treeACE is a fully consistent, transactional NoSQL database that provides SQL capabilities on top of its core NoSQL technology. According to FairCom’s de Oliveira, c-treeACE combines the NoSQL benefits of high performance, low latency and precise data access control with the flexibility of SQL interfaces. The SQL support allows organizations to integrate with SQL-based environments such as reporting, BI and data warehouses.

Organizations differ in which NoSQL database, or combination of databases, they use, and that variety has created a market for a growing number of database types and hybrid implementations.

Pythian, a provider of data-management consulting and managed services, wisely takes a problem-first approach to identifying potential solutions. Its customers use document stores for user profiles and news articles, and key-value stores for counters and time series data.

“Most people who choose NoSQL as their primary data storage are trying to solve two main problems: scalability and simplifying the development process,” said Danil Zburivsky, solutions architect at Pythian. “Despite all the efforts, working with relational databases still feels ‘awkward’ in most programming languages.”

Key-value stores
Key-value stores are the most basic type of NoSQL database. They are fast and highly scalable but have less built-in intelligence than wide column stores or document stores, for example. Key-value stores assign a key and a value to an item or a BLOB in a database, and retrieve it using the key.

“If I’m looking for a specific piece of information or want to search for a specific piece of information in a BLOB or all BLOBS, it’s not set up to do that,” said Kurt Cagle, principal evangelist for semantic technologies at Avalon Consulting. “If I’m interested in the entire thing and I can do it one key at a time, it’s great.”

Key-value stores are used in many industries for purposes ranging from gaming to payment authorizations.

“Key-value stores provide simple key-based storage and retrieval,” said LexisNexis’ Villanustre. “They tend to be fast and easy to distribute and parallelize, but they suffer drawbacks in two specific cases. If the key is not completely known and you have to use wildcards, or when the retrieval query needs to use the intersection of multiple keys to identify a candidate set, performance drops considerably because of the large size of the intermediate candidate sets.”
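
The trade-off Villanustre describes can be sketched with a plain Python dictionary standing in for a key-value store. This is only an illustration under stated assumptions: the key names, the sample values and the `find_by_prefix` helper are hypothetical, and a real store would spread its keys across many nodes.

```python
# Toy in-memory key-value store; a dict stands in for a distributed store.
store = {}


def put(key, value):
    """Store an opaque value (a BLOB) under a key."""
    store[key] = value


def get(key):
    """Exact-key lookup: one hash probe, regardless of how many keys exist."""
    return store.get(key)


def find_by_prefix(prefix):
    """Partial-key (wildcard) lookup: every key has to be examined."""
    return {k: v for k, v in store.items() if k.startswith(prefix)}


# Hypothetical sample data for illustration only.
put("player:1001", b"profile-a")
put("player:1002", b"profile-b")
put("payment:77", b"auth-ok")

exact = get("player:1001")            # fast path: the full key is known
partial = find_by_prefix("player:")   # slow path: scans all keys
```

The exact lookup touches one entry, while the prefix search walks the whole key space, which is why partially known keys hurt performance as the store grows.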

Document stores
Document stores store, retrieve and manage document-oriented information. They also use key-value pairs, but unlike key-value stores, they recognize objects (usually JSON objects) as documents and therefore do not require the document to be translated. However, the additional overhead taxes processing speed as compared to key-value stores.
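
As a rough sketch of the difference, the snippet below keeps JSON-style documents under keys and queries them by field, which a pure key-value store cannot do because it treats the value as an opaque BLOB. The document contents and the `find` helper are assumptions made for illustration; they do not reflect any particular product's API.

```python
import json

# Documents are stored under keys, but the store can see inside them.
# Note that the two documents do not share the same set of fields.
documents = {
    "u1": {"name": "Ada", "interests": ["chess"]},
    "u2": {"name": "Grace", "company": {"name": "Acme", "size": 250}},
}


def find(field, value):
    """Return ids of documents whose top-level field equals value.

    Documents that lack the field are simply skipped.
    """
    return [doc_id for doc_id, doc in documents.items()
            if doc.get(field) == value]


# A document round-trips through JSON text without any schema mapping.
raw = json.dumps(documents["u2"])
restored = json.loads(raw)

matches = find("name", "Ada")
```

Inspecting each document's fields is the extra work that makes a document store slower than a plain key lookup.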

“The JSON document model is a lot more in tune with where the Internet is,” said Couchbase’s Yaseen. “It came from JavaScript and is used not just to store data but for a lot of the processing done on the Internet (which is done in JSON documents).”

Madison Logic, which provides data and online lead-generation services, uses a document database to help model the intent of companies.

“We have complex models that contain varying amounts of information and varying types,” said Madison Logic’s CTO Mark Hershberg. “Because we’re always investigating new data sources to improve our models, we can’t be certain what data fields may be part of the model 18 months from now. Traditional databases don’t have [that] flexibility or in many cases the ability to scale to handle the models that we need. While column-based and document-based NoSQL databases made the most sense, we felt the document model provided the right amount of flexibility.”

Wide column stores
Wide column stores (a.k.a. column stores or columnar stores) are optimized for queries that span multiple columns of data (super columns) or multiple columns of related data (column families). Like relational databases, wide column stores organize data in rows and columns; however, the column orientation is better suited to certain types of queries. Like document stores, column stores share attributes with key-value stores.

“Wide column stores are particularly good at reducing the amount of disk seeks required for retrieving data,” said LexisNexis’ Villanustre. “Unfortunately, they are not very efficient when most of the columns are required as part of the candidate set.”

Wide column stores are better at handling sparse data than relational databases. For example, if a user profile contains a first name, last name and interests, some users may choose not to specify their interests. A relational database would require “no value” to be specified and the processing logic to handle that; a column store just accommodates the variance.
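
The user-profile example can be sketched as nested dictionaries of row key, column family and column. This is a simplified model, not any vendor's storage format, and the row keys, the `profile` family and the helper names are hypothetical.

```python
# row key -> column family -> {column: value}
# A column that a user never filled in is simply absent from the row.
rows = {
    "user:1": {"profile": {"first_name": "Ada",
                           "last_name": "Lovelace",
                           "interests": "mathematics"}},
    "user:2": {"profile": {"first_name": "Grace",
                           "last_name": "Hopper"}},  # no "interests" column
}


def read_column(row_key, family, column, default=None):
    """Read one cell; a missing cell returns the default instead of failing."""
    return rows.get(row_key, {}).get(family, {}).get(column, default)


def stored_cells():
    """Count the cells that actually occupy storage."""
    return sum(len(columns)
               for families in rows.values()
               for columns in families.values())


missing = read_column("user:2", "profile", "interests")
cell_count = stored_cells()
```

Two rows with three possible columns would occupy six cells in a fixed table; here only the five cells that hold values are stored.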

“One of the key advantages of columnar stores is the way you can store the information,” said Avalon Consulting’s Cagle. “You can compact information and deal with more sparse content. You gain flexibility, but it comes at a cost of less performance, and that’s the big trade-off with all of these [NoSQL databases]. There are very few that offer high performance, high scalability and high flexibility.”

Graph stores
Graph stores are used to model relationships. Some graph stores use an adjacency structure in which one node points to another; triple stores represent graphs as subject-property-object statements. Graph stores are commonly used for network diagrams and social graphs.
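
The two representations can be sketched side by side. The player names, the `plays_with` and `member_of` properties and the `match` helper are hypothetical and serve only to illustrate the idea; real graph stores add indexing and a query language.

```python
# Adjacency representation: each node points at the nodes it links to.
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
}

# Triple representation: each edge is a (subject, property, object) statement.
triples = [
    ("alice", "plays_with", "bob"),
    ("alice", "plays_with", "carol"),
    ("bob", "plays_with", "carol"),
    ("alice", "member_of", "red_clan"),
]


def neighbors(node):
    """Follow the outgoing pointers of a single node."""
    return adjacency.get(node, [])


def match(subject=None, prop=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (prop is None or t[1] == prop)
            and (obj is None or t[2] == obj)]


alice_links = neighbors("alice")
clan_members = match(prop="member_of", obj="red_clan")
```

The adjacency form only answers "who is connected to whom," while the triple form also names the kind of relationship, so one structure can hold several relationship types.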

“Graph databases tend to be optimized for referentiality. The problem is they’re a lot like assembly language once you start defining everything,” said Cagle. “A triple store is a way of representing an assertion so you can essentially build into these structures a data model between classes of things rather than just between instances.”

Wargaming.net is currently evaluating graph databases with the goal of understanding all of the relationships in its games.

“Who you play with is very important in online games,” said Craig Fryar, head of global business intelligence and the global business intelligence data engineering team at Wargaming.net. “Using [the graph database], we define relationships between players in the different ways they can interact with each other from platoon and clan membership, to chatting with and shooting at each other.”

Wargaming.net uses a combination of NoSQL database types, as does Zephyr Health. Zephyr Health uses a document database, a graph database, a cache, a relational database, and a NoSQL database service to accommodate its wide range of use cases.

“We use a variety of NoSQL databases in a polyglot persistence model,” said Brian Roy, director of products at Zephyr Health. “We firmly believe that the primary lesson to be learned from the NoSQL movement is that no one database is appropriate across a wide range of use cases. Ultimately, this was the fatal flaw in the relational-database-for-everything approach.”

FairCom’s de Oliveira agreed, underscoring today’s need for performance.

“If you’re at the point where the service level is critical, don’t rely on complex SQL queries,” he said. “You have options available to handle it differently.”


Article Tags

Big Data, Couchbase, FairCom, LexisNexis, Madison Logic, NoSQL, Pythian

About Lisa Morgan

Lisa Morgan is a contributing editor to SD Times.
