Choosing the right database: Understanding your options

Published: November 21st, 2013

Our world is awash in digital data. Continuous advances in storage technologies, data-collection methods and processing power enable our businesses to capture, aggregate and analyze information from an ever-greater spectrum of sources. The opportunities are endless, but with these opportunities come daunting challenges.

On one hand, the options for assembling the ideal solution for a given data application have never been better; on the other hand, choosing the right components for that solution is ever more confusing and complex.

When collecting requirements, concentrate on what is needed over how the need will be addressed. Focusing on the need rather than creating a requirement for a specific implementation opens the door for other methods that may provide a more optimal solution, particularly if you are forced to trade off between the strengths and weaknesses of various options. With more than 200 choices available, it’s likely that you’ll be able to find one or more databases that are a good fit for your needs. However, given their breadth of features and characteristics, identifying the best match can be complex. Consider the following attributes when comparing databases.

Type of database
From relational database management systems (RDBMS) through specialized graphics, object and columnar databases, many types of databases have evolved to meet a wide range of common and specialized applications. For most applications, the two most important categories are:

SQL RDBMS: Relational databases, the most widely used model, maintain data in a set of separate, related files (tables), and combine elements from these files when needed for queries and reports. Designed primarily to handle structured data, these databases are heavily used for workhorse business applications in areas such as finance, manufacturing and human resources, which use SQL queries to access data maintained in standard record formats. Popular SQL RDBMS examples include PostgreSQL (open source), Oracle, MySQL (also Oracle), SQL Server (Microsoft) and DB2 (IBM).

NoSQL: These databases provide a solution for handling data that is less structured than required for relational databases. Trading query processing flexibility for scalability and performance under different workloads, these databases focus on data storage and are optimized for retrieval and appending operations. These characteristics make NoSQL databases attractive for cloud deployment and Big Data applications. Popular NoSQL database examples (all open source) include MongoDB, CouchDB, Cassandra, Riak, Hadoop, Redis and Neo4j.
#!Platform support
All databases do not operate on all available platforms. Desktop (Windows, OS X), mobile (iOS, Android), enterprise (Linux, Illumos/OpenSolaris, BSD), and cloud (Amazon EC2, Joyent, GoGrid, Rackspace) operating platforms each have their own nuances and constraints that affect database cost, capacity and performance. Even when a given database covers multiple platforms, it may support some better than others.

Licensing terms
How a database is licensed can dramatically impact its cost, risk profile and usage flexibility. Commercially available databases charge licensing and maintenance costs for regular releases and stronger support, but can be costly to scale, restrictive in usage and require vendor resources if modifications or extensions are required. An increasing number of databases are being offered through open-source licensing. This model cuts licensing costs and usage restrictions, but requires a greater investment in internal and/or external support.

Maturity
With new databases reaching the market on a frequent basis, it is tempting to try the latest and greatest to get the newest features. Unfortunately, these databases often have more defects, security vulnerabilities and performance and stability issues than more-mature offerings. Maturity provides a larger user base and a longer time period to identify and remedy inherent issues and strengthen support. Unless newly offered features are essential for the business solution, a more mature offering is usually the best choice. Investigate characteristics such as length of time on the market, number of users, and level of support to compare the maturity of database offerings.

Operations and management
The cost of maintaining and operating a database over its life often exceeds its acquisition costs. Depending on its data center environment and the size, skill sets and strategy for its internal IT organization, a company can choose to implement and support a database with its own staff, or use one of many available sourcing options. Any aspect of the database’s selection, implementation, application software development, support and operations can be sourced to external consultants as a project or as part of a broader service.

Likewise, the database can be hosted by an external solution provider. Some hosting services include regular database backup and maintenance activities, as well as disaster recovery services that provide extra security and free internal IT staff to perform more strategic tasks. Cloud deployment is a variation of external hosting where the database operates on a cloud-computing platform offered by Amazon EC2, GoGrid, Rackspace or other providers. In this model, companies can use a virtual machine image to operate their databases independently, or they can simply purchase access to a database service delivered by the provider.
#!Operational attributes
Databases differ considerably in the robustness of their operational attributes. Attributes such as superfast performance and high availability come at a price in terms of software, hardware and support overhead. The true requirements of the business functions being supported are the key for finding the most effective balance between these attributes and their cost.

Performance: Database performance is judged by factors such as response time per query and transaction throughput, and is affected by the design of the database, infrastructure capacity, and the logical and physical structure of the data. For example, using a schema imposes more structure on the underlying data but provides better performance due to the constraints.

Scalability: Scalability evaluates the ability of a database to handle anticipated growth in usage, transaction rates and data volume. Considerations include inherent product limitations, ease of increasing capacity by adding hardware, and costs for additional licenses and hardware to handle expected growth.

Reliability: Reliability is a measure of the ability of a system to function correctly—including avoiding data loss and corruption—over a given time period. It is often assessed using Mean Time Between Failure and is enhanced by database features that help to avoid, detect and repair hardware- and software-driven data faults.

Availability: Availability measures whether a database is available for use when needed. A database designed for high availability would disable a malfunctioning component and continue operating at a reduced capacity, as opposed to a less capable system that might crash and become totally non-operational.

Resiliency: This factor assesses a database’s ability to recover from faults such as hardware crashes, overloads and software component failures. Simple databases may require restoration from backup files after a failure, whereas a more advanced database may employ sophisticated replication, rollback and component isolation strategies to permit continued operation and minimize the impact of faults.
#!Security: Databases offer differing levels of security features to protect the database from security vulnerabilities such as unauthorized access or disclosure of personal or proprietary data; damage to the data or programs; interruption or denial of authorized access to the database; attacks on other systems; and the unanticipated failure of database services. The level of security protection required for a given database application depends on the confidentiality/value of the data it contains and its level of accessibility to unauthorized activities.

Data integrity: A database’s data integrity features enable the specification and enforcement of rules to protect the quality of the data it stores. These rules may include checks and correction for invalid data upon entry, restrictions on data relationships, and timeframes for data retention.Ease of implementation: Choosing the database with the best features will not necessarily guarantee a successful implementation. The availability of tools, staffing and support for a widely used database can trump the technical advantages of a less common offering.

Availability of tools: A mature, widely used database will be supported by a variety of open-source and commercial tools to assist with software development, integration with other products, backup and recovery, and other critical functions. These tools increase productivity and cut time and costs throughout implementation and operation.

Extensibility: Does the database support simple ways of adding functionality outside of its standard feature set? Does your company have access to the source code, the skills and ability to make modifications? Does the database have interfaces to support externally developed functionality, or can it take advantage of the capabilities of other software applications? For example, an otherwise ideal database may not support the data encryption your company needs itself, but if it can be run using an encrypted file system, the combination becomes a viable solution.

Knowledge and staffing: Is the new database familiar to your internal IT staff? If not, it will require an investment for training or the hiring of external resources. For example, while many enterprise programmers know how to code effective SQL queries, far fewer would be able to write Map/Reduce jobs for Riak. Likewise, the market availability of experts is far greater for widely used databases such as Postgres and Hadoop.

Community: Open-source offerings, such as PostgreSQL, have active and robust user communities that provide support, and they make getting use cases and product extensions easier. In contrast, gaining the same level of assistance for closed-source offerings such as Oracle and Vertica requires expensive contracting for proprietary resources.

Successfully implementing a new database involves considerations well beyond the selection of the database itself. If your company doesn’t already have the right expertise in house, engaging specialized consulting assistance can be highly valuable. By bringing market knowledge and first-hand experience of the many available database solutions, consultants can reduce the time, resources and costs of choosing, implementing, operating and managing your database solution.

Denish Patel is the lead database architect at OmniTI, a global IT services company.

Article Tags

Big Data, databases, NoSQL, SQL

About Denish Patel

View all posts by Denish Patel

Cookie	Duration	Description
cf_use_ob	past	Cloudflare sets this cookie to improve page load times and to disallow any security restrictions based on the visitor's IP address.
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
CookieLawInfoConsent	1 year	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
JSESSIONID	session	The JSESSIONID cookie is used by New Relic to store a session identifier so that New Relic can monitor session counts for an application.
PHPSESSID	session	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__atuvc	1 year 1 month	AddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
__atuvs	30 minutes	AddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
__cf_bm	30 minutes	This cookie, set by Cloudflare, is used to support Cloudflare Bot Management.

Cookie	Duration	Description
__gads	1 year 24 days	The __gads cookie, set by Google, is stored under DoubleClick domain and tracks the number of times users see an advert, measures the success of the campaign and calculates its revenue. This cookie can only be read from the domain they are set on and will not track any data while browsing through other sites.
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_S6PB8V57DG	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_846073_1	1 minute	Set by Google to distinguish users.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
_jsuid	1 year	This cookie contains random number which is generated when a visitor visits the website for the first time. This cookie is used to identify the new visitors to the website.
at-rand	never	AddThis sets this cookie to track page visits, sources of traffic and share counts.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
iutk	5 months 27 days	This cookie is used by Issuu analytic system to gather information regarding visitor activity on Issuu products.
uvc	1 year 1 month	Set by addthis.com to determine the usage of addthis.com service.
vuid	2 years	Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos to the website.
WMF-Last-Access	1 month 14 hours 26 minutes	This cookie is used to calculate unique devices accessing the website.

Cookie	Duration	Description
__Host-GAPS	2 years	This cookie allows the website to identify a user and provide enhanced functionality and personalisation.
_pxhd	session	Used by Zoominfo to enhance customer data.
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
loc	1 year 1 month	AddThis sets this geolocation cookie to help understand the location of users who share the information.
mc	1 year 1 month	Quantserve sets the mc cookie to anonymously track user behaviour on the website.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
__gpi	1 year 24 days	No description
__Secure-YEC	1 year 1 month	No description
_heatmaps_g2g_100754890	10 minutes	No description
_techvalidate_session	session	No description
cf_7166_id	20 years	No description
cf_7166_person_last_update	session	No description
f5avraaaaaaaaaaaaaaaa_session_	session	No description available.
GoogleAdServingTest	session	No description
Gyazo_cfwoker	7 years 2 months 17 days 7 hours	No description
incap_ses_451_2783402	session	No description
incap_ses_769_2783402	session	No description
loglevel	never	No description available.
m	2 years	No description available.
nlbi_2783402	session	No description
prism_252377639	1 month	No description
TS011605d9	session	No description
ustream-guest	session	No description available.
visid_incap_2783402	1 year	No description
xtc	1 year 1 month	No description

AI

AI and Software Development

Observability

Guide to Observability

CI/CD

A guide to CI/CD

Cloud Native

Cloud Native Content

Data

A Guide to Data

Test

Security Testing

Mobile

Mobile Testing

API

Sponsored by Parasoft

Performance

Load & Performance Testing

DevSecOps

A Guide to DevSecOps

Enterprise Security

A Guide to Security

Supply Chain Security

Supply Chain Security

Dev Manager

Dev Managers Content

Agile

A Guide To Agile

Value Stream

A Guide To Value Stream

Productivity

A Guide To Productivity

DevOps

DevOps Content

API

Gravitee.io

AI

AI and Software Development

Value Stream Management

A Guide To Value Stream

Choosing the right database: Understanding your options

Article Tags

Subscribe to SDTimes

About Denish Patel

Related Articles

Snowflake introduces agentic AI innovations for data insights

Amazon launches serverless distributed SQL database, Aurora DSQL

Data is the new petroleum; companies need better pipelines — and better oil-spill clean-up methods

PostgreSQL 17 adds performance gains, storage optimizations and more