Our world is awash in digital data. Continuous advances in storage technologies, data-collection methods and processing power enable our businesses to capture, aggregate and analyze information from an ever-greater spectrum of sources. The opportunities are endless, but with these opportunities come daunting challenges.
On one hand, the options for assembling the ideal solution for a given data application have never been better; on the other hand, choosing the right components for that solution is ever more confusing and complex.
When collecting requirements, concentrate on what is needed rather than on how the need will be addressed. Stating the need instead of prescribing a specific implementation opens the door to other approaches that may provide a better solution, particularly if you are forced to trade off the strengths and weaknesses of various options. With more than 200 choices available, it’s likely that you’ll find one or more databases that fit your needs. However, given their breadth of features and characteristics, identifying the best match can be complex. Consider the following attributes when comparing databases.
Type of database
From relational database management systems (RDBMS) through specialized graph, object and columnar databases, many types of databases have evolved to meet a wide range of common and specialized applications. For most applications, the two most important categories are:
SQL RDBMS: Relational databases, the most widely used model, maintain data in a set of separate, related files (tables), and combine elements from these files when needed for queries and reports. Designed primarily to handle structured data, these databases are heavily used for workhorse business applications in areas such as finance, manufacturing and human resources, which use SQL queries to access data maintained in standard record formats. Popular SQL RDBMS examples include PostgreSQL (open source), Oracle, MySQL (also Oracle), SQL Server (Microsoft) and DB2 (IBM).
NoSQL: These databases provide a solution for handling data that is less structured than required for relational databases. Trading query processing flexibility for scalability and performance under different workloads, these databases focus on data storage and are optimized for retrieval and appending operations. These characteristics make NoSQL databases attractive for cloud deployment and Big Data applications. Popular NoSQL database examples (all open source) include MongoDB, CouchDB, Cassandra, Riak, Hadoop, Redis and Neo4j.
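The contrast between the two categories above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relational half uses the standard-library `sqlite3` module as a stand-in for any SQL RDBMS, and the second half simulates a schema-less document store with a plain dictionary rather than calling any real NoSQL product's API. The table names, keys and sample records are all hypothetical.

```python
import json
import sqlite3

# --- Relational model: data lives in separate, related tables ---
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Finance'), (2, 'Manufacturing');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 2), (3, 'Linus', 1);
""")

# A SQL query combines elements from both tables when a report needs them.
relational_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    ORDER BY e.id
""").fetchall()
conn.close()

# --- Document model (simulated): each record is a self-contained document ---
# Hypothetical stand-in for a NoSQL store: a dict mapping keys to JSON text.
document_store = {}

def put_document(key, document):
    """Store a document under a key; no fixed schema is enforced."""
    document_store[key] = json.dumps(document)

def get_document(key):
    """Retrieve a document by key; there is no join, only a lookup."""
    return json.loads(document_store[key])

# Documents in the same store may carry different fields.
put_document("emp:1", {"name": "Ada", "department": "Finance"})
put_document("emp:2", {"name": "Grace", "department": "Manufacturing",
                       "badges": ["safety"]})

ada = get_document("emp:1")
grace = get_document("emp:2")
```

In the relational half, the department name is stored once and joined in at query time. In the simulated document half, each record carries everything needed to answer a lookup by key, which is the kind of retrieval and appending workload described above.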
Not all databases operate on all available platforms. Desktop (Windows, OS X), mobile (iOS, Android), enterprise (Linux, Illumos/OpenSolaris, BSD), and cloud (Amazon EC2, Joyent, GoGrid, Rackspace) operating platforms each have their own nuances and constraints that affect database cost, capacity and performance. Even when a given database covers multiple platforms, it may support some better than others.
How a database is licensed can dramatically affect its cost, risk profile and usage flexibility. Vendors of commercial databases charge licensing and maintenance fees in exchange for regular releases and stronger support, but these products can be costly to scale and restrictive in usage, and they depend on vendor resources when modifications or extensions are required. An increasing number of databases are offered under open-source licenses. This model cuts licensing costs and usage restrictions, but requires a greater investment in internal and/or external support.
With new databases reaching the market frequently, it is tempting to try the latest and greatest to get the newest features. Unfortunately, these databases often have more defects, security vulnerabilities, and performance and stability issues than more mature offerings. Maturity means a larger user base and a longer period in which to identify and remedy inherent issues and strengthen support. Unless the newly offered features are essential for the business solution, a more mature offering is usually the better choice. To compare the maturity of database offerings, investigate characteristics such as length of time on the market, number of users and level of support.
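One common way to weigh attributes like these against each other is a simple weighted scoring matrix. The sketch below is not a method prescribed by this article; it assumes you have already rated each candidate from 1 to 5 on every attribute and assigned each attribute a relative importance. The candidate names, ratings and weights are all hypothetical placeholders.

```python
# Hypothetical relative importance of each attribute for one project.
weights = {"type_fit": 4, "platform": 2, "licensing": 2, "maturity": 2}

# Hypothetical 1-5 ratings for two placeholder candidates.
candidates = {
    "candidate_a": {"type_fit": 5, "platform": 3, "licensing": 2, "maturity": 5},
    "candidate_b": {"type_fit": 3, "platform": 5, "licensing": 5, "maturity": 3},
}

def weighted_score(ratings, weights):
    """Sum each rating multiplied by the weight of its attribute."""
    return sum(weights[attr] * ratings[attr] for attr in weights)

scores = {name: weighted_score(ratings, weights)
          for name, ratings in candidates.items()}

# Highest total first.
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, scores[name])
```

The totals are only as good as the weights and ratings behind them, so a matrix like this is best treated as a way to make trade-offs explicit rather than as a verdict.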