While many companies still use relational databases, the benefits of NoSQL databases are clear: the ability to handle large volumes of structured or unstructured data, a good fit with Agile sprints, and flexibility and scalability. And moving to a NoSQL database is easier than most think.
Brian Hess, the strategic solution engineer at DataStax, explained that SQL is not really the problem to begin with. It’s very straightforward to interact with, and it is very common for data to be tabular in nature. In fact, most NoSQL databases actually have a SQL-like query language.
Instead, the problem is that relational databases struggle with scalability.
“When relational databases came out and really put some structure and formalism around these systems, they really made some promises about what they’re offering to applications and data owners that struggle when you get to scale,” Hess said. “And this is fine, when the data was measured in kilobytes, megabytes, even gigabytes. But as we get into bigger datasets, we really have problems.”
To address the problem of scalability, Apache Cassandra was created in 2008 as a free and open-source, distributed database management system built as a wide-column store (a NoSQL database that uses tables, rows, and columns). It graduated from the Apache Incubator two years later. Cassandra also formed the basis of DataStax’s database for hybrid and multi-cloud deployments.
Now, the Cassandra community is focusing its efforts on revisiting consistency and isolation.
“With a lot of the systems that we see out there, the applications that we’re working with, 100% consistency is not something most applications really need,” Hess said. “And so that’s an area where we’re going to really relax.”
Another area for improvement is isolation, which prevents different queries from interfering with each other. Isolation is currently enforced very strictly; a less rigid approach would allow more concurrent users and queries in the system at the same time, according to Hess.
Cassandra is designed for high concurrency, meaning many queries arriving at the same time, and for low latency, meaning responses in milliseconds or sometimes even faster, according to Hess.
“A lot of times, people will talk about how we’re going to do this very differently. We have to get you to think entirely differently when talking about Cassandra versus a SQL database. And I think that’s not quite right,” Hess said. “It’s not that you’re actually doing different things. You’re doing very similar things, but you’re doing them by approaching them slightly differently.”
Cassandra still has rows and tables, and its data model is described as a partitioned-table model, even though many still call it a wide-column store. The tables live inside a keyspace, a term held over from the earlier model that plays roughly the role a schema plays in a relational database.
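To make the partitioned-table idea concrete, here is a minimal toy sketch in Python. It is not how Cassandra is implemented: the node names, the `PartitionedTable` class, the `node_for` helper, and the example table are all hypothetical, and MD5 is used only as a convenient stand-in hash. The point is simply that rows sharing a partition key are stored together, and the key decides which node holds them.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Map a partition key to a node by hashing it (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class PartitionedTable:
    """A toy table whose rows are grouped by partition key and placed on nodes."""

    def __init__(self, name: str, partition_key: str):
        self.name = name
        self.partition_key = partition_key
        # node name -> partition key value -> list of rows
        self.storage = defaultdict(lambda: defaultdict(list))

    def insert(self, row: dict) -> str:
        """Store a row on the node that owns its partition; return that node."""
        key = str(row[self.partition_key])
        node = node_for(key)
        self.storage[node][key].append(row)
        return node

    def select(self, key_value) -> list:
        """Read one partition; only the owning node is consulted."""
        key = str(key_value)
        return list(self.storage[node_for(key)][key])


# A keyspace is modeled here as a plain dict of table name -> table.
shop = {"orders_by_customer": PartitionedTable("orders_by_customer", "customer_id")}
orders = shop["orders_by_customer"]
orders.insert({"customer_id": "c1", "order_id": 1, "total": 20})
orders.insert({"customer_id": "c1", "order_id": 2, "total": 35})
orders.insert({"customer_id": "c2", "order_id": 3, "total": 12})
print(orders.select("c1"))  # both rows for customer c1, read from a single node
```

Because a lookup by partition key touches only one node, adding nodes spreads the partitions out rather than making each query more expensive, which is the scaling property the article attributes to Cassandra.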
“It’s really just the language that’s different and not so much the concept,” Hess explained, adding that the Cassandra Query Language (CQL) also looks an awful lot like the relational SQL. “We really solidified around CQL and this tabular model. The rows and columns are really, really natural. And by having a schema, we can ensure that as the data’s being put in, it’s actually valid.”
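As a rough illustration of how close CQL looks to SQL, the sketch below holds a few CQL statements as Python strings and prints them. The keyspace, table, and column names are hypothetical and chosen only for this example; the statements are not sent to any database here.

```python
# Hypothetical keyspace, table, and column names, for illustration only.
CQL_STATEMENTS = [
    # A keyspace plays roughly the role of a database or schema in an RDBMS.
    "CREATE KEYSPACE IF NOT EXISTS shop "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}",
    # The table has a declared schema; customer_id is the partition key.
    "CREATE TABLE IF NOT EXISTS shop.orders_by_customer ("
    "customer_id text, order_id int, total decimal, "
    "PRIMARY KEY ((customer_id), order_id))",
    # Writes and reads look much like their SQL counterparts.
    "INSERT INTO shop.orders_by_customer (customer_id, order_id, total) "
    "VALUES ('c1', 1, 19.99)",
    "SELECT order_id, total FROM shop.orders_by_customer "
    "WHERE customer_id = 'c1'",
]

for statement in CQL_STATEMENTS:
    print(statement + ";")
```

Against a running cluster, statements like these could be passed to a driver session or typed into a CQL shell. Because the table declares column types, a value of the wrong type for `order_id` would be rejected on insert, which is the validation benefit Hess describes.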
Many NoSQL models share similar concepts with RDBMS, whether it’s the SQL-like languages, keyspaces that are analogous to a database in the RDBMS world, or the use of tables, which are similar to RDBMS tables but more flexible and dynamic.