As Big Data has gotten bigger and bigger, and businesses demand more and more out of their data, traditional database structures just don’t cut it anymore. The traditional single static repository simply isn’t equipped to handle the industry’s rapidly evolving needs.
Cory Isaacson, database technology veteran and the CEO of agile Big Data technology provider CodeFutures, believes we need to rethink the role of databases in a cloud and mobile-dominated landscape. He has worked with database technologies for 25 years, from the early days of Sybase to MySQL and SQL, in-memory databases, and more recently open-source database projects such as MapDB. An early startup of Isaacson’s built some of the first big client-server applications for the entertainment industry in the early 1980s, and in the decades since he has started and sold several consulting companies, and spent several years heading up Rogue Wave before starting CodeFutures in 2007.
SD Times spoke with Isaacson ahead of his upcoming talk, “Scaling and Managing Big Data: Have We Been Looking at Databases Wrong This Whole Time?” about how databases have changed, scaling in the cloud, and why “agile Big Data” is the future.
SD Times: How would you describe the traditional view of databases?
Cory Isaacson: People look at databases as a static repository. You develop a schema you think will fit your needs as best you can, you start developing against it, and invariably you write, read and start manipulating the data. You don’t really think of it as dynamic. Then what happens is that, very quickly, application requirements change and evolve. You have to start scaling the database and altering the schemas as best you can, usually sticking as closely as possible to what you have, but that’s almost always very impractical.
So what happens is you run into an incredible number of performance problems and what I’d call application integration difficulty. The requirements fit less and less to that traditional model and need to expand more and more into completely different and new capabilities. Over time, it just gets messier and messier, it makes the application developer’s job harder and harder, and it makes performance more and more challenging as the application grows.
How has your view of databases and Big Data evolved over time? What do organizations need out of data now that they didn’t necessarily need in the past?
There’s quite a bit that has changed. I’ll start with scaling, which is of big interest to everyone. The way you scale a database is to partition it across a number of servers. It’s the only practical way to do it. While there are many ways to do that, they all come down to sharding in one capacity or another.
Sharding comes from broken glass, a metaphor popularized by Google with its BigTable architecture. The simple idea of sharding is you’re going to use a key in the data to divvy it up. With a NoSQL database, it’s a no-brainer. The database itself doesn’t know anything about your content, it just knows about the key itself, so it’s very easy to do. But when you have related data—which is true almost anywhere—as soon as you shard one way, it works well for one use case but not for another.
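To make the key-based idea concrete, here is a minimal sketch of routing records to shards by hashing a key. It is an illustration only, not how any particular database does it: the shard count, the in-memory dictionaries standing in for servers, and the names `shard_for`, `put` and `get` are all hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of servers

# Each dict stands in for one database server (an assumption for illustration).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(key: str, value) -> None:
    """Store a value on the one shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Read a value back from the one shard that owns the key."""
    return shards[shard_for(key)].get(key)


# The store never inspects the value; only the key decides placement.
put("player:42", {"name": "Ada"})
```

Because placement depends only on the key, a lookup by that key touches exactly one shard. Note that a plain modulo scheme like this one reshuffles most keys when the shard count changes, which is one reason real systems often use more elaborate placement.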
Let’s say you have a multi-user game with players competing against each other. You want to show players a list of all the games they played and what their scores were. Every game will want that. Let’s say you grow to millions and millions of players and shard by player. Then what happens is now the players say they would like to see a list of who else played a given game they’ve clicked on. The data is partitioned completely wrong for that, so the only way you can get that answer is to search all the partitions, which is the worst-performing thing you can do.
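The game example can be sketched in a few lines. This is a toy model under stated assumptions: the shards are in-memory dictionaries, and the names `record_score`, `games_for_player` and `players_for_game` are hypothetical. It shows that with data sharded by player, the first query reads one shard while the second has to visit every shard.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of servers

# Each shard maps player_id -> list of (game_id, score) tuples.
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def shard_for_player(player_id: str) -> int:
    """Pick a shard from the player key using a stable checksum."""
    return zlib.crc32(player_id.encode("utf-8")) % NUM_SHARDS


def record_score(player_id: str, game_id: str, score: int) -> None:
    """Write a result to the shard that owns this player."""
    shards[shard_for_player(player_id)][player_id].append((game_id, score))


def games_for_player(player_id: str):
    """Query that matches the shard key: reads exactly one shard."""
    shard = shards[shard_for_player(player_id)]
    return list(shard.get(player_id, []))


def players_for_game(game_id: str):
    """Query that does not match the shard key: scans every shard.

    Returns the set of players and the number of shards searched.
    """
    players = set()
    shards_searched = 0
    for shard in shards:
        shards_searched += 1
        for player_id, results in shard.items():
            if any(g == game_id for g, _ in results):
                players.add(player_id)
    return players, shards_searched


record_score("alice", "g1", 10)
record_score("alice", "g2", 5)
record_score("bob", "g1", 7)
```

Here `games_for_player("alice")` consults a single shard, while `players_for_game("g1")` always reports `NUM_SHARDS` shards searched, no matter how few players actually took part in that game.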