Tomorrow morning, Cassandra Summit kicks off in San Jose. The event highlights the growth in popularity of the Apache Cassandra NoSQL project, as well as the expansion of Valley-darling database company DataStax.
We caught up with DataStax cofounder and CTO Jonathan Ellis to chat about his recent decision to step down from the chairmanship of the Apache Cassandra Project. After five years working so closely with the open-source project and DataStax’s eponymous enterprise offering, we thought a debriefing was in order.
What are you looking forward to seeing and hearing about at Cassandra Summit?
I’m looking forward to an update from our evangelism team on data modeling and what we’ve learned over the past year about what the new features of Cassandra 3.0 make possible, in particular materialized views, which we launched late last fall. That feature really takes a lot of the headache away. You used to have to spend a lot of time manually denormalizing tables to fit your query patterns, and materialized views automate that at the server level.
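To illustrate the idea Ellis describes, here is a minimal in-memory sketch in Python of a base table that keeps a second, query-shaped copy of its data in sync on every write. It is a toy model of the concept only, not Cassandra's implementation or API; the names `BaseTable`, `ToyView`, `upsert`, and `lookup` are hypothetical and chosen for illustration.

```python
class BaseTable:
    """Toy table keyed by primary key; registered views are updated on each write."""

    def __init__(self):
        self.rows = {}   # primary key -> row dict
        self.views = []  # ToyView objects to keep in sync

    def upsert(self, key, row):
        old = self.rows.get(key)
        self.rows[key] = dict(row)
        # The "server" propagates the change, so callers never touch the views.
        for view in self.views:
            view.apply(key, old, self.rows[key])


class ToyView:
    """Denormalized copy of a base table, re-keyed by another column (hypothetical)."""

    def __init__(self, base, column):
        self.column = column
        self.index = {}  # column value -> {primary key: row}
        base.views.append(self)
        for key, row in base.rows.items():
            self.apply(key, None, row)

    def apply(self, key, old, new):
        # Remove the stale entry if the row was previously indexed.
        if old is not None and self.column in old:
            bucket = self.index.get(old[self.column], {})
            bucket.pop(key, None)
            if not bucket:
                self.index.pop(old[self.column], None)
        # Index the new version of the row.
        if self.column in new:
            self.index.setdefault(new[self.column], {})[key] = new

    def lookup(self, value):
        return list(self.index.get(value, {}).values())


users = BaseTable()
by_email = ToyView(users, "email")
users.upsert(1, {"name": "Ada", "email": "ada@example.com"})
users.upsert(1, {"name": "Ada", "email": "ada@new.example.com"})
print(by_email.lookup("ada@new.example.com"))
```

The point of the sketch is that the application only ever writes to `users`. Without something like this, the application itself would have to write to both structures and clean up stale entries by hand.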
I think one of the big themes is that Apache Cassandra 3.0 has been out for about nine months, and so we’re starting to get some experience with that from the field, if you will, and we’re getting feedback on that into the project.
What’s new with DataStax Enterprise?
We had our 5.0 release in June, with multi-model support: for the first time, we added graph support on top of Cassandra, analytics, and search.
We included Cassandra support for JSON interoperability. You can take a JSON document, and Cassandra will automatically turn it into tables, rows, and nested user-defined types.
Cassandra is loosely based on a tables-and-rows model. But, unlike a classic relational row that just has primitive columns inside it, a Cassandra row can hold collections like maps and lists that contain entries that, themselves, contain other nested entries. You can actually take an arbitrarily nested JSON document and map it into a Cassandra row. That came with DataStax Enterprise 5.0.
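As a rough illustration of that mapping, the Python sketch below parses a nested JSON document into a single row and derives a type label for each column, with arrays shown as lists and nested objects shown as user-defined-type-like structures. The function names `describe` and `json_to_row` and the label format are assumptions made for this example; they are not how Cassandra or DataStax Enterprise performs the conversion.

```python
import json


def describe(value):
    """Return a rough, CQL-flavored type label for a parsed JSON value (illustrative only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "text"
    if value is None:
        return "null"
    if isinstance(value, list):
        inner = sorted({describe(v) for v in value}) or ["unknown"]
        return "list<" + "|".join(inner) + ">"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {describe(v)}" for k, v in value.items())
        return "udt{" + fields + "}"
    raise TypeError(f"unsupported JSON value: {value!r}")


def json_to_row(document):
    """Parse a JSON object into one row plus a per-column type description."""
    parsed = json.loads(document)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    schema = {column: describe(value) for column, value in parsed.items()}
    return parsed, schema


doc = (
    '{"id": 7, "name": "Ada", "phones": ["555-0100", "555-0101"], '
    '"address": {"city": "San Jose", "geo": {"lat": 37.33, "lon": -121.89}}}'
)
row, schema = json_to_row(doc)
for column, label in schema.items():
    print(column, "->", label)
```

Each top-level key becomes a column of the row, and the nesting inside `address` is kept intact inside that one column rather than being split into separate tables.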
Now that you’ve stepped down as chair of the Apache Cassandra project management committee, will you still be contributing to the open-source project?
Honestly, the last big feature I wrote for Apache Cassandra was lightweight transactions back in version 2.0. My role has been turning into an executive and managerial one inside DataStax.
In terms of my involvement with Apache Cassandra, I’m still involved at an architectural level, consulting on things like what’s the best way to build this at a high level, what the API is going to look like, and how it’s going to work under the hood. I’ll still be involved at that level as a member of the community. But I’m going to give the administrative role of project chair to someone else and funnel that extra time into DataStax.
What would you like to see from Cassandra over the next two or three years?
I’ve always said I have a pretty foggy crystal ball. I have a pretty good idea of what I’d like to see in the next six months. In the next three years… a lot can change in that timeframe.
Right now, Cassandra is very well suited for individual applications. But it’s starting to move into enterprises that would like to centralize their Cassandra management functions. Sometimes running eight different Cassandra clusters for eight different applications is the best choice, if those applications have different enough specifications that you need to tune each one separately. But often these enterprises would rather manage a single Cassandra cluster and provide that as a resource. So I wonder if multi-tenancy might be the next horizon for Cassandra to tackle.
We chatted when you started Riptano (which changed its name to DataStax in 2010). In all that time since then, what is your favorite memory from building out Cassandra?
I think my favorite thing has been watching Cassandra grow from an engineering effort that implemented ideas that came from other places. At the beginning, it was an amalgam of Google’s Bigtable and Amazon’s Dynamo, built on the research papers those companies had put out. It has grown from that into building innovative technologies that were firsts in the industry. Some of the ones that come to mind are lightweight transactions and the work we did on time-series, workload-specific compaction optimizations. Those are things people could write research papers and doctoral-level papers about. We’ve gone from imitators to innovators.