Data is more available than ever before, and copious amounts of new data are collected every day. But if there’s one major impediment to helping organizations unlock the full value of their data, it’s the fact that data hasn’t truly been democratized. Far too many professionals who need data simply cannot access it.
In the years ahead, that needs to change, driven by the confluence of cloud-native, open source, and automation technologies, along with a new collaborative approach that spans organizations.
Recently, a panel of experts discussed this topic. The panel included Eric Brewer, Google Fellow and VP of infrastructure; Melody Meckfessel, CEO of Observable and former VP of engineering at Google; and Sam Ramji, chief of strategy at DataStax. It was a fascinating discussion about the coming decade of data and what it means for data-driven software development.
The power of data visualization
When it comes to software, Meckfessel is most interested in the human element of the equation.
Currently, she is focused on removing the toil from DevOps workflows and helping developers, and even non-developers, become more productive. She believes that data visualization will play a big role in the upcoming decade as developers move beyond writing static, text-based code toward robust visual displays that regular business users can put to work.
“I see a lot of power in visualization, bringing visualization to data, bringing visualization to software development, and empowering anyone who wants to create and interact with data in powerful ways,” Meckfessel said. “Visualization taps into our human visual system, helps us think more effectively, and exposes underlying data as we’re building cloud-native apps.”
Open source is just getting started
Google’s Brewer, who among his other notable achievements was involved in open-sourcing Kubernetes, believes open source will take on an ever-larger role as we move further into the future, because it enables companies to move faster than ever before. “We wanted it to be a platform for everyone and something everyone could contribute to,” he said of Kubernetes.
He’s now thinking about how to extend this to automation frameworks and to the way open source is used in supply chains. “Most of your code isn’t written by your team or even your company,” Brewer said. That reliance on outside code presents a new slate of challenges, particularly when it comes to versioning and vulnerability tracking. In these efforts, he’s highlighting another intersection of open-source development and cloud-native automation.
Meckfessel also believes that the future of software is incredibly collaborative.
“We want the exploration of data for devs and creators to be as real-time and interactive as possible,” she said. Part of that requires “bringing together open-source communities so you don’t have to start from a blank slate and can immediately get up and running.” Meckfessel envisions a future where everyone can collaborate and share what they know. “We don’t write software alone.”
From my perspective, this is the power of open source. I imagine a future where data pipelines and visualization frameworks are fully open source, and the value of the platform is derived from the data you bring and the wisdom you gain from it.
In this world, to fork a data pipeline means to adapt it to your data. You find the example and then you start to tinker: swap out the data, change the visualization, and get to insights quickly. You get to the outcome much faster, and you can contribute your fork back to the repository, because it was the data that had the value, more so than the pipeline.
Brewer sees things the same way.
“When you say the word fork, it implies you’re making a copy. You have your own copy, which means you get velocity, autonomy, and independence,” he said. “The only way you can do large-scale collaboration is through this copy model.”
If cloud-native is all about speed and agility, you pretty much need to leverage the power of open source if you want to build the best data-driven apps you can.
It would be hard to imagine being cloud-native without being open source. This approach represents a big change from the classic data management concept of one monolithic database (or data store) containing all the data with tight access controls. In the future, we may see more databases (or data stores), each containing less data. The goal is to make data available in as close to real time as possible. I believe this will help developers increase both the speed and the quality of their output, and that, of course, is a very good thing.
If you want to hear these industry leaders continue their discussion with a focus on the future of data and cloud-native and open source technologies, the entire discussion is available here.