The Big Data story, I was told at Strata Conference + Hadoop World in New York City in late October, is not about data at all. It’s about people understanding data to change and improve their lives.
This was the message of Digital Reasoning’s Tim Estes, who borrowed the model of the Agile Manifesto, with its “people before process” emphasis, to argue that big understanding is greater than Big Data.
It’s a thought shared by Sharmila Shahani-Mulligan of ClearStory Data. Her point about Big Data was that in a world of vast farms of data storage, cheaper memory, blazing compute speeds, advanced algorithms, and machine learning, “the most valuable thing to be gained from Big Data is human insight.”
Shahani-Mulligan cited statistics showing that 37% of analysts still rely on gut feelings to make decisions; that 44% have no insight into how decisions are made at the corporate level; and that 52% of managers say they need new training.
The last decade was about building out farms of relational databases for structured data, using what are now considered constrained data models. The next five years, she said, will require integrating and using any data from anywhere, exploring data intuitively, and making sense of the data at scale.
The amount of data companies store on their own is already growing rapidly, but the amount of public data being made available is growing even faster. Shahani-Mulligan said that in 2006, there were fewer than 100 open data APIs; today there are more than 7,000. “You have Twitter, Facebook, Netflix and Google Public Data,” she said, naming but a few. “Across these sources is a wealth of data anyone can access. Combine that with private data and managers can drive insights they couldn’t before.”
Shahani-Mulligan used the example of a water company that takes public data on the location of water lines and the households near those lines and combines it with private data about the makeup of the families living in those homes, to gain insight into how to better serve households that are already customers or to reach households it had not served before.
“When you fuse data from private sources with clickstream data, or online sales, or product SKUs, tremendous insights can be gained,” said Shahani-Mulligan during her keynote talk. “Better visualizations are only one part of the answer. We will need tools that show what the data means.”