Organizations today get that they have to collect data to stay competitive. They understand how to store it, retrieve it and slice it.
The idea now is to understand the data itself, to detect patterns and trends that will help the organization get new customers or members, service them more personally and engage with them more deeply.
“The idea of using data is ubiquitous,” Mike Olson, co-founder of Hadoop platform company Cloudera, said during a panel at the recent Strata + Hadoop World conference in New York City. “Now, we’re starting to see applications that leverage data. We’re starting to see what data enables beyond the technology. It’s less about the technology stack and [more] about how we decide better, as people and as organizations. All this is in the service of making better decisions.”
Alistair Croll, the chairman of the conference produced by O’Reilly Media, added that Big Data today “is a double-edged sword. We need the belief that we will use data for good and not for bad—but be diligent about the bad and get the word out about it.”
Trust plays an important role in the acceptance and use of data, according to Tim O’Reilly, founder of O’Reilly Media. “We have to reframe the dialogue away from privacy and into trust. If Google tells me to leave 15 minutes earlier because of traffic, I trust it,” he said. “But what do we do when people break our trust?”
The collection and analysis of data have led to tradeoffs for a long time. For instance, life insurance companies charge higher rates for smokers, and auto insurance companies charge higher rates for young, male drivers.
Yet Croll cautioned about Big Data from a societal standpoint. “There needs to be a renegotiation. By getting so much data on individuals and putting them in buckets of risk, for example, leads us away from the idea of all of us being in this together. And individuals will be less likely to be placed with others at a higher risk to amortize that risk.”
* * *
In a move not lacking in irony, Continuuity has changed its name to Cask. At the same time, about a month ago, it also open-sourced its data application platform-as-a-service. I sat down with founder and CEO Jonathan Gray at Strata + Hadoop World to talk about the platform, which is squarely aimed at developers building data-intensive applications. He described Hadoop itself as the next-generation data management platform, but pointed out, “In the end, it’s all about apps.” Large companies, he said, are becoming large data companies, trying to monetize their data. “Data will become their lifeblood,” he said. “They’ll be building apps themselves, but it’s a challenge if you’re a bank, not Google or Facebook.”
Cask’s CDAP platform is positioned as a virtualization layer on top of Hadoop, with standard containers for apps, data access, and the ability to run unit tests for Hadoop applications and then deploy them into production, Gray said.
“People have focused on the infrastructure layer, and developers have been left out,” he said. The goal of CDAP, he added, is “to bring developers as far down the path to solving their problems as we can.”
One of the special capabilities of Cask is that it can perform mixed batch and real-time processing. “This is the future,” he said, “for things from recommendation systems to anomaly detection. It’s extraordinarily difficult.”
Gray said companies like Mattel, which makes toys, should not be building their own data applications. “It’s not their business,” he said. Instead, there are people building products and services on top of Hadoop that they sell back to companies like Mattel.
“They’re in the revenue-generating path, not the cost center,” Gray explained. “We want to be in that path.”
* * *
Cloudera chief technologist Eli Collins noted that people need to learn to use data to solve problems. With that, though, come issues of privacy, governance and ethical handling of data. “You need strong data management to gain the benefits [of Big Data] and solve data problems.”
He cited the example of Disney’s “Magic Bands,” which visitors to the theme parks can use to reserve ride times, enter their hotel rooms, and change plans on the fly.
Imagine bypassing the long lines to go on the “It’s a Small World” ride. That’s because, in reality, it’s a Big Data world.