Businesses rely on data to make decisions that drive their bottom line. But if they can’t trust the data, or the analysis of that data, they can no longer act with confidence that what they’re doing is correct.
Data quality has many different inputs and dimensions. IDC research director Stewart Bond said that among them are data accuracy, duplication, consistency, correctness and context. And the level of data quality that is available within an organization is going to change. Further, working with internal data is different from working with external data you receive as inputs. “So,” Bond said, “I don’t know if there’s a really good answer” to the breadth and depth of the data quality problem.
He went on to say that even if the data quality level is good, many data analytics tools inherently carry some sort of human bias, because their results are skewed by what the data teams in an organization want or expect to get out of the data.
“We’ve heard stories about two people showing up at an executive meeting with two different results coming in from supposedly the same set of data,” Bond said. “That can really erode the trust that is in the data, and the analytics of the data.”
Storing data in the cloud also presents challenges when it comes to trust, Bond explained, because every SaaS application has to have its own copy of the people, places and things the organization most cares about. “I liken this back to the game some people call the telephone game. It’s when a group of people sit in a circle, or they’re standing in line, the first person whispers a phrase to the person next to them. When you get to the end, what happens is the story changes, or the phrase changes, and so you have that same potential issue with every single copy of data that’s created. And so that comes into that data quality calculation and estimation as well.”
At the beginning of the SD Times Data Quality Project, I described issues of data quality as being the industry’s “dirty little secret.” But organizations such as IDC have been able to pull the curtain back on this, and a recent survey of 300 people who do data analytics with business intelligence and dashboarding tools showed that only 10% of the respondents said the quality of their data — or their trust in that data — was not a challenge at all, Bond said. “This means that 90% have some level of concern in trusting the quality of their data.”
While Bond said he doesn’t think the industry will ever have pure, pristine, 100% clean data, he did say that if organizations know the level of their data quality (the data quality score), they can take that score and bring it into their algorithms as a form of statistical level of confidence. Those who have, he noted, have found “a tremendous improvement in the success of how they’re analyzing that data. So then that gives some guidance as to how and where you can use the results of those analytics in your decision-making.”
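Bond did not describe a specific method for bringing a quality score into an algorithm. As one possible illustration only, the Python sketch below treats a score between 0 and 1 as a weight when combining the same metric reported by two copies of the data. The class `SourceEstimate`, the function `quality_weighted_mean`, the source names and all of the numbers are hypothetical and are not drawn from the IDC survey.

```python
from dataclasses import dataclass


@dataclass
class SourceEstimate:
    """One estimate of the same metric from one copy of the data (hypothetical)."""
    name: str
    value: float          # the metric as reported by this source
    quality_score: float  # assumed data quality score, between 0 and 1


def quality_weighted_mean(estimates):
    """Combine estimates, weighting each one by its data quality score."""
    total_weight = sum(e.quality_score for e in estimates)
    if total_weight == 0:
        raise ValueError("all quality scores are zero; no source can be weighted")
    return sum(e.value * e.quality_score for e in estimates) / total_weight


# Made-up example: two systems report different values for the same figure.
estimates = [
    SourceEstimate("crm", value=100.0, quality_score=0.9),
    SourceEstimate("billing", value=120.0, quality_score=0.6),
]
combined = quality_weighted_mean(estimates)
print(f"quality-weighted estimate: {combined:.1f}")
```

In this sketch the higher-scoring source pulls the combined figure toward its own value, which is one simple way a known quality score could inform how much weight a result gets in a decision.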