In 2012, researchers danah boyd and Kate Crawford published the paper “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Among their warnings are the following:
1. “Big Data changes the definition of knowledge.”
2. “Claims to objectivity and accuracy are misleading.” Subjective interpretation and “storytelling,” as well as data loss and noise, will always affect results.
3. “Bigger data are not always better data.” The more data there is, the greater the need for quality sources and statistical rigor.
4. “Taken out of context, Big Data loses its meaning.”
5. “Just because it is accessible does not make it ethical.” Privacy protection is also imperfect, because anonymized records can be re-identified.
6. “Limited access to Big Data creates new digital divides.”
But wait, there’s much more where that came from. What about asking the wrong questions? Data scientists may be born quants, but they are often unfamiliar with the business domain, and knowing the domain matters more than knowing Hadoop. And what about asking questions companies don’t really want to know the answers to? A drug maker’s Twitter sentiment analysis, for instance, might surface reports of side effects that the company would then be required to report to the FDA.
Speaking of governance: it, along with privacy, metadata management and security, poses a real risk to enterprise big data efforts. Privacy regulations to be aware of include:
• The USA PATRIOT Act’s KYC (Know Your Customer) provision
• The Gramm-Leach-Bliley Act for protecting consumers’ personal financial information
• The UK Financial Services Act
• Do Not Call (DNC) compliance
• Customer Proprietary Network Information (CPNI)
• The European Union Data Protection Directive
• Japan’s Act on the Protection of Personal Information
• The Federal Trade Commission’s Safeguards Rule (16 CFR Part 314, Standards for Safeguarding Customer Information)
And then there are the usual problems associated with any IT effort: skill, scope and context.
A recent survey conducted by Infochimps found that the top reasons big data projects fail are “Inaccurate Scope” (58 percent), “Lack of Business Context Around the Data” (51 percent) and “Lack of Expertise to Connect the Dots” (51 percent).