Is Big Data good? Or is it evil?
On one hand, Big Data is enabling tremendous efficiencies at many companies by giving them new insights into their customers and by allowing them to amalgamate information from disparate sources. That’s great! Let’s keep investing.
On the other hand, a few weeks ago, I attended a workshop on cybercrime put on by the U.S. Federal Bureau of Investigation and the U.S. Secret Service. Special agents from those organizations pointed to the use of Big Data technologies by criminals to assemble bits of information from victims—credit card numbers, e-mail addresses, user names, physical addresses, and so on—to enable identity theft on a huge scale. That’s scary! Let’s hide under the bed covers.
Truth be told, Big Data is a collection of enabling technologies that are here to stay. Good or evil, it doesn’t matter. The genie ain’t going back into the bottle.
That’s why IT professionals and developers need a thorough understanding of what Big Data is beyond being a mass-media buzzword. Here are five phrases that you need to know about Big Data:
1. Map/Reduce is king of the Big Data heap. Map/Reduce breaks down huge calculations and data sets so they can be processed quickly and in parallel. The Map step splits a large problem into small pieces, distributes them to the nodes in a cluster, and processes each piece independently. The Reduce step collects the intermediate answers and combines them to produce the final output. Map/Reduce is brilliant, simple, and the key to many, if not most, Big Data solutions.
2. Hadoop drives Map/Reduce. Hadoop, an open-source Apache project, implements Map/Reduce on inexpensive hardware such as clusters of off-the-shelf x86 servers. It consists of a distributed file system (HDFS), a resource-management platform (YARN), and an implementation of Map/Reduce.
3. NoSQL feeds Hadoop. NoSQL databases go beyond the rows and columns found in relational databases, storing data as documents, key-value pairs, or graphs. Leading NoSQL implementations, such as Couchbase, CouchDB, MongoDB, and Neo4j, are fast, efficient, and highly scalable, and they integrate easily with Hadoop and other Big Data frameworks.
4. Analytics and visualization are vital. It’s not enough to amalgamate the data. It’s not enough to merge disparate data sources. Big Data problems are solved through analytics. A good analytics package is essential, and so are data scientists capable of creating and interpreting those analytics. Those analytics, in turn, call for strong visualizations so that everyone can understand the results.
5. Big Data is a career path game-changer. If I were advising a young IT professional just starting a career, I’d suggest moving into Big Data. Master Hadoop and the related Apache projects. Focus on data. Learn the analytics. Dive deep into NoSQL. Follow the startups. Be involved in the Big Data culture. Hey, even if you’re not new in your career, I am giving you that advice too. The game is changing. Change with it.
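To make the split, process, and combine pattern from items 1 and 2 concrete, here is a minimal single-machine sketch of the classic word-count job in Python. It is not Hadoop code: threads from the standard library’s ThreadPoolExecutor stand in for cluster nodes, and the function names map_chunk, shuffle, reduce_group, and map_reduce are hypothetical. The sketch also shows the shuffle step, which groups intermediate pairs by key between the two phases; real frameworks perform that step for you, and the summary above folds it into Reduce.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map: turn one chunk of input lines into (word, 1) pairs."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped):
    """Group every emitted value by its key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines, workers=4):
    # Split the input into roughly equal chunks, one per worker (ceiling division).
    size = max(1, -(-len(lines) // workers))
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads stand in for cluster nodes in this single-process sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
        groups = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_group(*kv), groups.items())
        return dict(reduced)


if __name__ == "__main__":
    docs = ["big data is big", "data is here to stay", "map reduce is simple"]
    print(map_reduce(docs))
```

In a real Hadoop job, the framework would ship the map and reduce logic to many machines and move the intermediate pairs across the network; the per-record logic stays this small, which is much of Map/Reduce’s appeal.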
Let me point you to two powerful resources:
• Subscribe to Big Data Tech Report, produced by BZ Media—the company behind SD Times and News on Monday.
• Attend Big Data TechCon, also by BZ Media. It’s the best place to learn about Big Data, and the best place to network. The next Big Data TechCon is March 31–April 2 in Boston.
Finally, a bit of industry news from Hazelcast, maker of an in-memory data-grid system that can significantly accelerate Map/Reduce performance. Hazelcast has released a Map/Reduce API that, the company says, can process data as it streams in from a transactional system, saving steps and allowing real-time scrutiny of the data for abnormalities. Neat, eh?
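The general idea of reducing data as it arrives, rather than in a later batch, can be illustrated with a generic incremental aggregation. This sketch does not use or imitate Hazelcast’s API. The class StreamingTotals, its ingest and mean methods, and the rule that flags any amount more than three times the key’s running mean are all hypothetical, chosen only to show how a running reduce can check each record the moment it lands.

```python
from collections import defaultdict


class StreamingTotals:
    """Keeps a running count and sum per key, so results are current after every record."""

    def __init__(self, threshold=3.0, min_history=3):
        self.count = defaultdict(int)
        self.total = defaultdict(float)
        # Hypothetical anomaly rule (not Hazelcast's): flag an amount larger than
        # `threshold` times the key's running mean, once `min_history` records exist.
        self.threshold = threshold
        self.min_history = min_history

    def ingest(self, key, amount):
        """Fold one streamed record into the aggregate; return True if it looks abnormal."""
        n = self.count[key]
        abnormal = n >= self.min_history and amount > self.threshold * (self.total[key] / n)
        # Incremental reduce: update the running aggregate in place.
        self.count[key] = n + 1
        self.total[key] += amount
        return abnormal

    def mean(self, key):
        """Current running mean for a key (0.0 if the key has no records yet)."""
        n = self.count[key]
        return self.total[key] / n if n else 0.0


if __name__ == "__main__":
    totals = StreamingTotals()
    stream = [("alice", 10), ("alice", 12), ("alice", 11), ("alice", 100)]
    for customer, amount in stream:
        if totals.ingest(customer, amount):
            print(f"check {customer}: {amount} vs mean {totals.mean(customer):.2f}")
```

Because each record updates the aggregate immediately, there is no separate load-then-query pass, and a suspicious record can be flagged while the stream is still flowing.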
Big Data is not evil. It’s a toolbox that we can and should use. What do you think of Big Data for your company or your career? Write me at email@example.com.
Alan Zeichick, founding editor of SD Times, is principal analyst of Camden Associates.