You can’t deal with Big Data that arrives in real time without feeling overwhelmed. It’s as if money were falling from the sky at an ever-faster rate, and you were scrambling to capture it all without worrying about which bills might be counterfeit. You would rather sort out what’s important later than risk missing something that turns out to be valuable.
This will be a common problem as the Internet of Things (IoT) takes root in every industry, putting connected data-gathering gadgets everywhere. Agriculture, for example, is seeing the advent of cheap sensors you can plug into a tomato or a pepper for continuous monitoring of its moisture content and other indicators of health. I know a major fruit producer that has gotten very precise about measuring every aspect of production and distribution to drive up yields and lower costs. In the oil industry, where I used to work, companies are remaining competitive (despite low petroleum prices) by using IoT in their production facilities and wells. By monitoring pressure, temperature and dozens of other indicators minute by minute, operators can study the patterns leading up to a well failure for clues to preventing the next one.
At RingCentral, where I lead the analytics team, we have gone from sampling the log data coming off routers and firewalls to capturing the whole stream. For us, IoT means reaching beyond the core network to capture data off individual end-user phones, another major data source.
The ability to capture a complete picture of whatever we are managing, where previously we could handle only a fraction of it, is a large part of what the excitement over Big Data is all about. When I was working on advanced analytics for Saudi Aramco in the 1990s, you needed a supercomputer to approach this level of sophistication. Today, the technology required to tackle extreme data processing challenges has been all but commoditized by open-source software like Hadoop. Sometimes it still makes sense to spend the money for commercial data processing technologies, but they’ve gotten much more powerful too.
There are industries where sampling remains necessary. Online games generate so much data (every player’s choice of weapon or interaction with another player) that their producers throw some of it away. At the same time, many are plagued by fraud, making it important that they monitor a player’s every move to spot patterns associated with theft of virtual goods or improperly earned statuses within the game. They make the best tradeoffs that they can to ensure they are capturing the important stuff.
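To make that tradeoff concrete, here is a minimal sketch in Python: keep every event that could matter to fraud detection, and sample the routine gameplay events. The event types and the 1 percent rate are hypothetical illustrations, not any particular studio's pipeline.

```python
import random

# Event types we never drop: anything that could feed fraud detection.
# These names are hypothetical, for illustration only.
FRAUD_RELEVANT = {"item_trade", "currency_transfer", "achievement_unlock"}

SAMPLE_RATE = 0.01  # keep 1% of routine gameplay events

def should_keep(event: dict) -> bool:
    """Keep every fraud-relevant event; randomly sample the rest."""
    if event.get("type") in FRAUD_RELEVANT:
        return True
    return random.random() < SAMPLE_RATE

events = [
    {"type": "weapon_select", "player": "p1"},
    {"type": "currency_transfer", "player": "p2", "amount": 5000},
]
kept = [e for e in events if should_keep(e)]
print(kept)  # the currency transfer is always kept
```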
Once we turn to finding useful information within the data, that process is typically reductive: filtering the overwhelming mass of data down to the nuggets that mean something. When we create a dashboard for customers, or even for executives within our own company, we do not want to present all the data. Far from it: we want to show a few simple graphs of clearly understandable metrics.
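As an illustration, here is a sketch of that reduction in Python. The record fields and metrics are hypothetical; the point is that thousands of raw records collapse to the handful of numbers a dashboard actually displays.

```python
from statistics import mean

# Hypothetical raw call records; a real system would read these
# from the pipeline, not from an in-memory list.
calls = [
    {"duration_s": 312, "mos": 4.2, "dropped": False},
    {"duration_s": 45,  "mos": 3.1, "dropped": True},
    {"duration_s": 618, "mos": 4.4, "dropped": False},
]

# Reduce the full stream to the few metrics worth graphing.
summary = {
    "total_calls": len(calls),
    "avg_mos": round(mean(c["mos"] for c in calls), 2),
    "drop_rate": sum(c["dropped"] for c in calls) / len(calls),
}
print(summary)
```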
You might think we could save ourselves some trouble by measuring only those things in the first place. On the contrary, we want to make the top of the funnel as broad as possible so what we distill at the bottom is the best it can possibly be.
Bigger is better
This is where doubling your data collection can help. Why double up on something that’s already overwhelming? Because quality is even more important than quantity. I cited online gaming as an industry where data is particularly overwhelming, and yet gaming companies typically run redundant data pipelines in parallel so they can check them against each other and have them back each other up. By “pipeline,” I mean the whole series of steps for capturing, transmitting, transforming, organizing and storing data.
RingCentral is implementing the same approach. We are building redundancy into our data streaming pipelines so that we collect completely from all our systems and can provide a high-quality data pipeline our customers can depend on. On top of this global, highly resilient network of streaming pipelines, we enrich the streams with data from other systems delivered in traditional batch mode. It’s important for us to have real-time information for operational purposes, so we can correct for outages and other network issues as quickly as possible. If we’re doing everything right, the batch and real-time pipelines should agree on overall network performance. Having a method of validating the data you are analyzing is essential.
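A minimal sketch of that validation step, assuming each pipeline can be rolled up into the same named metrics. The metric names and the 2 percent tolerance are illustrative, not our production logic.

```python
def validate_pipelines(realtime: dict, batch: dict, tolerance: float = 0.02) -> list:
    """Compare the same metrics computed by two independent pipelines.

    Returns the metrics whose real-time and batch values disagree by more
    than `tolerance` (relative), which flags a pipeline problem somewhere.
    """
    discrepancies = []
    for metric, batch_value in batch.items():
        rt_value = realtime.get(metric)
        if rt_value is None:
            discrepancies.append((metric, "missing from real-time pipeline"))
            continue
        if batch_value and abs(rt_value - batch_value) / abs(batch_value) > tolerance:
            discrepancies.append((metric, f"real-time={rt_value}, batch={batch_value}"))
    return discrepancies

# Hypothetical daily roll-ups from each pipeline:
realtime_rollup = {"calls_completed": 95_410, "avg_mos": 4.31}
batch_rollup = {"calls_completed": 99_975, "avg_mos": 4.30}
print(validate_pipelines(realtime_rollup, batch_rollup))
# flags calls_completed: the real-time count is ~4.6% low
```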
Early in my tenure, we discovered the need to improve the monitoring of our core network. The reason we knew our existing monitoring was not as good as it ought to be was that we had another method for measuring call quality. Most VoIP phones support a standard called RTP Control Protocol Extended Reports (RTCP XR), meaning they give their own independent report on call quality after each call ends. Comparing the RTCP XR data with our network monitoring dashboards allowed us to see where the gaps were. After our engineers devised a more complete network data-gathering system, the data from the phones let us validate that we had fixed the problem.
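A simplified sketch of that cross-check, assuming both sources can be reduced to per-call quality records keyed by a shared call ID. The field names and the disagreement threshold are hypothetical.

```python
# Hypothetical per-call records from each source, keyed by call ID.
# RTCP XR reports arrive from the phones themselves; monitoring
# records come from core network collectors.
rtcp_xr = {"call-001": {"mos": 4.3}, "call-002": {"mos": 2.1}, "call-003": {"mos": 4.0}}
monitoring = {"call-001": {"mos": 4.2}, "call-003": {"mos": 4.1}}

# Calls the phones reported but monitoring never saw: coverage gaps.
gaps = sorted(set(rtcp_xr) - set(monitoring))

# Calls where the two sources disagree sharply on quality.
conflicts = [
    cid for cid in set(rtcp_xr) & set(monitoring)
    if abs(rtcp_xr[cid]["mos"] - monitoring[cid]["mos"]) > 0.5
]

print("coverage gaps:", gaps)            # ['call-002']
print("quality conflicts:", conflicts)   # []
```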
Why didn’t we just switch to using the RTCP XR data from the phones as the authoritative source? Partly because we get different, operationally valuable information from our core network monitoring, but also because having two ways of measuring the same thing is better than one.
Just like in mining, when we collect everything, we have a better chance of finding gold. For example, we discovered that many customers who had upgraded their networks over the years were hobbling themselves by using CODECs optimized for very low-bandwidth networks. CODECs are the basic encoding/decoding software for transmitting voice as data, so using the wrong one is like venturing out on the Autobahn with your Mercedes stuck in first gear. Imagine the frustration of customers who have invested in bandwidth upgrades, at least partly to improve the quality of their Internet phone service, only to see little or no improvement. Yet once the problem was identified, fixing it was as simple as directing these customers to reconfigure a setting on their phones.
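A sketch of how such a mismatch might be flagged, assuming our call records carry the negotiated codec and a measure of the customer's available bandwidth. The codec list and bandwidth cutoff are illustrative assumptions, not our actual detection rule.

```python
# Codecs designed for constrained links: fine on a slow connection,
# needlessly lossy on a fast one. Thresholds here are illustrative.
LOW_BANDWIDTH_CODECS = {"G.729", "G.723.1"}
PLENTY_OF_BANDWIDTH_KBPS = 1_000  # assumed cutoff for an upgraded network

def flag_codec_mismatches(calls: list) -> list:
    """Find calls using a low-bandwidth codec on a high-bandwidth link."""
    return [
        c for c in calls
        if c["codec"] in LOW_BANDWIDTH_CODECS
        and c["link_kbps"] >= PLENTY_OF_BANDWIDTH_KBPS
    ]

calls = [
    {"customer": "acme", "codec": "G.729", "link_kbps": 50_000},
    {"customer": "zeta", "codec": "G.711", "link_kbps": 50_000},
]
# acme has plenty of bandwidth but is stuck in "first gear":
print(flag_codec_mismatches(calls))
```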
We try to collect everything, even doubling up where necessary, because we don’t always know what is important until we find it.