The spotlight turned to data analysis and real-world business use cases at Big Data TechCon, which winds down today in San Francisco. Talk was of day-to-day work and decision processes needed to support large-scale Big Data operations, with topics such as Hadoop, Spark, NoSQL and traditional relational databases, and how they can be leveraged for data-based products.
Gloria Lau, vice president of data at Timeful, discussed a method she uses to decide where her teams should spend their time. Lau formerly managed LinkedIn’s data science team, and it was there that she perfected this method.
Her method begins with a classic quote from Donald Knuth: “Premature optimization is the root of all evil.” With that in mind, she urged attendees to ask themselves two questions when setting priorities for building data products.
These questions are: “What is the metric this product is trying to lift?” and “If your users gave you only one minute a day using the product, what would you want them to be doing?” She then applied these questions to a number of scenarios that could confront Big Data developers.
For example, if the tracking of a metric is not consistent, how do you decide if your team should track down this bug? According to Lau’s method, it is only important to fix such a bug if it is affecting the key metric you’ve identified.
Elsewhere at the show, Dean Wampler, a consultant for Typesafe, described how to process streams with Apache Spark. He said that using resilient distributed datasets (RDDs) in Spark allows developers to reuse large chunks of code. It is the setup and teardown code that differs, he said, while the core code can be reused on HDFS or any other data store, such as Apache Cassandra.
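Wampler's point about reuse can be illustrated without Spark itself. What follows is a minimal plain-Python sketch, not Spark code and not taken from his talk: the hypothetical `count_words` function plays the role of the core logic, while the two hypothetical reader functions stand in for the storage-specific setup code.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_words(lines: Iterable[str]) -> Counter:
    """Core logic: it does not know or care where the lines came from."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def lines_from_batch(text: str) -> Iterator[str]:
    """Hypothetical setup code for a file-like source (stands in for HDFS)."""
    yield from text.splitlines()


def lines_from_records(records: Iterable[dict]) -> Iterator[str]:
    """Hypothetical setup code for a row-oriented store (stands in for a table)."""
    for record in records:
        yield record["body"]


# Only the setup differs between the two sources; count_words is reused unchanged.
batch_counts = count_words(lines_from_batch("spark streams\nspark batches"))
store_counts = count_words(
    lines_from_records([{"body": "spark streams"}, {"body": "spark batches"}])
)
```

Both calls run the same core function and produce the same counts, which is the reuse pattern described above, shown here in miniature.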
Todd Cioffi, director of RapidMiner University, discussed the shortcomings of business intelligence when it comes to predictive analytics. As an example, he described how business intelligence analytics can tell you, perhaps, that one in five customers will not buy your product a second time. But that information doesn’t help you figure out which actual customer will leave.
“What you want to be able to do is take a look at an individual level and figure out the likelihood of those individuals leaving. Business intelligence tends to take a look at an aggregated overview and less at the individual person in a predictive fashion. Business intelligence answers only what it’s asked. So if you’re doing exploratory analysis, your exploration is based on what you can think of to ask. It’s tough to ask SQL ‘show me what’s in there.’ Still it’s dependent upon your analysis, first, to get you the answers, rather than letting data tell you what’s there,” said Cioffi.
“Where business intelligence leaves off, predictive analytics picks up. It’s not that you shouldn’t be doing business intelligence, it’s that you shouldn’t be doing just business intelligence if you want your data to be smarter for you,” said Cioffi.
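The distinction Cioffi draws between an aggregate figure and an individual-level prediction can be sketched in a few lines of Python. This is an illustrative sketch only: the `Customer` fields, the weights in `churn_score` and the sample data are all hypothetical, and a real predictive model would learn its weights from historical data rather than have them hard-coded.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Customer:
    name: str
    days_since_last_order: int
    support_tickets: int


def aggregate_churn_rate(past_outcomes: Sequence[bool]) -> float:
    """BI-style answer: the share of past customers who did not return."""
    return sum(past_outcomes) / len(past_outcomes)


def churn_score(customer: Customer) -> float:
    """Individual-level answer: a likelihood of leaving for one customer.

    The weights are made-up placeholders, not a fitted model.
    """
    z = -3.0 + 0.03 * customer.days_since_last_order + 0.5 * customer.support_tickets
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)


# Aggregate view: one in five past customers left, but not which ones will next.
history = [True, False, False, False, False]
overall_rate = aggregate_churn_rate(history)

# Individual view: rank current customers by their estimated likelihood of leaving.
active = [Customer("A", 10, 0), Customer("B", 120, 3)]
ranked = sorted(active, key=churn_score, reverse=True)
```

The aggregate rate says how many customers leave; the ranked list is what tells a team whom to contact first.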
Lau’s keynote is available here.