If you’re working on a data project and find yourself stumped, call on legions of professionals to solve your problem, even as they compete for cash and professional exposure along the way.
This is the philosophy of San Francisco-based Kaggle, and with more than US$11 million in funding and a network of more than 94,000 assorted data scientists worldwide, it seems to be working. The premise is simple: Organizations present problems to the Kaggle community, outline the rules that must be followed, and back them with a cash prize that ranges from several hundred dollars to as much as $10,000 to $250,000 for commercial competitions. Then there’s the ongoing big jackpot: a $3 million Heritage Health Prize to create an algorithm to help people avoid unnecessary hospitalization.
Once a competition has been created, the problem is crowdsourced with users of all types (scientists, engineers, developers, marketers or mathematicians) divvying up tasks and submitting a working model as a potential solution. Along with the cash and a cool new tag for their resumes, the winning team also generally gets a chance to speak with the engineering and/or management ends of the company sponsoring the competition. And the sponsor usually enjoys the fresh perspective brought to them by the winning team.
After a competition is finished, the winning team is tasked with writing up a performance report, and the competition’s sponsor chooses whether to keep the winning solution’s code exclusive, make it open-source, or share it in some other way.
In some of the site’s more unusual competitions, data-analysis teams are working on the best model for identifying bird species from continuous audio recordings, while others are working on a means of keeping whales from colliding with transatlantic ships. More real-world problems are reflected in the $250,000 competition for an algorithm to help better predict airline flight delays.
The idea of gathering talented minds to take on complex, Big Data problems is as old as the notion of the think tank, but Kaggle seems to have taken a different, almost social-networking approach to it. Kaggle was founded by Anthony Goldbloom, a University of Melbourne graduate with a degree in economics and econometrics. He spent time in the economic modeling unit for Australia’s Department of the Treasury before working for the Reserve Bank of Australia. During an internship with “The Economist” magazine, he found that a number of large-company CIOs stated that while access to data was undeniably critical, the challenge was in finding talent to produce and work with the numbers to find the best solutions for large-scale problems.
Thus was born the effort to make it easy for companies with Big Data problems and an army of data analysts to find each other.
While some might question why some of the world’s best minds would take their attention away from their current assignments to work on extracurricular projects, president and chief scientist Jeremy Howard, who was also the first participant to win multiple competitions, was quick to point out that the competitions represented a great means of developing new skills and sharpening old ones.
“Kaggle is a great place to develop skills by using them, as opposed to hearing about them in an educational setting,” he said. “We think people come to learn, to have fun, to develop a professional reputation, and then the prize money.”
Still, it may be the competitive element that seals the deal for many participants. Tapping into a social network ideal, Kaggle allows users to readily form teams to attack problems together. This is enhanced by a real-time, continuously updated leaderboard that tracks each competition’s standings. The Kaggle website also shows which teams and individuals are currently leading the pack and by how much. The final score is tallied using Kaggle’s proprietary metrics, which measure how accurately each submitted solution addresses the problem. “There are hundreds of proxies for accuracy, and we are heavily involved in choosing one that fits the problem well,” said Howard.
According to Howard, a large number of the solutions that wind up winning the competitions become open-sourced and used later on for different projects. For marquee-level Kaggle competitions, the companies sponsoring the competitions typically express an interest in owning the intellectual property developed over the course of the contest.
One such property came from a competition to observe dark matter in the universe. Winning solutions were converted to open-source code, including one from a programmer named Vishal Goklani written for an IPython notebook.
There’s an elite level to everything, and Kaggle is no different with its Connect program. With Connect, individuals and teams that have won multiple contests are considered elite, which gives them access to private, higher-end contests with prizes that Howard has described as “quite lucrative.”