Companies have been ramping up their research on convolutional neural networks, a form of “deep learning” used to train computers to recognize images and speech. It’s an advancement in technology that, when developed correctly, makes computers almost as capable as humans at those tasks.
Microsoft is one of these companies, and it recently announced advances in its technology after winning a computer vision challenge. Last Thursday, Microsoft took first place in several categories in the ImageNet and Microsoft Common Objects in Context Challenges, which brought companies and research institutes together to compare approaches to systems that recognize objects, faces, and emotions in both photographs and videos.
Jian Sun, a principal research manager at Microsoft Research who led the project, said the company’s system outperformed the competitors and won all main tasks (five in total) with a single type of neural network and a single system, surpassing others on detection by a “large margin.”
Sun said that deep neural networks have been used for years, citing speech translation in Skype Translator as one example.
Microsoft said its system is more effective than existing neural network systems because it enabled researchers to build deep neural nets that were about five times deeper than the company’s previous ones. According to a Microsoft blog post, the deep neural network system the researchers trained three years ago had eight layers, each containing small neuron collections that look at portions of the input image. Last year’s systems delivered results with 20 to 30 layers. The newest system has 152 layers, which is about five times more than any previous Microsoft system, according to Sun.
“Regarding future research, what we learned from our extremely deep networks is so powerful and generic that we think it can substantially improve many other vision tasks,” said Sun. “Because our system is a major expansion in the number of layers successfully used in a deep neural network, it’s an important development for the field, broadly.”
Sun said that, in practice, using more layers resulted in signals vanishing as they passed through each layer, which made the whole system difficult to train.
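To give a rough sense of why depth makes signals vanish, here is a toy sketch in Python. It is not Microsoft’s system: the function name `signal_after_layers` and the assumption that every layer scales the signal by a constant factor of 0.5 are purely illustrative, and real networks behave in more complicated ways. It only shows how quickly a repeated shrinking factor compounds at the layer counts mentioned above.

```python
def signal_after_layers(num_layers, per_layer_scale=0.5, initial=1.0):
    """Return the signal magnitude after each layer.

    Assumption for illustration only: every layer multiplies the
    signal by the same constant factor (per_layer_scale).
    """
    magnitudes = []
    signal = initial
    for _ in range(num_layers):
        signal *= per_layer_scale  # each layer shrinks the signal a bit more
        magnitudes.append(signal)
    return magnitudes


# Compare the layer counts cited in the article: 8, 30, and 152.
for depth in (8, 30, 152):
    final = signal_after_layers(depth)[-1]
    print(f"{depth:>3} layers -> remaining signal {final:.3e}")
```

Under this simplified assumption, the signal left after 152 layers is many orders of magnitude smaller than after 8 or 30, which is the kind of effect that makes very deep networks hard to train.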