Yahoo has released CaffeOnSpark, which brings together the fruits of two University of California, Berkeley projects: the vision-focused deep learning framework Caffe and the Big Data processing engine Apache Spark.
Without the aid of Spark, Caffe can process up to 60 million images per day. That figure comes from benchmarks on a single NVIDIA GPU, which already takes about 1 millisecond per image for inference and 4 milliseconds per image for learning, so the move onto a Hadoop cluster should raise overall throughput further.
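The throughput and latency figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes a single GPU processing one image at a time around the clock, which is a simplification of how the benchmark was likely run.

```python
# Milliseconds in one day: 24 h * 60 min * 60 s * 1000 ms.
MS_PER_DAY = 24 * 60 * 60 * 1000


def images_per_day(latency_ms: float) -> float:
    """Daily throughput for a given per-image latency (assumes 24/7 use)."""
    return MS_PER_DAY / latency_ms


def latency_ms(daily_images: float) -> float:
    """Per-image latency implied by a given daily throughput."""
    return MS_PER_DAY / daily_images


# Figures quoted in the article.
inference_per_day = images_per_day(1.0)   # 1 ms per image for inference
learning_per_day = images_per_day(4.0)    # 4 ms per image for learning
implied_latency = latency_ms(60_000_000)  # latency behind 60 million/day

print(f"Inference at 1 ms/image: {inference_per_day:,.0f} images/day")
print(f"Learning at 4 ms/image:  {learning_per_day:,.0f} images/day")
print(f"60 million images/day:   {implied_latency:.2f} ms/image")
```

Under these assumptions, 60 million images per day works out to roughly 1.44 milliseconds per image, so the 1 millisecond inference figure reads best as a rounded value.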
The Caffe framework was originally created at the UC Berkeley Vision and Learning Center. The school offers a Web-based demo in which users can submit an image URL, and the Caffe application on the back end returns the categories it expects the picture to fit into.
CaffeOnSpark was created by the machine learning team at Yahoo, where the Flickr team used it for image recognition. The machine learning team advocated running deep learning algorithms on the same Hadoop cluster that holds the primary business data.
CaffeOnSpark is available on GitHub under the Apache 2.0 license.