Researchers from the MIT-IBM Watson AI Lab want to make computers more “human.” They are currently working on a project to help computers understand and recognize dynamic events.
“As we grow up, we look around, we see people and objects moving, we hear sounds that people and objects make. We have a lot of visual and auditory experiences. An AI system needs to learn the same way and be fed with videos and dynamic information,” said Aude Oliva, principal research scientist at MIT CSAIL and one of the project’s principal investigators.
The project, the Moments in Time dataset, is part of the MIT-IBM Watson AI Lab’s commitment to artificial intelligence. The lab is focused on four research areas: AI algorithms, the physics of AI, the application of AI to industries, and advancing shared prosperity through AI. Moments in Time is focused on building a large-scale dataset that will enable AI to recognize and understand actions and events in videos. Currently, the dataset consists of one million labeled three-second videos that include people, animals, and objects.
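To make the shape of such a dataset concrete, the sketch below shows one way a collection of labeled three-second clips could be exposed to a deep-learning framework for training. It is a minimal illustration, not the official loader: the CSV manifest format (`path,label` rows) and the class list are assumptions for the example, and the real Moments in Time release ships its own annotation files and tooling.

```python
# Minimal sketch: exposing labeled short video clips to a PyTorch model.
# The "path,label" CSV manifest is a hypothetical format for illustration.
import csv
import torch
from torch.utils.data import Dataset
from torchvision.io import read_video

class ShortClipDataset(Dataset):
    def __init__(self, manifest_csv, classes):
        self.class_to_idx = {c: i for i, c in enumerate(classes)}
        with open(manifest_csv) as f:
            self.items = [(row["path"], row["label"]) for row in csv.DictReader(f)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, label = self.items[idx]
        # read_video returns (frames [T, H, W, C], audio, metadata)
        frames, _, _ = read_video(path, pts_unit="sec")
        # Rearrange to [C, T, H, W], the layout most 3D-convolution video models expect
        clip = frames.permute(3, 0, 1, 2).float() / 255.0
        return clip, self.class_to_idx[label]
```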
“This dataset can serve as a new challenge to develop AI models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis,” Oliva added, describing the factors involved. Events can include people, objects, animals, and nature. They may be symmetrical in time — for example, opening means closing in reverse order. And they can be transient or sustained.
According to the researchers, the dataset took more than a year to put together and posed many technical challenges, such as choosing the action categories, finding the videos, and assembling the collection in a way that an AI system can learn from without bias.
“IBM came to MIT with an interest in developing new ideas for an artificial intelligence system based on vision. I proposed a project where we build data sets to feed the model about the world. It had not been done before at this level. It was a novel undertaking. Now we have reached the milestone of 1 million videos for visual AI training, and people can go to our website, download the dataset and our deep-learning computer models, which have been taught to recognize actions,” Oliva said.
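For readers who download one of the trained models Oliva mentions, the workflow follows the standard pattern of classifying a short clip with a pretrained video network. The sketch below is only a stand-in: it uses torchvision’s `r3d_18` (pretrained on Kinetics-400) rather than the project’s own models, which are distributed separately on the Moments in Time website.

```python
# Hedged sketch of short-clip action recognition with a pretrained video model.
# torchvision's r3d_18 is used as a stand-in for the project's released models.
import torch
from torchvision.models.video import r3d_18

model = r3d_18(pretrained=True).eval()

# A dummy clip: batch of 1, 3 color channels, 16 frames, 112x112 pixels
clip = torch.randn(1, 3, 16, 112, 112)

with torch.no_grad():
    logits = model(clip)

# Indices of the five highest-scoring action classes for this clip
top5 = logits.topk(5).indices[0].tolist()
print("Top-5 predicted action class indices:", top5)
```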
Going forward, the researchers expect the dataset to improve and to give AI systems more capabilities at a variety of abstraction levels. The current Moments in Time dataset is just the first version, and already one of the largest human-annotated video datasets, according to the researchers. It will serve as a stepping stone toward developing learning algorithms that “can build analogies between things, imagine and synthesize novel events, and interpret scenarios,” according to MIT.
“We expect the Moments in Time dataset to enable models to richly understand actions and dynamics in videos,” said Dan Gutfreund, a principal investigator at the MIT-IBM Watson AI Lab.