Microsoft wants to improve global collaboration among the data research community with the release of Microsoft Research Open Data, a data repository stored in the cloud.
The repository will provide a single platform for Microsoft researchers and collaborators to share datasets, technologies, and tools. It will also have an option to copy datasets directly to an Azure-based Data Science virtual machine.
“With data growing at an exponential rate, perceived to be over 150 ZB of data available by 2025, it is now recognized that we need to prioritize bringing processing to data versus relying on data movement through Internet bandwidth that is growing at a much slower pace. We believe that there is real utility in providing an option to bring the processing to the data,” Vani Mandava, director of data science outreach at Microsoft, wrote in a post.
In addition, the repository will come preloaded with popular development tools for researchers and practitioners.
According to Microsoft, “the repository meets the highest standards for data sharing to ensure that datasets are findable, accessible, interoperable and reusable.” The company also said that the site will continually evolve as more feedback is gathered from users.
Microsoft hopes that this repository will augment the data repositories already used by researchers. Dataset categories include astronomy, biology, chemistry, computer science, earth science, education, engineering, healthcare, information science, and mathematics.
“I am often asked to share my research data and the public sharing I have done in the past has been popular. Coordinating and cataloging these datasets in one place with Azure will be helpful for both internal and external researchers, giving them easy access, encouraging collaboration, and providing convenient cloud-based access to the wealth of Microsoft Research shared data,” said John Krumm, principal researcher for Microsoft Research AI.