Caltech researchers are using mixed reality to improve the lives of blind people. Their work combines augmented reality with computer vision algorithms so that developers can build software that enables objects to “talk.”

CARA (Cognitive Augmented Reality Assistant) is a headset-based system designed to translate the visual world into English audio. The researchers believe it can be used in banks, stores, museums, and other locations to make those spaces more accessible to blind people.

“Imagine you are in a world where all the objects around you have voices and can speak to you,” said Markus Meister, one of the authors of the paper and faculty member at the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech. “Wherever you point your gaze, the different objects you focus on are activated and speak their names to you. Could you imagine getting around in such a world, performing some of the many tasks that we normally use our visual system for? That is what we have done here—given voices to objects.”

CARA was created by a team of scientists led by Yang Liu, a graduate student at Caltech. It was developed for use on Microsoft’s HoloLens headset.

The technology utilizes spatial sound, which makes objects sound different depending on where they are located in a room. For example, if an object is to the left of the user, its voice will come from that direction. The closer the object is to the user, the higher the pitch of its voice will be.
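
The mapping described above, from an object's position to the direction and pitch of its voice, can be sketched in a few lines of Python. This is only an illustration and not CARA's implementation: the `SceneObject` type, the linear pitch curve, the 220 to 440 Hz range, and the 8-metre cutoff are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x: float  # metres to the user's right (negative means left)
    z: float  # metres in front of the user


def azimuth_deg(obj: SceneObject) -> float:
    """Direction of the object relative to straight ahead; negative is left."""
    return math.degrees(math.atan2(obj.x, obj.z))


def pitch_hz(obj: SceneObject, near_hz: float = 440.0, far_hz: float = 220.0,
             max_range_m: float = 8.0) -> float:
    """Closer objects get a higher pitch (hypothetical linear mapping)."""
    # Clamp the distance so very distant objects all share the lowest pitch.
    distance = min(math.hypot(obj.x, obj.z), max_range_m)
    return near_hz - (near_hz - far_hz) * distance / max_range_m
```

With these assumed values, a chair one metre ahead and one metre to the left would be voiced from about 45 degrees to the left, at a higher pitch than a door four metres straight ahead.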

To keep multiple objects from speaking over one another, CARA was programmed with three different modes:

  • In spotlight mode, an object says its name only when the user is facing it.
  • In scan mode, the room is scanned from left to right and the objects will say their names accordingly.
  • In target mode, the user can select an object to talk to exclusively and have it act as a guide for navigation.
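
The three modes listed above can be thought of as three ways of choosing which objects get to speak. The following Python sketch shows that idea under stated assumptions: the `Obj` type, the function names, and the 10-degree spotlight cone are hypothetical and are not taken from the CARA software.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    azimuth_deg: float  # angle from the user's gaze; negative means left


def spotlight(objects, half_width_deg=10.0):
    """Names of objects inside a narrow cone around the gaze direction."""
    return [o.name for o in objects if abs(o.azimuth_deg) <= half_width_deg]


def scan(objects):
    """Names of all objects, ordered from left to right."""
    return [o.name for o in sorted(objects, key=lambda o: o.azimuth_deg)]


def target(objects, chosen):
    """Only the object the user selected is allowed to speak."""
    return [o.name for o in objects if o.name == chosen]


room = [Obj("door", 40.0), Obj("chair", -30.0), Obj("table", 5.0)]
print(spotlight(room))       # ['table']
print(scan(room))            # ['chair', 'table', 'door']
print(target(room, "door"))  # ['door']
```

Each function returns the names in the order they would be spoken, so only one selection rule is active at a time and the voices do not overlap.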

The team also developed a new test for evaluating the performance of assistive devices for the blind. The test provides a benchmark that researchers can use without having to reconstruct physical spaces.