OpenAI is introducing voice and image capabilities in ChatGPT. The new features offer a more intuitive way to interact with the tool: users can hold voice conversations and share images to show ChatGPT what they are talking about.
The voice and image features give users more versatile ways to use ChatGPT in their daily lives. According to OpenAI, users can, for example, take a picture of a landmark while traveling and have a live conversation about it, snap a photo of their kitchen to plan meals and get recipe guidance, or help their children with math problems by sharing a photo and asking for hints.
These features will be gradually rolled out to Plus and Enterprise users over the next two weeks, with voice available on iOS and Android (opt-in required in settings) and image support on all platforms.
To begin using the new voice feature, users can go to Settings in the mobile app, open New Features, and opt in to voice conversations. A headphone icon will then appear in the top-right corner of the home screen.
Tapping this icon lets users choose their preferred voice from five options. The voice functionality relies on a text-to-speech model that can generate human-like audio from text and a short sample of speech. The voices were developed in collaboration with professional voice actors. Speech recognition is handled by Whisper, OpenAI's open-source system, which converts spoken words into text.