Google’s machine learning-powered speech synthesis technology is now officially available. Google Cloud Text-to-Speech is built on WaveNet, the speech-generation research DeepMind originally published about two years ago and updated about a year ago; Google announced the Cloud service in March.
The technology allows developers to “synthesize natural-sounding speech with 30 voices, available in multiple languages and variants,” the company explained on its website. Since revealing the solution in March, Google has added another 17 WaveNet voices. In total, Cloud Text-to-Speech now supports 14 languages and variants and 56 voices — 30 standard voices and 26 WaveNet voices, according to Google.
Cloud Text-to-Speech is also being released alongside the beta version of Audio Profiles, which optimizes playback for different kinds of hardware. “You can now specify whether audio is intended to be played over phone lines, headphones, or speakers, and we’ll optimize the audio for playback. For example, if the audio your application produces is listened to primarily on headphones, you can create synthetic speech from Cloud Text-to-Speech API that is optimized specifically for headphones,” Dan Aharon, product manager of Dialogflow at Google, wrote in a blog post.
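As a rough sketch of how an Audio Profile is selected, the snippet below assembles a synthesis request body in the shape of the Cloud Text-to-Speech REST API. The field names (such as `effectsProfileId`) and the profile value `headphone-class-device` follow Google’s public documentation, but treat the exact spellings and the chosen voice name as assumptions rather than a definitive integration.

```python
import json


def build_tts_request(text,
                      voice_name="en-US-Wavenet-D",
                      profile="headphone-class-device"):
    """Assemble a Cloud Text-to-Speech REST request body.

    The `effectsProfileId` list selects the Audio Profile used to
    optimize the synthesized audio for its playback device
    (e.g. headphones vs. phone lines vs. speakers).
    Field names are assumptions based on the public REST API.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Audio Profile: optimize output for headphone playback
            "effectsProfileId": [profile],
        },
    }


request_body = build_tts_request("Hello from Cloud Text-to-Speech")
print(json.dumps(request_body, indent=2))
```

The body would then be POSTed to the API’s `text:synthesize` endpoint with the caller’s credentials; swapping the profile string changes which playback optimization is applied.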
In addition, the company made several improvements to its Cloud Speech-to-Text technology, which provides speech recognition capabilities for applications and solutions. Updates include multi-channel recognition for interpreting multiple voices, speaker diarization, language auto-detect, and word-level confidence scores.
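To illustrate how those new recognition features might be switched on, here is a minimal sketch of a Speech-to-Text recognition config. The field names follow the public Speech-to-Text REST API documentation, but treat the exact spellings and values as assumptions, not a verified integration.

```python
def build_stt_config(channels=2):
    """Build a Speech-to-Text recognition config exercising the
    newly announced features. Field names are assumptions based on
    the public REST API documentation.
    """
    return {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        # Multi-channel recognition: transcribe each channel separately
        "audioChannelCount": channels,
        "enableSeparateRecognitionPerChannel": True,
        # Speaker diarization: tag each word with the speaker who said it
        "enableSpeakerDiarization": True,
        # Language auto-detect: alternative languages to choose among
        "alternativeLanguageCodes": ["es-ES", "fr-FR"],
        # Word-level confidence: per-word confidence scores in results
        "enableWordConfidence": True,
    }


config = build_stt_config()
print(sorted(config.keys()))
```

In a real request this config would accompany the audio payload sent to the API’s `speech:recognize` endpoint.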
“We hope these added features make it easier to create smarter speech-enabled applications, and we’re excited to see what you build,” Aharon wrote.