Kaldi, an open-source speech recognition toolkit, now integrates with TensorFlow, the open-source deep learning library.
Developers Yishay Carmiel and Hainan Xu of Seattle-based IntelligentWire are behind the integration. They plan to use the combination to accelerate progress in automatic speech recognition (ASR) systems.
IntelligentWire specializes in cloud software that helps businesses gather analytics from live phone conversations between representatives and customers, and that automatically handles data entry and responses to requests. The company currently focuses on the contact center market, which accounts for more than 50 billion hours of phone calls and 25 billion hours of business application use across 22 million agents worldwide each year, according to the blog post announcing the integration.
“For an ASR system to be useful in this context, it must not only deliver an accurate transcription but do so with very low latency in a way that can be scaled to support many thousands of concurrent conversations efficiently,” Carmiel wrote in a Google Developers Blog post co-authored with Raziel Alvarez, a staff research engineer at Google.
“For IntelligentWire, the integration of TensorFlow into Kaldi has reduced the ASR development cycle by an order of magnitude,” the post reads. “If a language model already exists in TensorFlow, then going from model to proof of concept can take days rather than weeks; for new models, the development time can be reduced from months to weeks.”
The developers believe deep learning models can tackle the main challenges in ASR much more quickly: building algorithms that can adapt and scale, identifying which data is valuable across multiple languages and acoustic environments, and supplying the sheer computing power needed to turn raw audio into something usable.
Carmiel and Xu hope that bringing together two “vibrant” and active open-source user bases will produce a wave of breakthroughs in speech-based products and research.