NVIDIA announced that its application framework for building conversational AI services is now available. The new NVIDIA Jarvis framework comes with pre-trained deep learning models and software tools to help developers create conversational AI services that can be deployed from the cloud or at the edge.
According to the company, it offers automatic speech recognition and language understanding, real-time translations for multiple languages and new text-to-speech capabilities to create expressive conversational AI agents.
The new offering was trained for several million GPU hours on more than 1 billion pages of text and 60,000 hours of speech data, covering different languages, accents, environments and lingos, to achieve world-class accuracy, NVIDIA stated in a post.
“Conversational AI is in many ways the ultimate AI,” said Jensen Huang, founder and CEO of NVIDIA. “Deep learning breakthroughs in speech recognition, language understanding and speech synthesis have enabled engaging cloud services. NVIDIA Jarvis brings this state-of-the-art conversational AI out of the cloud for customers to host AI services anywhere.”
Developers can choose pre-trained Jarvis models from the NVIDIA NGC catalog and then fine-tune them with the NVIDIA Transfer Learning Toolkit. The models can then be deployed using just a few lines of code, so deep AI expertise isn’t needed.
NVIDIA also partnered with Mozilla Common Voice, an open-source collection of voice data, to train voice-enabled apps, services and devices.
“We launched Common Voice to teach machines how real people speak in their unique languages, accents and speech patterns,” said Mark Surman, executive director at Mozilla. “NVIDIA and Mozilla have a common vision of democratizing voice technology — and ensuring that it reflects the rich diversity of people and voices that make up the internet.”