NVIDIA announced an array of deep-learning-focused updates to its cloud computing software and hardware initiatives today during the Computer Vision and Pattern Recognition Conference (CVPR) in Salt Lake City. The announcements included Apex, an open-source deep-learning extension for the PyTorch library; NVIDIA DALI and NVIDIA nvJPEG, GPU-accelerated libraries for data optimization and image decoding; the release candidate for Kubernetes on NVIDIA GPUs; and version 4 of TensorRT, the company’s inference optimizer and runtime engine.

Apex

During the event, NVIDIA demoed an early release of Apex. The company says the extension is inspired by state-of-the-art mixed precision training in translational networks, sentiment analysis and image classification, and that it packages those techniques to speed up training of deep learning models on NVIDIA Volta GPUs while aiming for stability similar to that of single precision training.

“Specifically, Apex offers automatic execution of operations in either FP16 or FP32, automatic handling of master parameter conversion, and automatic loss scaling, all available with 4 or fewer line changes to the existing code,” the company wrote in the announcement.
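To make the quoted terms concrete, here is a minimal sketch of why loss scaling and higher-precision master parameters matter in mixed precision training. It does not use Apex's API. It simulates FP16 rounding with Python's standard `struct` module, and the helper names (`to_fp16`, `fp16_grad`) and the scale factor of 1024 are assumptions chosen purely for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def fp16_grad(true_grad: float, loss_scale: float = 1.0) -> float:
    """Simulate a gradient stored in FP16 after the loss was multiplied by
    loss_scale, then unscaled in higher precision (hypothetical helper)."""
    stored = to_fp16(true_grad * loss_scale)  # what FP16 can actually hold
    return stored / loss_scale                # unscale outside FP16


# Loss scaling: a tiny gradient underflows in FP16 unless it is scaled first.
tiny_grad = 1e-8
without_scaling = fp16_grad(tiny_grad)
with_scaling = fp16_grad(tiny_grad, loss_scale=1024.0)

# Master parameters: a small update vanishes if the weight lives in FP16.
weight, update = 1.0, 1e-4
fp16_weight = to_fp16(weight + update)  # rounds back to the old weight
master_weight = weight + update         # higher-precision copy keeps the update

print(without_scaling, with_scaling)
print(fp16_weight, master_weight)
```

In this simulation the gradient of 1e-8 is flushed to zero in FP16 but survives, to within about one percent, when scaled by 1024 first. The update of 1e-4 applied to a weight of 1.0 is lost in FP16 and kept in the higher-precision copy. Python floats are 64-bit, so they stand in here for the FP32 master copy a real trainer would keep.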

While still under active development, the early release version of Apex is available for download via GitHub, with NVIDIA hoping to improve the extension based on community feedback.

NVIDIA DALI and NVIDIA nvJPEG

Taking aim at performance bottlenecks associated with image recognition and decoding in deep-learning powered computer vision applications, NVIDIA is leveraging the power of its GPUs with NVIDIA DALI, which utilizes the new NVIDIA nvJPEG library to decode images at greater speed.

“With DALI, deep learning researchers can scale training performance on image classification models such as ResNet-50 with MXNet, TensorFlow, and PyTorch across Amazon Web Services P3 8 GPU instances or DGX-1 systems with Volta GPUs,” the company wrote in its announcement. “Framework users will have lesser code duplication due to consistent high-performance data loading and augmentation across frameworks.”

NVIDIA nvJPEG “supports decoding of single and batched images, color space conversion, multiple phase decoding, and hybrid decoding using both CPU and GPU,” the company wrote. “Applications that rely on nvJPEG for decoding deliver higher throughput and lower latency JPEG decode compared [to] CPU-only decoding.”
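For readers unfamiliar with the "color space conversion" step, the sketch below shows what it means for JPEG data in general. JPEG files typically store pixels as YCbCr, and a decoder converts them to RGB. This is plain Python using the standard JFIF full-range formula, not nvJPEG's API, and the function names are hypothetical. A GPU library would apply the same arithmetic to every pixel in parallel.

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one full-range 8-bit YCbCr pixel (JFIF convention) to RGB."""

    def clamp(value: float) -> int:
        # Round to the nearest integer and keep the result in the 0..255 range.
        return max(0, min(255, round(value)))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


def convert_batch(pixels):
    """Convert a list of (Y, Cb, Cr) tuples; a stand-in for batched decoding."""
    return [ycbcr_to_rgb(*pixel) for pixel in pixels]


# Neutral chroma (Cb = Cr = 128) maps to a grey of the same brightness.
print(convert_batch([(0, 128, 128), (128, 128, 128), (255, 128, 128)]))
```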

Kubernetes on NVIDIA GPUs

NVIDIA also used CVPR to announce a freely available release candidate of Kubernetes on NVIDIA GPUs, so developers can test the tool and give feedback on its ability to deploy containerized applications to multi-cloud GPU clusters and to automate deployment, maintenance, scheduling and operations.

“With increasing number of AI powered applications and services and the broad availability of GPUs in public cloud, there is a need for open-source Kubernetes to be GPU-aware,” the company wrote in the announcement. “With Kubernetes on NVIDIA GPUs, software developers and DevOps engineers can build and deploy GPU-accelerated deep learning training or inference applications to heterogeneous GPU clusters at scale, seamlessly.”
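As a rough illustration of what "GPU-aware" scheduling looks like from a developer's side, the sketch below builds a pod manifest that asks the scheduler for GPUs. It assumes the `nvidia.com/gpu` resource name exposed by NVIDIA's Kubernetes device plugin. Whether the release candidate described here uses exactly this interface is an assumption, and the pod name, image and helper function are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image, replace with a real one
                    # Assumed resource name: the scheduler places the pod only
                    # on a node with this many unallocated GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("training-job", "example.com/training-image:latest", gpus=2)
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests as JSON as well as YAML, so the printed output could be saved to a file and submitted with `kubectl apply -f`.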

TensorRT 4

The last piece of today’s deep-learning news from NVIDIA is the release of TensorRT 4, which adds new recurrent neural network (RNN) and multilayer perceptron (MLP) layers, a native ONNX parser and integration with TensorFlow to the inference optimizer and runtime engine.

“Additional features include the ability to execute custom neural network layers using FP16 precision and support for the Xavier SoC through NVIDIA DRIVE AI platforms,” Siddharth Sharma, senior technical marketing manager for accelerated computing at NVIDIA, and Chris Gottbrath, accelerated computing software product manager at NVIDIA, wrote in a developer blog. “TensorRT 4 speeds up deep learning inference applications such as neural machine translation, recommender systems, speech and image processing applications on GPUs. We measured speedups of 45x to 190x across these application areas.”

TensorRT 4 is available to all members of the NVIDIA Registered Developer Program for free, and much more in-depth information about the update’s new features can be found in the developers’ blog post.