The benefits of machine learning (ML) are becoming increasingly clear in virtually all fields of research and business. A growing array of tools is available to help people move in the right direction. Hang-ups can and do occur, though, so this guide aims to help practitioners find their footing on AWS, specifically with PyTorch.
From collection through cleaning and analysis, the work required to prepare data for an ML model is extensive. Getting there is no easy feat, and once the data is ready, getting from data to deployed models can seem like rocket science. For many data scientists, it can feel like planning a big trip in meticulous detail, then showing up at the airport and being escorted to the cockpit to fly the plane. Where do you even begin?
While many ML models are run on machines on premises, not everyone has access to workstations that can crunch large amounts of data in acceptable time frames. Many researchers are turning to AWS instances with NVIDIA GPUs to run their workloads. But getting onto these systems can be confusing for people who are new to AWS and aren’t sure where to begin.
Deployment can be incredibly challenging, but as with any skill, a good guide can show you the right path and give you real-world experience so you can work more efficiently. At Six Nines, we’ve developed a guide to help practitioners who are just starting out understand the decisions needed to take their models from concept to a working ML training deployment, and then to scale those deployments into clusters. The guide, titled “Getting started with a ML training model using AWS & PyTorch,” walks practitioners through deciding which AWS instances are right for the ML model they’re trying to train, and what steps to take to get started. Everyone from beginners to skilled practitioners looking for a shortcut to getting their models into the right cloud environment can benefit from this tutorial.
The guide examines three of the major machine learning instance types using NVIDIA GPUs available through AWS, from single GPU to multi-GPU deployments. These include the following:
- Amazon EC2 G4 Instances – The G4 instances are the most cost-effective instances for small-scale training and inferencing. They are a good fit for early proofs of concept and situations where training time is not a limiting factor.
- Amazon EC2 P3 Instances – Accelerate your machine learning with high performance computing in the cloud using the P3 Instances. Use these instances to speed up your training and iteration time so that you can do more with your ML models.
- Amazon EC2 P3dn Instances – Explore larger and more complex machine learning algorithms with twice the power of the P3 Instances. Choose these instances when you are ready for fast turnaround on your model training, or when you need distributed ML training.
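The selection logic in the list above can be sketched as a small helper. This is a minimal illustration only, not the guide's actual decision process: the `Workload` fields and the `suggest_instance_family` function are hypothetical names, and the rules simply paraphrase the three descriptions above. Real choices will also depend on budget, dataset size, and instance availability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical summary of a training job (field names are assumptions)."""
    time_sensitive: bool   # do you need fast training and iteration?
    large_model: bool      # is the model larger or more complex than usual?
    distributed: bool      # do you need distributed ML training?


def suggest_instance_family(workload: Workload) -> str:
    """Return a starting-point instance family based on the list above."""
    # P3dn: larger, more complex models or distributed training needs.
    if workload.large_model or workload.distributed:
        return "P3dn"
    # P3: faster training and iteration when time matters.
    if workload.time_sensitive:
        return "P3"
    # G4: cost-effective small-scale training and proofs of concept.
    return "G4"


if __name__ == "__main__":
    proof_of_concept = Workload(time_sensitive=False, large_model=False, distributed=False)
    print(suggest_instance_family(proof_of_concept))
```

Treat the result as a first guess to refine. The walkthroughs in the guide also cover multi-instance training on G4 and P3, so the rule that sends every distributed job to P3dn is a deliberate simplification.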
Once you’ve selected the instance that is right for your purpose, the guide provides walkthroughs of specific training jobs that show the steps needed to work with the most popular types of ML applications.
These include:
- Training a ResNet-50 ImageNet Model using PyTorch on a single AWS G4 or P3 Instance
- Training a ResNet-50 ImageNet Model using PyTorch on multiple AWS G4 or P3 Instances
- Training a BERT Fine Tuning Model using PyTorch on a single AWS P3 Instance
- Training a BERT Fine Tuning Model using PyTorch on multiple AWS P3 Instances
- Object Detection Training using Mask R-CNN on AWS P3dn Instances
Machine learning is becoming a critical tool for organizations of all types, but one of the biggest challenges is knowing where to start. There are many considerations and factors to manage when deploying a machine learning model, or a fleet of them. Six Nines is glad to help by providing resources, and even hands-on support to get it done.
To download the “Getting started with a ML training model using AWS & PyTorch” guide, please click the link. Please feel free to use the page as a resource for feedback and conversation with our community about your process, and anything that can be done to help you along.