AWS Lowers the Cost Barrier to Machine Learning Training with Trainium Chip

The company announced this week at its re:Invent conference that Trainium, a custom machine learning (ML) processor from Amazon Web Services (AWS), is set for release in 2021.
AWS CEO Andy Jassy introduced Trainium during his virtual re:Invent keynote, positioning it as a cost-effective alternative for cloud-based ML model training. “We know we want to continue pushing the price performance for machine learning training,” he said, adding that AWS would have to invest in its own chips. “You have an unmatched number of instances in AWS, combined with innovation in chips.”
Jassy said the Trainium chip was designed to deliver the best performance and the most teraflops (TFLOPS) of compute power for ML in the cloud, enabling a wider range of ML applications. The chip is optimized for deep learning workloads such as image classification, semantic search, and voice recognition.
Trainium is AWS’ second piece of custom, in-house silicon. Its predecessor, Inferentia, launched two years ago. Recently, the company announced plans to move some Alexa and facial recognition computing to Inferentia chips.
Inferentia delivers up to 30% higher throughput and up to 45% lower cost per inference than Amazon EC2 G4 instances, which were already the lowest-cost instances for ML inference in the cloud.
“While Inferentia addressed the cost of inference, which can account for up to 90% of ML infrastructure expenses, many development teams are also restricted by fixed ML training budgets,” AWS states on its Trainium product page. This limits the amount and frequency of training teams can run to improve their models and applications. AWS Trainium addresses this by offering high-performance ML training in the cloud at the lowest cost.
Together, Inferentia and Trainium create a seamless flow of ML compute, “from scaling training workloads to deploying accelerated inference,” the company states.
Trainium and Inferentia share the same AWS Neuron SDK, making it easy for developers already familiar with Inferentia to get started with Trainium. Because the Neuron SDK integrates with popular ML frameworks such as TensorFlow and PyTorch, developers can migrate from GPU-based instances to AWS Trainium with minimal code changes.
AWS Trainium will be available via Amazon EC2 instances and AWS Deep Learning AMIs, as well as managed services including Amazon SageMaker, Amazon ECS, Amazon EKS, and AWS Batch.

Author: Victoria