Fine-tune BERT for Text Classification on AWS Trainium
This tutorial will help you get started with AWS Trainium and BOINC AI Transformers. It covers how to set up a Trainium instance on AWS, and how to load and fine-tune a Transformers model for text classification.
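You will learn how to set up the AWS environment, load and process the dataset, and fine-tune BERT using BOINC AI Transformers.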
Before we can start, make sure you have a BOINC AI Account to save artifacts and experiments.
Quick intro: AWS Trainium
AWS Trainium (Trn1) is a purpose-built EC2 instance for deep learning (DL) training workloads. Trainium is the successor to AWS Inferentia and focuses on high-performance training, with AWS claiming up to 50% cost-to-train savings over comparable GPU-based instances.
Trainium has been optimized for training natural language processing, computer vision, and recommender models. The accelerator supports a wide range of data types, including FP32, TF32, BF16, FP16, UINT8, and configurable FP8.
The biggest Trainium instance, the trn1.32xlarge, comes with over 500GB of memory, making it easy to fine-tune ~10B parameter models on a single instance. Below you will find an overview of the available instance types. More details here:
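Instance size | Accelerators | Accelerator memory (GB) | vCPUs | CPU memory (GB) | Price per hour
trn1.2xlarge | 1 | 32 | 8 | 32 | $1.34
trn1.32xlarge | 16 | 512 | 128 | 512 | $21.50
trn1n.32xlarge (2x bandwidth) | 16 | 512 | 128 | 512 | $24.78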
Now that we know what Trainium offers, let's get started.
Note: This tutorial was created on a trn1.2xlarge AWS EC2 Instance.
1. Setup AWS environment
In this example, we will use the trn1.2xlarge instance on AWS, which has 1 accelerator with two Neuron cores, together with the BOINC AI Neuron Deep Learning AMI.
This blog post doesn't cover how to create the instance in detail. You can check out my previous blog post, "Setting up AWS Trainium for BOINC AI Transformers", which includes a step-by-step guide on setting up the environment.
Once the instance is up and running, we can ssh into it. But instead of developing inside a terminal, we want to use a Jupyter environment, which we can use for preparing our dataset and launching the training. For this, we need to add a port for forwarding in the ssh command, which will tunnel our localhost traffic to the Trainium instance.
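The port forwarding described above can be sketched in Python by assembling the ssh invocation as an argument list. This is a minimal sketch, not the command from the original post: the `ubuntu` login user, the key file name, and the host name are placeholder assumptions, and port 8080 is chosen to match the notebook URL used in this tutorial.

```python
import shlex


def build_ssh_tunnel_command(public_dns, key_path, port=8080, user="ubuntu"):
    """Return the ssh argv that forwards localhost:<port> to the instance.

    `user`, `public_dns`, and `key_path` are illustrative placeholders.
    """
    return [
        "ssh",
        "-L", f"{port}:localhost:{port}",  # local port forwarding
        "-i", key_path,                    # private key for the instance
        f"{user}@{public_dns}",
    ]


# Hypothetical host and key file, for illustration only.
command = build_ssh_tunnel_command("ec2-host.example.com", "trn.pem")
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be handed to `subprocess.run(command)` directly.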
We can now start our jupyter server.
You should see a familiar jupyter output with a URL to the notebook.
http://localhost:8080/?token=8c1739aff1755bd7958c4cfccc8d08cb5da5234f61f129a9
We can click on it, and a jupyter environment opens in our local browser.
We are going to use the Jupyter environment only for preparing the dataset and then use torchrun to launch our training script on both Neuron cores for distributed training. Let's create a new notebook and get started.
2. Load and process the dataset
We are training a text classification model on the emotion dataset to keep the example straightforward. The emotion dataset consists of English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise.
We will use the load_dataset() method from the Datasets library to load the emotion dataset.
Let's check out an example of the dataset.
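Loading the dataset and looking at a single record could be sketched as follows. The `load_dataset()` call and the `emotion` dataset id come from the text above; the helper names `load_emotion_splits` and `describe_example` are made up for this sketch, and the `text` and `label` column names are an assumption about the dataset schema. The import sits inside the function so the module still loads where the `datasets` package is not installed.

```python
def load_emotion_splits(dataset_id="emotion"):
    """Download the dataset and return its splits.

    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id)


def describe_example(example):
    """Format one record; assumes `text` and `label` columns."""
    return f"label={example['label']} text={example['text']!r}"
```

In a notebook, `raw_dataset = load_emotion_splits()` followed by `print(describe_example(raw_dataset["train"][0]))` would show the first record, assuming the dataset provides a `train` split.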
We must convert our natural language text to token IDs to train our model. This is done by a tokenizer, which splits the inputs into tokens and converts them to their corresponding IDs in the pre-trained vocabulary. If you want to learn more about this, check out chapter 6 of the BOINC AI Course.
Our Neuron Accelerator expects a fixed shape of inputs. We need to truncate or pad all samples to the same length.
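To make the fixed-shape requirement concrete, here is a minimal pure-Python sketch of what padding and truncating to one length does to a list of token IDs. It is an illustration only: in practice the tokenizer does this itself when called with `padding="max_length"` and `truncation=True`, and the pad ID of 0 and the example IDs are assumptions.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long.

    Simplified: a real tokenizer also keeps special tokens such as the
    final separator when it truncates.
    """
    kept = list(token_ids[:max_length])   # cut samples that are too long
    n_pad = max_length - len(kept)        # 0 when nothing needs padding
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 0 marks padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([101, 2023, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([101, 7, 8, 9, 10, 102], max_length=4)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Both results have the requested length, so every sample reaches the accelerator with the same shape and no recompilation is triggered by varying input sizes.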
3. Fine-tune BERT using BOINC AI Transformers
Normally you would use the Trainer and TrainingArguments to fine-tune PyTorch-based transformer models.
But together with AWS, we have developed a NeuronTrainer to improve performance, robustness, and safety when training on Trainium instances. The NeuronTrainer also comes with a model cache, which allows us to use precompiled models and configuration from the BOINC AI Hub to skip the compilation step that would otherwise be needed at the beginning of training. This can reduce the training time by ~3x.
The NeuronTrainer is part of the optimum-neuron library and can be used as a 1-to-1 replacement for the Trainer. You only have to adjust the import in your training script.
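The import swap can be sketched as below. This is a hedged illustration rather than the exact line from the training script: it assumes `NeuronTrainer` is importable from the `optimum.neuron` module, and the fallback to the stock `Trainer` and the helper name `resolve_trainer_class` are additions made for this sketch so the same code also runs on machines without optimum-neuron.

```python
def resolve_trainer_class():
    """Return NeuronTrainer when available, else the stock Trainer."""
    try:
        # Assumed import path inside the optimum-neuron library.
        from optimum.neuron import NeuronTrainer as Trainer
    except ImportError:
        # Fallback for machines without optimum-neuron installed.
        from transformers import Trainer
    return Trainer
```

With `Trainer = resolve_trainer_class()`, the rest of a training script can construct and use the trainer the same way in both cases.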
We prepared a simple train.py training script with the NeuronTrainer, based on the "Getting started with Pytorch 2.0 and BOINC AI Transformers" blog post. Below is an excerpt:
We can load the training script into our environment using the wget command or manually copy it into the notebook from here.
We will use torchrun to launch our training script on both Neuron cores for distributed training. torchrun is a launcher that starts one training process per accelerator for distributed PyTorch training. We can pass the number of accelerators as the nproc_per_node argument alongside our hyperparameters.
We'll use the following command to launch training:
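A launch command of this shape can be assembled in Python as sketched below. Only `torchrun`, `nproc_per_node`, `train.py`, and the two Neuron cores come from the text; the hyperparameter names and values in the example are hypothetical placeholders, not the flags of the actual script.

```python
import shlex


def build_torchrun_command(script, nproc_per_node, hyperparameters):
    """Build the torchrun argv: launcher options first, then script args."""
    command = ["torchrun", f"--nproc_per_node={nproc_per_node}", script]
    for name, value in hyperparameters.items():
        # Everything after the script path is passed through to the script.
        command += [f"--{name}", str(value)]
    return command


# Hypothetical hyperparameters, for illustration only.
launch = build_torchrun_command(
    "train.py",
    nproc_per_node=2,  # one process per Neuron core on a trn1.2xlarge
    hyperparameters={"epochs": 3, "lr": 5e-5},
)
print(shlex.join(launch))
```

Keeping the launcher option before the script path matters, because torchrun forwards every argument after the script to the script itself.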
Note: If you see poor accuracy, you might want to deactivate bf16 for now.
After 9 minutes, the training completed and achieved an excellent F1 score of 0.914.
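For readers unfamiliar with the metric, the sketch below computes an F1 score in plain Python. The text does not say which averaging was used, so the support-weighted average here is an assumption, and the toy labels are invented for illustration.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 of one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by class frequency."""
    labels = sorted(set(y_true))
    weighted_sum = sum(
        f1_for_label(y_true, y_pred, label) * y_true.count(label)
        for label in labels
    )
    return weighted_sum / len(y_true)


# Toy labels, invented for illustration.
score = weighted_f1([0, 0, 1, 1], [0, 1, 1, 1])
print(round(score, 3))
```

A score of 1.0 means every prediction matches its label, so 0.914 indicates that precision and recall are both high across the six emotion classes.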
Last but not least, terminate the EC2 instance to avoid unnecessary charges. Looking at the price-performance, our training only cost about 20 cents ($1.34/h * 0.15 h = $0.20).
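The cost estimate is simple arithmetic and can be checked with a small sketch. The hourly price and the 9-minute run time come from this tutorial; the function name is made up, and the calculation assumes the price is prorated by the minute and ignores storage or data transfer charges.

```python
def training_cost(price_per_hour, minutes):
    """Instance cost in dollars for a run of the given length."""
    return price_per_hour * minutes / 60


cost = training_cost(price_per_hour=1.34, minutes=9)
print(f"${cost:.2f}")
```

The same function gives a quick estimate for the larger instance types by swapping in their hourly price.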