Train with a script
Along with the 🌎 Transformers notebooks, there are also example scripts demonstrating how to train a model for a task with PyTorch, TensorFlow, or JAX/Flax.
You will also find scripts we’ve used in our research projects and legacy examples, which are mostly community contributed. These scripts are not actively maintained and require a specific version of 🌎 Transformers that will most likely be incompatible with the latest version of the library.
The example scripts are not expected to work out-of-the-box on every problem, and you may need to adapt the script to the problem you’re trying to solve. To help you with this, most of the scripts fully expose how data is preprocessed, allowing you to edit it as necessary for your use case.
For any feature you’d like to implement in an example script, please discuss it on the forum or in an issue before submitting a Pull Request. While we welcome bug fixes, it is unlikely we will merge a Pull Request that adds more functionality at the cost of readability.
This guide will show you how to run an example summarization training script in PyTorch and TensorFlow. All examples are expected to work with both frameworks unless otherwise specified.
Setup
To successfully run the latest version of the example scripts, you have to install 🌎 Transformers from source in a new virtual environment:
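A typical from-source install looks like the following (the repository URL assumes the upstream Hugging Face repo; substitute your own clone location if it differs):

```bash
# Clone the library and install it from source in your virtual environment
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
```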
For older versions of the example scripts, follow the steps below:
Then switch your current clone of 🌎 Transformers to a specific version, for example v3.5.1:
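For example, to check out the v3.5.1 tag inside your clone:

```bash
git checkout tags/v3.5.1
```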
After you’ve set up the correct library version, navigate to the example folder of your choice and install the example-specific requirements:
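For instance, from inside the summarization example folder (the folder path is illustrative):

```bash
cd examples/pytorch/summarization
pip install -r requirements.txt
```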
Run a script
PyTorch
The example script downloads and preprocesses a dataset from the 🌎 Datasets library. Then the script fine-tunes a model on the dataset with the Trainer on an architecture that supports summarization. The following example shows how to fine-tune T5-small on the CNN/DailyMail dataset. The T5 model requires an additional source_prefix argument due to how it was trained. This prompt lets T5 know this is a summarization task.
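A representative command (the script path and hyperparameters shown are illustrative and may need adjusting for your checkout):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate
```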
TensorFlow
The example script downloads and preprocesses a dataset from the 🌎 Datasets library. Then the script fine-tunes a model on the dataset using Keras on an architecture that supports summarization. The following example shows how to fine-tune T5-small on the CNN/DailyMail dataset. The T5 model requires an additional source_prefix argument due to how it was trained. This prompt lets T5 know this is a summarization task.
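A representative command for the Keras version (paths and hyperparameters are again illustrative):

```bash
python examples/tensorflow/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 3 \
    --do_train \
    --do_eval
```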
Distributed training and mixed precision
The Trainer supports distributed training and mixed precision, which means you can also use it in a script. To enable both of these features:
- Add the fp16 argument to enable mixed precision.
- Set the number of GPUs to use with the nproc_per_node argument.
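For example, a multi-GPU mixed-precision run might look like this (assuming torchrun is available; the remaining arguments mirror the single-GPU example above):

```bash
torchrun \
    --nproc_per_node 8 pytorch/summarization/run_summarization.py \
    --fp16 \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate
```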
TensorFlow scripts utilize a MirroredStrategy for distributed training, and you don’t need to add any additional arguments to the training script. The TensorFlow script will use multiple GPUs by default if they are available.
Run a script on a TPU
PyTorch
Tensor Processing Units (TPUs) are specifically designed to accelerate performance. PyTorch supports TPUs with the XLA deep learning compiler (see here for more details). To use a TPU, launch the xla_spawn.py script and use the num_cores argument to set the number of TPU cores you want to use.
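A sketch of a TPU launch with xla_spawn.py (the core count and script path are illustrative):

```bash
python xla_spawn.py --num_cores 8 \
    summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate
```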
TensorFlow
Tensor Processing Units (TPUs) are specifically designed to accelerate performance. TensorFlow scripts utilize a TPUStrategy for training on TPUs. To use a TPU, pass the name of the TPU resource to the tpu argument.
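For example (replace name_of_tpu_resource with your own TPU name; the other arguments are illustrative):

```bash
python run_summarization.py \
    --tpu name_of_tpu_resource \
    --model_name_or_path t5-small \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 3 \
    --do_train \
    --do_eval
```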
Run a script with 🌎 Accelerate
🌎 Accelerate is a PyTorch-only library that offers a unified method for training a model on several types of setups (CPU-only, multiple GPUs, TPUs) while maintaining complete visibility into the PyTorch training loop. Make sure you have 🌎 Accelerate installed if you don’t already have it:
Note: As Accelerate is rapidly developing, the git version of Accelerate must be installed to run the scripts.
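For example (the repository URL assumes the upstream Hugging Face Accelerate repo):

```bash
pip install git+https://github.com/huggingface/accelerate
```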
Instead of the run_summarization.py script, you need to use the run_summarization_no_trainer.py script. 🌎 Accelerate-supported scripts will have a task_no_trainer.py file in the folder. Begin by running the following command to create and save a configuration file:
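The configuration wizard is started with:

```bash
accelerate config
```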
Test your setup to make sure it is configured correctly:
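The Accelerate CLI provides a test command for this:

```bash
accelerate test
```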
Now you are ready to launch the training:
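A launch command along the lines of the earlier example (arguments are illustrative):

```bash
accelerate launch run_summarization_no_trainer.py \
    --model_name_or_path t5-small \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir ~/tmp/tst-summarization
```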
Use a custom dataset
The summarization script supports custom datasets as long as they are a CSV or JSON Lines file. When you use your own dataset, you need to specify several additional arguments:
- train_file and validation_file specify the path to your training and validation files.
- text_column is the input text to summarize.
- summary_column is the target text to output.
A summarization script using a custom dataset would look like this:
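For instance (the file paths and column names below are placeholders for your own data):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --train_file path_to_csv_or_jsonlines_file \
    --validation_file path_to_csv_or_jsonlines_file \
    --text_column text_column_name \
    --summary_column summary_column_name \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --overwrite_output_dir \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --predict_with_generate
```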
Test a script
It is often a good idea to run your script on a smaller number of dataset examples to ensure everything works as expected before committing to an entire dataset which may take hours to complete. Use the following arguments to truncate the dataset to a maximum number of samples:
- max_train_samples
- max_eval_samples
- max_predict_samples
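For example, limiting each split to 50 samples (the remaining arguments mirror the earlier full run and are illustrative):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --max_train_samples 50 \
    --max_eval_samples 50 \
    --max_predict_samples 50 \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --overwrite_output_dir \
    --predict_with_generate
```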
Not all example scripts support the max_predict_samples argument. If you aren’t sure whether your script supports this argument, add the -h argument to check:
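For example:

```bash
python examples/pytorch/summarization/run_summarization.py -h
```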
Resume training from checkpoint
Another helpful option to enable is resuming training from a previous checkpoint. This will ensure you can pick up where you left off without starting over if your training gets interrupted. There are two methods to resume training from a checkpoint.
The first method uses the output_dir previous_output_dir argument to resume training from the latest checkpoint stored in output_dir. In this case, you should remove overwrite_output_dir:
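For example (note that overwrite_output_dir is omitted; other arguments are illustrative):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir previous_output_dir \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --predict_with_generate
```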
The second method uses the resume_from_checkpoint path_to_specific_checkpoint argument to resume training from a specific checkpoint folder.
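For example (path_to_specific_checkpoint is a placeholder for your checkpoint folder):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --overwrite_output_dir \
    --resume_from_checkpoint path_to_specific_checkpoint \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --predict_with_generate
```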
Share your model
All scripts can upload your final model to the Model Hub. Make sure you are logged into BOINC AI before you begin:
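With the upstream Hugging Face Hub tooling, logging in is done with the command below; the exact CLI name may differ for this distribution:

```bash
huggingface-cli login
```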
Then add the push_to_hub argument to the script. This argument will create a repository with your BOINC AI username and the folder name specified in output_dir.
To give your repository a specific name, use the push_to_hub_model_id argument to specify it. The repository will be automatically listed under your namespace.
The following example shows how to upload a model with a specific repository name:
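For example (the repository id and other arguments are illustrative):

```bash
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --push_to_hub \
    --push_to_hub_model_id finetuned-t5-cnn_dailymail \
    --output_dir /tmp/tst-summarization \
    --overwrite_output_dir \
    --predict_with_generate
```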