Examples
Introduction
The examples should work in any of the following settings (with the same script):
single GPU
multi GPUs (using PyTorch distributed mode)
multi GPUs (using DeepSpeed ZeRO-Offload stages 1, 2, & 3)
fp16 (mixed-precision), fp32 (normal precision), or bf16 (bfloat16 precision)
To run the examples in each of these modes, first initialize the 🤗 Accelerate configuration with accelerate config.
Note: to train with a 4-bit or 8-bit model, you first need to install the optional quantization dependencies.
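One way to do this is via pip; a minimal sketch, assuming your installed TRL version exposes a quantization extra (otherwise install bitsandbytes directly):

```bash
# Install TRL together with the optional quantization dependencies (e.g. bitsandbytes)
pip install --upgrade "trl[quantization]"
```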
Accelerate Config
For all the examples, you'll need to generate a 🤗 Accelerate config file with:
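```bash
accelerate config  # will prompt you to define the training configuration
```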
Then, it is encouraged to launch jobs with accelerate launch!
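For example, to run one of the example scripts with the configuration you just created (path_to_script.py and the trailing arguments are placeholders):

```bash
# Launch an example script with the settings from your 🤗 Accelerate config
accelerate launch path_to_script.py --all_arguments_of_the_script
```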
Maintained Examples
This script shows how to use the RewardTrainer to train a reward model on your own dataset.
This script shows how to use the PPOTrainer to fine-tune a sentiment analysis model using the IMDB dataset.
This script shows how to use the PPOTrainer to train a single base model with multiple adapters. Requires you to run the reward model training example script beforehand.
This script shows how to use the DDPOTrainer to fine-tune a Stable Diffusion model using reinforcement learning.
Here are also some easier-to-run Colab notebooks that you can use to get started with TRL:
This notebook demonstrates how to use the "Best of N" sampling strategy with TRL when fine-tuning your model with PPO.
This notebook demonstrates how to reproduce the GPT2 IMDB sentiment tuning example in a Jupyter notebook.
This notebook demonstrates how to reproduce the GPT2 sentiment control example in a Jupyter notebook.
We also have some other examples that are less maintained but can be used as a reference:
research_projects: Check out this folder to find the scripts used for some research projects that used TRL (LM de-toxification, Stack-Llama, etc.)
Distributed training
All of the scripts can be run on multiple GPUs by providing the path of an 🤗 Accelerate config file when calling accelerate launch. To launch one of them on one or multiple GPUs, run the following command (swapping {NUM_GPUS} with the number of GPUs in your machine and --all_arguments_of_the_script with your arguments):
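A sketch of what this launch might look like, assuming a multi-GPU 🤗 Accelerate config file (the config and script paths are placeholders):

```bash
# {NUM_GPUS}: number of GPUs on your machine; the .yaml and .py paths are placeholders
accelerate launch --config_file path_to_accelerate_config.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script
```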
You can also adjust the parameters of the 🤗 Accelerate config file to suit your needs (e.g. training in mixed precision).
Distributed training with DeepSpeed
Most of the scripts can be run on multiple GPUs together with DeepSpeed ZeRO-{1,2,3} for efficient sharding of the optimizer states, gradients, and model weights. To do so, run the following command (swapping {NUM_GPUS} with the number of GPUs in your machine, --all_arguments_of_the_script with your arguments, and --deepspeed_config with the path to the DeepSpeed config file such as examples/deepspeed_configs/deepspeed_zero1.yaml):
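A sketch of such a launch, under the assumption that the DeepSpeed ZeRO settings are supplied through an 🤗 Accelerate config file passed via --config_file (the exact flag used in the original command may differ; the script path is a placeholder):

```bash
# ZeRO stage 1 in this example; swap the config file for stage 2 or 3 as needed
accelerate launch --config_file examples/deepspeed_configs/deepspeed_zero1.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script
```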