Main Accelerator class

Accelerator

The Accelerator is the main class provided by 🌍 Accelerate. It serves as the main entry point for the API.

Quick adaptation of your code

To quickly adapt your script to work on any kind of setup with 🌍 Accelerate:

1. Initialize an Accelerator object (that we will call accelerator throughout this page) as early as possible in your script.
2. Pass your dataloader(s), model(s), optimizer(s), and scheduler(s) to the prepare() method.
3. Remove all the .cuda() or .to(device) calls from your code and let the accelerator handle the device placement for you.

Step three is optional, but considered a best practice.

4. Replace loss.backward() in your code with accelerator.backward(loss).
5. Gather your predictions and labels with gather() before storing them or using them for metric computation.

Step five is mandatory when using distributed evaluation.
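To see why gathering matters for distributed evaluation, here is a minimal pure-Python simulation. It makes no Accelerate calls: the two "processes" are plain lists, and `simulated_gather` is a hypothetical stand-in for gather() that concatenates each process's shard.

```python
# Pure-Python simulation; no Accelerate calls are made.
# `simulated_gather` is a hypothetical stand-in for accelerator.gather().

def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def simulated_gather(per_process_values):
    """Concatenate every process's shard, in process order."""
    return [value for shard in per_process_values for value in shard]

# Two simulated processes, each holding its own shard of the evaluation set.
shard_predictions = [[1, 0, 1, 1], [0, 0, 1, 0]]
shard_labels = [[1, 0, 1, 1], [1, 1, 1, 0]]

# Metric computed on each shard alone: every process reports a different number.
per_process = [accuracy(p, l) for p, l in zip(shard_predictions, shard_labels)]

# Metric computed after gathering: one number covering the whole evaluation set.
all_predictions = simulated_gather(shard_predictions)
all_labels = simulated_gather(shard_labels)
global_accuracy = accuracy(all_predictions, all_labels)

print(per_process)      # [1.0, 0.5]
print(global_accuracy)  # 0.75
```

Without the gather, each process would store or report only its own shard's result, which is not the metric for the full dataset.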

In most cases this is all that is needed. The next section lists a few more advanced use cases and patterns that you should search for in your code and replace with the corresponding methods of your accelerator:

Advanced recommendations

Printing

print statements should be replaced by accelerator.print() so that each message is printed once per server instead of once per process:

- print("My thing I want to print!")
+ accelerator.print("My thing I want to print!")

Executing processes

Once on a single server

For statements that should be executed once per server, use is_local_main_process:

if accelerator.is_local_main_process:
    do_thing_once_per_server()

To get the same behavior for a whole function, decorate it with on_local_main_process():

@accelerator.on_local_main_process
def do_my_thing():
    "Something done once per server"
    do_thing_once_per_server()

Only ever once across all servers

For statements that should only ever be executed once, use is_main_process:

if accelerator.is_main_process:
    do_thing_once()

To get the same behavior for a whole function, decorate it with on_main_process():

@accelerator.on_main_process
def do_my_thing():
    "Something done only once across all servers"
    do_thing_once()

On specific processes

If a function should be run on a specific overall or local process index, there are similar decorators to achieve this:

@accelerator.on_local_process(local_process_index=0)
def do_my_thing():
    "Something done on process index 0 on each server"
    do_thing_on_index_zero_on_each_server()

@accelerator.on_process(process_index=0)
def do_my_thing():
    "Something done on process index 0"
    do_thing_on_index_zero()
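The decorators in this section share one idea: the wrapped function runs only on the process whose index matches, and is skipped everywhere else. The sketch below imitates that gating in pure Python. `FakeProcess` and its methods are hypothetical stand-ins written for illustration, not Accelerate's implementation, and the four "processes" are simulated in a loop as two servers with two processes each.

```python
import functools

class FakeProcess:
    """Hypothetical stand-in for one process's view of the distributed state."""

    def __init__(self, process_index, local_process_index):
        self.process_index = process_index              # rank across all servers
        self.local_process_index = local_process_index  # rank within one server

    def _run_if(self, condition):
        """Build a decorator that calls the function only when condition() holds."""
        def decorator(function):
            @functools.wraps(function)
            def wrapper(*args, **kwargs):
                if condition():
                    return function(*args, **kwargs)
                return None  # every other process skips the call
            return wrapper
        return decorator

    def on_process(self, process_index):
        return self._run_if(lambda: self.process_index == process_index)

    def on_local_process(self, local_process_index):
        return self._run_if(lambda: self.local_process_index == local_process_index)

calls = {"global": [], "local": []}

# Simulate 2 servers x 2 processes: global ranks 0..3, local ranks 0, 1, 0, 1.
for global_index in range(4):
    process = FakeProcess(global_index, global_index % 2)

    @process.on_process(process_index=0)
    def once_overall():
        calls["global"].append(process.process_index)

    @process.on_local_process(local_process_index=0)
    def once_per_server():
        calls["local"].append(process.process_index)

    once_overall()
    once_per_server()

print(calls)  # {'global': [0], 'local': [0, 2]}
```

Every simulated process calls both functions, but only global rank 0 runs the first, and only the first process of each server (global ranks 0 and 2) runs the second.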

Synchronicity control

Use wait_for_everyone() to make sure all processes reach that point before continuing (useful before saving a model, for instance).
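As a rough illustration of the barrier behavior, the following sketch uses threads in place of processes and `threading.Barrier` as a stand-in for wait_for_everyone(). It is an analogy under those assumptions, not how Accelerate synchronizes real processes; the event names are made up for the example.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)  # stand-in for wait_for_everyone()
lock = threading.Lock()
events = []

def worker(rank):
    with lock:
        events.append(("finished_training", rank))
    barrier.wait()  # nobody continues until every worker has arrived
    if rank == 0:
        # Only one worker saves, and only after all workers are done.
        with lock:
            events.append(("saved_model", rank))

threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(events.index(("saved_model", 0)))  # 4
```

The save event always lands after all four "finished" events, whatever order the workers arrive in.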

Saving and loading

model = MyModel()
model = accelerator.prepare(model)

Use save_model() instead of torch.save to save a model. It will remove all model wrappers added during the distributed process, get the state_dict of the model and save it. The state_dict will be in the same precision as the model being trained.

- torch.save(state_dict, "my_state.pkl")
+ accelerator.save_model(model, save_directory)

save_model() can also save a model as sharded checkpoints or in the safetensors format. Here is an example:

accelerator.save_model(model, save_directory, max_shard_size="1GB", safe_serialization=True)

🌍 Transformers models

If you are using models from the 🌍 Transformers library, you can use the .save_pretrained() method.

from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")
model = accelerator.prepare(model)

# ...fine-tune with PyTorch...

unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    "path/to/my_model_directory",
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
)

This will ensure your model stays compatible with other 🌍 Transformers functionality like the .from_pretrained() method.

from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/my_model_directory")

Operations

Use clip_grad_norm_() instead of torch.nn.utils.clip_grad_norm_ and clip_grad_value_() instead of torch.nn.utils.clip_grad_value_.
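For intuition about what the two kinds of clipping do to gradients, here is a pure-Python sketch of the arithmetic. It is a conceptual illustration only: `clip_by_total_norm` and `clip_by_value` are hypothetical helper names, and real implementations work on tensors and differ in numerical details.

```python
import math

def clip_by_total_norm(gradients, max_norm):
    """Scale all gradients by one factor so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients), total_norm  # already small enough, unchanged
    scale = max_norm / total_norm
    return [g * scale for g in gradients], total_norm

def clip_by_value(gradients, clip_value):
    """Clamp each gradient independently into [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in gradients]

grads = [3.0, -4.0]
clipped, norm_before = clip_by_total_norm(grads, max_norm=1.0)

print(norm_before)                           # 5.0
print(clipped)                               # roughly [0.6, -0.8], direction preserved
print(clip_by_value(grads, clip_value=3.5))  # [3.0, -3.5]
```

Norm clipping keeps the direction of the gradient vector and shrinks its length, while value clipping treats each component separately and can change the direction.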

Gradient Accumulation

To perform gradient accumulation, use accumulate() and pass gradient_accumulation_steps when creating the Accelerator. In multi-device training, this also automatically syncs the gradients or skips the sync as appropriate, checks whether the optimizer step should actually be performed, and scales the loss:

- accelerator = Accelerator()
+ accelerator = Accelerator(gradient_accumulation_steps=2)

  for inputs, labels in training_dataloader:
+     with accelerator.accumulate(model):
          predictions = model(inputs)
          loss = loss_function(predictions, labels)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()
          optimizer.zero_grad()
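The loss scaling and the "should the step be performed" check can be illustrated with plain arithmetic. The sketch below is pure Python and makes no Accelerate calls; it uses a one-parameter model y = w * x with a mean squared error loss, chosen only to keep the numbers easy to follow. It shows that summing micro-batch gradients, each divided by the number of accumulation steps, reproduces the gradient of the full batch.

```python
def mse_gradient(w, xs, ys):
    """d/dw of mean((w * x - y) ** 2) over the given samples."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
accumulation_steps = 2
micro_batch_size = 2

# Reference: gradient computed on the whole batch at once.
full_batch_gradient = mse_gradient(w, xs, ys)

accumulated_gradient = 0.0
optimizer_steps = 0
for step in range(len(xs) // micro_batch_size):
    start = step * micro_batch_size
    micro_xs = xs[start:start + micro_batch_size]
    micro_ys = ys[start:start + micro_batch_size]
    # Scale each micro-batch contribution so the sum matches the full-batch mean.
    accumulated_gradient += mse_gradient(w, micro_xs, micro_ys) / accumulation_steps
    # Only every `accumulation_steps`-th iteration would update the weights.
    if (step + 1) % accumulation_steps == 0:
        optimizer_steps += 1

print(full_batch_gradient)   # -22.5
print(accumulated_gradient)  # -22.5
print(optimizer_steps)       # 1
```

Two micro-batches of two samples therefore behave like one batch of four: the accumulated gradient is the same and the optimizer steps once.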

GradientAccumulationPlugin

class accelerate.utils.GradientAccumulationPlugin

Overall API documentation:

class accelerate.Accelerator