Migrating to BOINC AI Accelerate

This tutorial will detail how to easily convert existing PyTorch code to use 🌍 Accelerate! You'll see that by just changing a few lines of code, 🌍 Accelerate can perform its magic and get you on your way toward running your code on distributed systems with ease!

The base training loop

To begin, write out a very basic PyTorch training loop.

We are under the presumption that training_dataloader, model, optimizer, scheduler, and loss_function have been defined beforehand.


device = "cuda"
model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    inputs = inputs.to(device)
    targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    loss.backward()
    optimizer.step()
    scheduler.step()

Add in 🌍 Accelerate

To start using 🌍 Accelerate, first import and create an Accelerator instance:

from accelerate import Accelerator

accelerator = Accelerator()

Accelerator is the main force behind utilizing all the possible options for distributed training!

Setting the right device

The Accelerator class knows the right device to move any PyTorch object to at any time, so you should change the definition of device to come from Accelerator:

- device = 'cuda'
+ device = accelerator.device
  model.to(device)
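
As a quick aside (a sketch that is not part of the original example), accelerator.device is an ordinary torch.device, so it can be used anywhere you would otherwise hard-code "cuda" or "cpu":

import torch
from accelerate import Accelerator

accelerator = Accelerator()
device = accelerator.device

# Prints something like "cuda:0" on a GPU machine, or "cpu" on a CPU-only one.
print(device)

# The rest of the script can keep using `device` exactly as before,
# e.g. for moving the model or creating new tensors.
x = torch.zeros(4, 4, device=device)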

Preparing your objects

Next, you need to pass all of the important objects related to training into prepare(). 🌍 Accelerate will make sure everything is set up in the current environment for you to start training:

model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

These objects are returned in the same order they were sent in. By default, when using device_placement=True, all of the objects that can be sent to the right device will be. If you need to work with data that isn't passed to prepare() but should be on the active device, you should move it there yourself using the device you defined earlier.

Accelerate will only prepare objects that inherit from their respective PyTorch classes (such as torch.optim.Optimizer).
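
For illustration (a minimal sketch; the class_weights tensor below is hypothetical and not one of the tutorial's predefined objects), such data can be moved to the active device like any other tensor:

import torch
from accelerate import Accelerator

accelerator = Accelerator()

# A tensor that is never passed to prepare() still needs to be placed on the
# active device manually before it is used alongside the prepared model.
class_weights = torch.tensor([1.0, 2.0, 0.5])
class_weights = class_weights.to(accelerator.device)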

Modifying the training loop

Finally, three lines of code need to be changed in the training loop. 🌍 Accelerate's DataLoader classes will automatically handle the device placement by default, and backward() should be used for performing the backward pass:

-   inputs = inputs.to(device)
-   targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
-   loss.backward()
+   accelerator.backward(loss)

With that, your training loop is now ready to use 🌍 Accelerate!

The finished code

Below is the final version of the converted code:


from accelerate import Accelerator

accelerator = Accelerator()

model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    accelerator.backward(loss)
    optimizer.step()
    scheduler.step()
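
As a possible follow-up (a hedged sketch rather than part of the converted script; it reuses the accelerator and model objects from above, and the filename is arbitrary), the trained model can be saved with Accelerator's wait_for_everyone(), unwrap_model(), and save() methods:

# Make sure every process has finished training before saving.
accelerator.wait_for_everyone()

# Strip the wrappers added by prepare() so a plain PyTorch model is saved.
unwrapped_model = accelerator.unwrap_model(model)

# accelerator.save() writes the file from the main process only.
accelerator.save(unwrapped_model.state_dict(), "model_weights.pt")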

More Resources

To see more ways to migrate to 🌍 Accelerate, check out the interactive migration tutorial, which showcases other items to watch out for when using Accelerate and how to handle them quickly.
