Main Accelerator class
Accelerator
The Accelerator is the main class provided by 🤗 Accelerate. It serves as the main entry point for the API.
Quick adaptation of your code
To quickly adapt your script to work on any kind of setup with 🤗 Accelerate, just:
1. Initialize an Accelerator object (that we will call accelerator throughout this page) as early as possible in your script.
2. Pass your dataloader(s), model(s), optimizer(s), and scheduler(s) to the prepare() method.
3. Remove all the .cuda() or .to(device) calls from your code and let the accelerator handle the device placement for you. Step three is optional, but considered a best practice.
4. Replace loss.backward() in your code with accelerator.backward(loss).
5. Gather your predictions and labels before storing them or using them for metric computation using gather(). Step five is mandatory when using distributed evaluation.
In most cases this is all that is needed. The next section lists a few more advanced use cases and nice features you should search for in your code and replace with the corresponding methods of your accelerator.
Advanced recommendations
Printing
print statements should be replaced by accelerator.print() so that each message is printed only once per server instead of once per process.
Executing processes
Once on a single server
For statements that should be executed once per server, use is_local_main_process.
A function can be wrapped using the on_local_main_process() function to achieve the same behavior on a function's execution.
Only ever once across all servers
For statements that should only ever be executed once, use is_main_process.
A function can be wrapped using the on_main_process() function to achieve the same behavior on a function's execution.
On specific processes
If a function should be run on a specific overall or local process index, there are similar decorators to achieve this: on_process() and on_local_process().
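As a rough illustration of how a decorator can gate execution on a process index, the sketch below runs the wrapped function only when the index matches. It is plain Python and not the Accelerate implementation: `run_on_process` and the explicit `current_index` argument are hypothetical names, standing in for the index that the library reads from its own state.

```python
import functools


def run_on_process(process_index, current_index):
    """Hypothetical stand-in for a process-gating decorator."""

    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            # Only the selected process executes the function body;
            # every other process skips it and gets None back.
            if current_index == process_index:
                return function(*args, **kwargs)
            return None

        return wrapper

    return decorator


def make_greeter(current_index):
    # Build the same decorated function as seen from a given process.
    @run_on_process(process_index=0, current_index=current_index)
    def greet():
        return f"hello from process {current_index}"

    return greet


# Simulate three processes calling the same decorated function.
results = [make_greeter(rank)() for rank in range(3)]
```

In this simulation only the call made as process 0 returns a string; the other two return None.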
Synchronicity control
Use wait_for_everyone() to make sure all processes join that point before continuing. (Useful before a model save for instance).
Saving and loading
Use save_model() instead of torch.save to save a model. It will remove all model wrappers added during the distributed process, get the state_dict of the model and save it. The state_dict will be in the same precision as the model being trained.
save_model() can also save a model into sharded checkpoints or in the safetensors format, using its max_shard_size and safe_serialization arguments.
🤗 Transformers models
If you are using models from the 🤗 Transformers library, you can use the .save_pretrained() method.
This will ensure your model stays compatible with other 🤗 Transformers functionality like the .from_pretrained() method.
Operations
Use clip_grad_norm_() instead of torch.nn.utils.clip_grad_norm_ and clip_grad_value_() instead of torch.nn.utils.clip_grad_value_.
Gradient Accumulation
To perform gradient accumulation use accumulate() and specify a gradient_accumulation_steps. This will also automatically ensure the gradients are synced or unsynced when on multi-device training, check if the step should actually be performed, and auto-scale the loss.
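The bookkeeping described above can be pictured with a small pure-Python sketch. It is an illustration under stated assumptions, not the library's code: `accumulation_schedule` and `scaled_losses` are hypothetical helpers, and the sketch assumes that the last batch of the dataloader always triggers a sync and that the loss is divided by the number of accumulation steps.

```python
def accumulation_schedule(num_batches, accumulation_steps):
    """For each batch, report whether gradients would be synced and stepped."""
    schedule = []
    for step in range(1, num_batches + 1):
        is_last_batch = step == num_batches
        # Sync every `accumulation_steps` batches, and on the final batch
        # so that leftover gradients are not lost (assumption).
        schedule.append(step % accumulation_steps == 0 or is_last_batch)
    return schedule


def scaled_losses(losses, accumulation_steps):
    """Scale each loss so the accumulated gradient matches one large batch."""
    return [loss / accumulation_steps for loss in losses]


schedule = accumulation_schedule(num_batches=5, accumulation_steps=2)
losses = scaled_losses([2.0, 4.0], accumulation_steps=2)
```

With five batches and two accumulation steps, the optimizer would step on batches 2, 4 and 5 in this model.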
GradientAccumulationPlugin
class accelerate.utils.GradientAccumulationPlugin
( num_steps: int = None, adjust_scheduler: bool = True, sync_with_dataloader: bool = True )
A plugin to configure gradient accumulation behavior.
Instead of passing gradient_accumulation_steps you can instantiate a GradientAccumulationPlugin and pass it to the Accelerator's __init__ as gradient_accumulation_plugin. You can only pass one of gradient_accumulation_plugin or gradient_accumulation_steps; passing both will raise an error.
In addition to the number of steps, this also lets you configure whether or not you adjust your learning rate scheduler to account for the change in steps due to accumulation.
Overall API documentation:
class accelerate.Accelerator
( device_placement: bool = True, split_batches: bool = False, mixed_precision: PrecisionType | str | None = None, gradient_accumulation_steps: int = 1, cpu: bool = False, deepspeed_plugin: DeepSpeedPlugin | None = None, fsdp_plugin: FullyShardedDataParallelPlugin | None = None, megatron_lm_plugin: MegatronLMPlugin | None = None, rng_types: list[str | RNGType] | None = None, log_with: str | LoggerType | GeneralTracker | list[str | LoggerType | GeneralTracker] | None = None, project_dir: str | os.PathLike | None = None, project_config: ProjectConfiguration | None = None, gradient_accumulation_plugin: GradientAccumulationPlugin | None = None, dispatch_batches: bool | None = None, even_batches: bool = True, step_scheduler_with_optimizer: bool = True, kwargs_handlers: list[KwargsHandler] | None = None, dynamo_backend: DynamoBackend | str | None = None )
Parameters
device_placement (bool, optional, defaults to True): Whether or not the accelerator should put objects on device (tensors yielded by the dataloader, model, etc.).
split_batches (bool, optional, defaults to False): Whether or not the accelerator should split the batches yielded by the dataloaders across the devices. If True, the actual batch size used will be the same on any kind of distributed processes, but it must be a round multiple of the num_processes you are using. If False, the actual batch size used will be the one set in your script multiplied by the number of processes.
mixed_precision (str, optional): Whether or not to use mixed precision training. Choose from "no", "fp16", "bf16" or "fp8". Will default to the value in the environment variable ACCELERATE_MIXED_PRECISION, which will use the default value in the accelerate config of the current system or the flag passed with the accelerate.launch command. "fp8" requires the installation of transformers-engine.
gradient_accumulation_steps (int, optional, defaults to 1): The number of steps that should pass before gradients are accumulated. A number > 1 should be combined with Accelerator.accumulate. If not passed, will default to the value in the environment variable ACCELERATE_GRADIENT_ACCUMULATION_STEPS. Can also be configured through a GradientAccumulationPlugin.
cpu (bool, optional): Whether or not to force the script to execute on CPU. Will ignore any available GPU if set to True and force the execution on one process only.
deepspeed_plugin (DeepSpeedPlugin, optional): Tweak your DeepSpeed related args using this argument. This argument is optional and can be configured directly using accelerate config.
fsdp_plugin (FullyShardedDataParallelPlugin, optional): Tweak your FSDP related args using this argument. This argument is optional and can be configured directly using accelerate config.
megatron_lm_plugin (MegatronLMPlugin, optional): Tweak your MegatronLM related args using this argument. This argument is optional and can be configured directly using accelerate config.
rng_types (list of str or RNGType): The list of random number generators to synchronize at the beginning of each iteration in your prepared dataloaders. Should be one or several of:
  "torch": the base torch random number generator
  "cuda": the CUDA random number generator (GPU only)
  "xla": the XLA random number generator (TPU only)
  "generator": the torch.Generator of the sampler (or batch sampler if there is no sampler in your dataloader) or of the iterable dataset (if it exists) if the underlying dataset is of that type.
  Will default to ["torch"] for PyTorch versions <= 1.5.1 and ["generator"] for PyTorch versions >= 1.6.
log_with (list of str, LoggerType or GeneralTracker, optional): A list of loggers to be set up for experiment tracking. Should be one or several of:
  "all"
  "tensorboard"
  "wandb"
  "comet_ml"
  If "all" is selected, will pick up all available trackers in the environment and initialize them. Can also accept implementations of GeneralTracker for custom trackers, and can be combined with "all".
project_config (ProjectConfiguration, optional): A configuration for how saving the state can be handled.
project_dir (str, os.PathLike, optional): A path to a directory for storing data such as logs of locally-compatible loggers and potentially saved checkpoints.
dispatch_batches (bool, optional): If set to True, the dataloader prepared by the Accelerator is only iterated through on the main process and then the batches are split and broadcast to each process. Will default to True for DataLoader whose underlying dataset is an IterableDataset, False otherwise.
even_batches (bool, optional, defaults to True): If set to True, in cases where the total batch size across all processes does not exactly divide the dataset, samples at the start of the dataset will be duplicated so the batch can be divided equally among all workers.
step_scheduler_with_optimizer (bool, optional, defaults to True): Set True if the learning rate scheduler is stepped at the same time as the optimizer, False if only done under certain circumstances (at the end of each epoch, for instance).
kwargs_handlers (list[KwargsHandler], optional): A list of KwargsHandler to customize how the objects related to distributed training or mixed precision are created. See kwargs for more information.
dynamo_backend (str or DynamoBackend, optional, defaults to "no"): Set to one of the possible dynamo backends to optimize your training with torch dynamo.
gradient_accumulation_plugin (GradientAccumulationPlugin, optional): A configuration for how gradient accumulation should be handled, if more tweaking than just the gradient_accumulation_steps is needed.
Creates an instance of an accelerator for distributed training (on multi-GPU, TPU) or mixed precision training.
Available attributes:
device (torch.device): The device to use.
distributed_type (DistributedType): The distributed training configuration.
local_process_index (int): The process index on the current machine.
mixed_precision (str): The configured mixed precision mode.
num_processes (int): The total number of processes used for training.
optimizer_step_was_skipped (bool): Whether or not the optimizer update was skipped (because of gradient overflow in mixed precision), in which case the learning rate should not be changed.
process_index (int): The overall index of the current process among all processes.
state (AcceleratorState): The distributed setup state.
sync_gradients (bool): Whether the gradients are currently being synced across all processes.
use_distributed (bool): Whether the current configuration is for distributed training.
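The effect of split_batches on batch sizes, as described in the parameter list above, comes down to simple arithmetic. The helper below is a hypothetical illustration (the name `batch_sizes` is not part of the library) that returns the per-process and total batch size for each setting.

```python
def batch_sizes(batch_size, num_processes, split_batches):
    """Return (per_process_batch_size, total_batch_size)."""
    if split_batches:
        # The script's batch is divided across processes, so it must
        # be a round multiple of the number of processes.
        if batch_size % num_processes != 0:
            raise ValueError("batch_size must be a round multiple of num_processes")
        return batch_size // num_processes, batch_size
    # Each process draws a full batch, so the total grows with the process count.
    return batch_size, batch_size * num_processes


split = batch_sizes(batch_size=32, num_processes=4, split_batches=True)
unsplit = batch_sizes(batch_size=32, num_processes=4, split_batches=False)
```

With a script batch size of 32 on 4 processes, splitting keeps the total at 32 (8 per process), while the default yields a total of 128.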
accumulate
( *models )
Parameters
*models (list of torch.nn.Module): PyTorch Modules that were prepared with Accelerator.prepare. Models passed to accumulate() will skip gradient syncing during the backward pass in distributed training.
A context manager that will lightly wrap around and perform gradient accumulation automatically.
autocast
( cache_enabled: bool = False, autocast_handler: AutocastKwargs = None )
Will apply automatic mixed-precision to the block inside this context manager, if it is enabled. Nothing different will happen otherwise.
A different autocast_handler can be passed in to override the one set in the Accelerator object. This is useful in blocks under autocast where you want to revert to fp32.
backward
( loss, **kwargs )
Scales the gradients in accordance with the GradientAccumulationPlugin and calls the correct backward() based on the configuration.
Should be used in lieu of loss.backward().
check_trigger
( )
Checks if the internal trigger tensor has been set to 1 in any of the processes. If so, will return True and reset the trigger tensor to 0.
Note: Does not require wait_for_everyone()
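The interplay between set_trigger() and check_trigger() can be simulated in plain Python. In this sketch a list of integers stands in for the per-process trigger tensors and a sum stands in for the cross-process reduction; `TriggerSketch` is a hypothetical class for illustration, not the library's implementation.

```python
class TriggerSketch:
    """Simulates one trigger flag per process."""

    def __init__(self, num_processes):
        self.flags = [0] * num_processes

    def set_trigger(self, process_index):
        # A single process raises its flag, e.g. to request early stopping.
        self.flags[process_index] = 1

    def check_trigger(self):
        # Stand-in for reducing the flags across all processes.
        if sum(self.flags) >= 1:
            self.flags = [0] * len(self.flags)  # reset after a positive check
            return True
        return False


trigger = TriggerSketch(num_processes=4)
before = trigger.check_trigger()
trigger.set_trigger(process_index=2)
after = trigger.check_trigger()
after_reset = trigger.check_trigger()
```

A typical use would be a stopping condition detected on one process that every process then observes at its next check.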
clear
( )
Alias for Accelerator.free_memory, releases all references to the internal objects stored and calls the garbage collector. You should call this method between two trainings with different models/optimizers.
clip_grad_norm_
( parameters, max_norm, norm_type = 2 ) -> torch.Tensor
Returns
torch.Tensor: Total norm of the parameter gradients (viewed as a single vector).
Should be used in place of torch.nn.utils.clip_grad_norm_.
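For intuition about what clipping by total norm does, here is a pure-Python sketch of the arithmetic on a flat list of gradient values. It mirrors the usual formulation (all gradients scaled by max_norm divided by the total norm when that norm exceeds max_norm) and assumes a small epsilon of 1e-6 in the denominator; `clip_by_total_norm` is a hypothetical helper, and real code operates on tensors and handles mixed precision unscaling first.

```python
def clip_by_total_norm(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Return (total_norm, clipped_grads) for a flat list of gradient values."""
    # Norm of all gradients viewed as a single vector.
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # Scale down only when the norm exceeds max_norm; never scale up.
    clip_coef = min(1.0, max_norm / (total_norm + eps))
    return total_norm, [g * clip_coef for g in grads]


total, clipped = clip_by_total_norm([3.0, 4.0], max_norm=1.0)
small_total, untouched = clip_by_total_norm([0.3, 0.4], max_norm=1.0)
```

The returned total norm is the value before clipping, which matches the "Returns" description above.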
clip_grad_value_
( parameters, clip_value )
Should be used in place of torch.nn.utils.clip_grad_value_.
free_memory
( )
Will release all references to the internal objects stored and call the garbage collector. You should call this method between two trainings with different models/optimizers. Also will reset Accelerator.step to 0.
gather
( tensor ) -> torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor
Parameters
tensor (torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor): The tensors to gather across all processes.
Returns
torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor: The gathered tensor(s). Note that the first dimension of the result is num_processes multiplied by the first dimension of the input tensors.
Gather the values in tensor across all processes and concatenate them on the first dimension. Useful to regroup the predictions from all processes when doing evaluation.
Note: This gather happens in all processes.
gather_for_metrics
( input_data )
Parameters
input_data (torch.Tensor, object, a nested tuple/list/dictionary of torch.Tensor, or a nested tuple/list/dictionary of object): The tensors or objects for calculating metrics across all processes.
Gathers input_data and potentially drops duplicates in the last batch if on a distributed system. Should be used for gathering the inputs and targets for metric calculation.
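The difference between gather() and gather_for_metrics() can be sketched with plain lists in place of tensors. Both helpers below are hypothetical illustrations: the first concatenates the per-process batches along the first dimension, and the second additionally drops trailing samples, assuming the duplicates added to even out the last batch sit at the end of the gathered result and that the number of real samples is known.

```python
def gather_sketch(per_process_batches):
    """Concatenate each process's batch along the first dimension."""
    return [item for batch in per_process_batches for item in batch]


def gather_for_metrics_sketch(per_process_batches, num_real_samples):
    """Gather, then drop trailing samples that were duplicated for padding."""
    gathered = gather_sketch(per_process_batches)
    return gathered[:num_real_samples]


# Last step on 2 processes with 3 samples each, but only 5 real samples:
# sample 10 was repeated to fill the second process's batch.
last_batches = [[10, 11, 12], [13, 14, 10]]
gathered = gather_sketch(last_batches)
for_metrics = gather_for_metrics_sketch(last_batches, num_real_samples=5)
```

Computing a metric on `gathered` would count sample 10 twice, while `for_metrics` contains each real sample once.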
get_state_dict
( model, unwrap = True ) -> dict
Parameters
model (torch.nn.Module): A PyTorch model sent through Accelerator.prepare().
unwrap (bool, optional, defaults to True): Whether to return the original underlying state_dict of model or to return the wrapped state_dict.
Returns
dict: The state dictionary of the model potentially without full precision.
Returns the state dictionary of a model sent through Accelerator.prepare() potentially without full precision.
get_tracker
( name: str, unwrap: bool = False ) -> GeneralTracker
Parameters
name (str): The name of a tracker, corresponding to the .name property.
unwrap (bool): Whether to return the internal tracking mechanism or to return the wrapped tracker instead (recommended).
Returns
GeneralTracker: The tracker corresponding to name if it exists.
Returns a tracker from self.trackers based on name on the main process only.
join_uneven_inputs
( joinables, even_batches = None )
Parameters
joinables (list[torch.distributed.algorithms.Joinable]): A list of models or optimizers that subclass torch.distributed.algorithms.Joinable. Most commonly, a PyTorch Module that was prepared with Accelerator.prepare for DistributedDataParallel training.
even_batches (bool, optional): If set, this will override the value of even_batches set in the Accelerator. If it is not provided, the default Accelerator value will be used.
A context manager that facilitates distributed training or evaluation on uneven inputs, which acts as a wrapper around torch.distributed.algorithms.join. This is useful when the total batch size does not evenly divide the length of the dataset.
join_uneven_inputs is only supported for Distributed Data Parallel training on multiple GPUs. For any other configuration, this method will have no effect.
Overriding even_batches will not affect iterable-style data loaders.
load_state
( input_dir: str = None, **load_model_func_kwargs )
Parameters
input_dir (str or os.PathLike): The name of the folder all relevant weights and states were saved in. Can be None if automatic_checkpoint_naming is used, and will pick up from the latest checkpoint.
load_model_func_kwargs (dict, optional): Additional keyword arguments for loading the model which can be passed to the underlying load function, such as optional arguments for DeepSpeed's load_checkpoint function or a map_location to load the model and optimizer on.
Loads the current states of the model, optimizer, scaler, RNG generators, and registered objects.
Should only be used in conjunction with Accelerator.save_state(). If a file is not registered for checkpointing, it will not be loaded if stored in the directory.
local_main_process_first
( )
Lets the local main process go inside a with block.
The other processes will enter the with block after the main process exits.
main_process_first
( )
Lets the main process go first inside a with block.
The other processes will enter the with block after the main process exits.
no_sync
( model )
Parameters
model (torch.nn.Module): PyTorch Module that was prepared with Accelerator.prepare.
A context manager to disable gradient synchronizations across DDP processes by calling torch.nn.parallel.DistributedDataParallel.no_sync.
If model is not in DDP, this context manager does nothing.
on_last_process
( function: Callable[..., Any] )
Parameters
function (Callable): The function to decorate.
A decorator that will run the decorated function on the last process only. Can also be called using the PartialState class.
on_local_main_process
( function: Callable[..., Any] = None )
Parameters
function (Callable): The function to decorate.
A decorator that will run the decorated function on the local main process only. Can also be called using the PartialState class.
on_local_process
( function: Callable[..., Any] = None, local_process_index: int = None )
Parameters
function (Callable, optional): The function to decorate.
local_process_index (int, optional): The index of the local process on which to run the function.
A decorator that will run the decorated function on a given local process index only. Can also be called using the PartialState class.
on_main_process
( function: Callable[..., Any] = None )
Parameters
function (Callable): The function to decorate.
A decorator that will run the decorated function on the main process only. Can also be called using the PartialState class.
on_process
( function: Callable[..., Any] = None, process_index: int = None )
Parameters
function (Callable, optional): The function to decorate.
process_index (int, optional): The index of the process on which to run the function.
A decorator that will run the decorated function on a given process index only. Can also be called using the PartialState class.
pad_across_processes
( tensor, dim = 0, pad_index = 0, pad_first = False ) -> torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor
Parameters
tensor (nested list/tuple/dictionary of torch.Tensor): The data to gather.
dim (int, optional, defaults to 0): The dimension on which to pad.
pad_index (int, optional, defaults to 0): The value with which to pad.
pad_first (bool, optional, defaults to False): Whether to pad at the beginning or the end.
Returns
torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor: The padded tensor(s).
Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so they can safely be gathered.
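The padding behavior can be pictured on plain Python lists, one list per process. This is an illustrative sketch only: `pad_across_processes_sketch` is a hypothetical helper that handles a single flat dimension, whereas the method described above works on tensors along a chosen dimension and recurses into nested containers.

```python
def pad_across_processes_sketch(per_process, pad_index=0, pad_first=False):
    """Pad every process's sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in per_process)
    padded = []
    for seq in per_process:
        padding = [pad_index] * (max_len - len(seq))
        # pad_first places the padding before the data instead of after it.
        padded.append(padding + list(seq) if pad_first else list(seq) + padding)
    return padded


outputs = [[1, 2, 3], [4]]
padded_end = pad_across_processes_sketch(outputs)
padded_start = pad_across_processes_sketch(outputs, pad_index=-1, pad_first=True)
```

After padding, every process holds a sequence of the same length, which is the precondition for gathering them.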
prepare
( *args, device_placement = None )
Parameters
*args (list of objects): Any of the following type of objects:
  torch.utils.data.DataLoader: PyTorch Dataloader
  torch.nn.Module: PyTorch Module
  torch.optim.Optimizer: PyTorch Optimizer
  torch.optim.lr_scheduler.LRScheduler: PyTorch LR Scheduler
device_placement (list[bool], optional): Used to customize whether automatic device placement should be performed for each object passed. Needs to be a list of the same length as args. Not compatible with DeepSpeed or FSDP.
Prepare all objects passed in args for distributed training and mixed precision, then return them in the same order.
You don't need to prepare a model if you only use it for inference without any kind of mixed precision.
prepare_data_loader
( data_loader: torch.utils.data.DataLoader, device_placement = None, slice_fn_for_dispatch = None )
Parameters
data_loader (torch.utils.data.DataLoader): A vanilla PyTorch DataLoader to prepare.
device_placement (bool, optional): Whether or not to place the batches on the proper device in the prepared dataloader. Will default to self.device_placement.
slice_fn_for_dispatch (Callable, optional): If passed, this function will be used to slice tensors across num_processes. Will default to slice_tensors(). This argument is used only when dispatch_batches is set to True and will be ignored otherwise.
Prepares a PyTorch DataLoader for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.
prepare_model
( model: torch.nn.Module, device_placement: bool = None, evaluation_mode: bool = False )
Parameters
model (torch.nn.Module): A PyTorch model to prepare. You don't need to prepare a model if it is used only for inference without any kind of mixed precision.
device_placement (bool, optional): Whether or not to place the model on the proper device. Will default to self.device_placement.
evaluation_mode (bool, optional, defaults to False): Whether or not to set the model for evaluation only, by just applying mixed precision and torch.compile (if configured in the Accelerator object).
Prepares a PyTorch model for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.
prepare_optimizer
( optimizer: torch.optim.Optimizer, device_placement = None )
Parameters
optimizer (torch.optim.Optimizer): A vanilla PyTorch optimizer to prepare.
device_placement (bool, optional): Whether or not to place the optimizer on the proper device. Will default to self.device_placement.
Prepares a PyTorch Optimizer for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.
prepare_scheduler
( scheduler: LRScheduler )
Parameters
scheduler (torch.optim.lr_scheduler.LRScheduler): A vanilla PyTorch scheduler to prepare.
Prepares a PyTorch Scheduler for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.
print
( *args, **kwargs )
Drop-in replacement for print() that only prints once per server.
reduce
( tensor, reduction = 'sum', scale = 1.0 ) -> torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor
Parameters
tensor (torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor): The tensors to reduce across all processes.
reduction (str, optional, defaults to "sum"): A reduction type, can be one of "sum", "mean", or "none". If "none", will not perform any operation.
scale (float, optional, defaults to 1.0): A default scaling value to be applied after the reduce, only valid on XLA.
Returns
torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor: The reduced tensor(s).
Reduce the values in tensor across all processes based on reduction.
Note: All processes get the reduced value.
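The three reduction modes can be illustrated with one scalar per process. The helper below is a hypothetical pure-Python sketch, not the library's code; it returns a list with one entry per process to reflect the note that all processes receive the reduced value.

```python
def reduce_sketch(per_process_values, reduction="sum"):
    """Return the value each process would hold after the reduction."""
    if reduction == "none":
        # No operation: every process keeps its own value.
        return list(per_process_values)
    total = sum(per_process_values)
    if reduction == "sum":
        result = total
    elif reduction == "mean":
        result = total / len(per_process_values)
    else:
        raise ValueError(f"unsupported reduction: {reduction}")
    # Every process ends up with the same reduced value.
    return [result] * len(per_process_values)


summed = reduce_sketch([1.0, 2.0, 3.0], reduction="sum")
averaged = reduce_sketch([1.0, 2.0, 3.0], reduction="mean")
unchanged = reduce_sketch([1.0, 2.0, 3.0], reduction="none")
```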
register_for_checkpointing
( *objects )
Makes note of objects and will save or load them during save_state or load_state.
These should be utilized when the state is being loaded or saved in the same script. It is not designed to be used in different scripts.
Every object must have a load_state_dict and state_dict function to be stored.
register_load_state_pre_hook
( hook: Callable[..., None] ) -> torch.utils.hooks.RemovableHandle
Parameters
hook (Callable): A function to be called in Accelerator.load_state() before load_checkpoint.
Returns
torch.utils.hooks.RemovableHandle: A handle that can be used to remove the added hook by calling handle.remove().
Registers a pre hook to be run before load_checkpoint is called in Accelerator.load_state().
The hook should have the following signature:
hook(models: list[torch.nn.Module], input_dir: str) -> None
The models argument are the models as saved in the accelerator state under accelerator._models, and the input_dir argument is the input_dir argument passed to Accelerator.load_state().
Should only be used in conjunction with Accelerator.register_save_state_pre_hook(). Can be useful to load configurations in addition to model weights. Can also be used to overwrite model loading with a customized method. In this case, make sure to remove already loaded models from the models list.
register_save_state_pre_hook
( hook: Callable[..., None] ) -> torch.utils.hooks.RemovableHandle
Parameters
hook (Callable): A function to be called in Accelerator.save_state() before save_checkpoint.
Returns
torch.utils.hooks.RemovableHandle: A handle that can be used to remove the added hook by calling handle.remove().
Registers a pre hook to be run before save_checkpoint is called in Accelerator.save_state().
The hook should have the following signature:
hook(models: list[torch.nn.Module], weights: list[dict[str, torch.Tensor]], input_dir: str) -> None
The models argument are the models as saved in the accelerator state under accelerator._models, the weights argument are the state dicts of the models, and the input_dir argument is the output_dir argument passed to Accelerator.save_state().
Should only be used in conjunction with Accelerator.register_load_state_pre_hook(). Can be useful to save configurations in addition to model weights. Can also be used to overwrite model saving with a customized method. In this case, make sure to remove already loaded weights from the weights list.
save
( obj, f, safe_serialization = False )
Parameters
obj (object): The object to save.
f (str or os.PathLike): Where to save the content of obj.
safe_serialization (bool, optional, defaults to False): Whether to save obj using safetensors.
Save the object passed to disk once per machine. Use in place of torch.save.
Note: If save_on_each_node was set in the ProjectConfiguration, will save the object once per node, rather than only once on the main node.
save_model
( model: torch.nn.Module, save_directory: Union[str, os.PathLike], max_shard_size: Union[int, str] = '10GB', safe_serialization: bool = False )
Parameters
save_directory (str or os.PathLike): Directory to which to save. Will be created if it doesn't exist.
max_shard_size (int or str, optional, defaults to "10GB"): The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard which will be bigger than max_shard_size.
safe_serialization (bool, optional, defaults to False): Whether to save the model using safetensors or the traditional PyTorch way (that uses pickle).
Save a model so that it can be re-loaded using load_checkpoint_in_model.
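To make the max_shard_size rule concrete, the sketch below parses a size string and greedily groups weights into shards. Both helpers are hypothetical and only illustrate the behavior described above: the parser assumes decimal units (KB, MB, GB), and the grouping assumes weights are visited in insertion order. The library's actual sharding logic may differ in detail.

```python
def parse_size(size):
    """Convert an int or a string such as '5MB' into a number of bytes."""
    if isinstance(size, int):
        return size
    units = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # decimal units (assumption)
    for suffix, factor in units.items():
        if size.upper().endswith(suffix):
            return int(size[: -len(suffix)]) * factor
    raise ValueError(f"size must be digits followed by a unit, got {size!r}")


def shard_weights(weight_sizes, max_shard_size):
    """Greedily group weights so each shard stays within the limit."""
    limit = parse_size(max_shard_size)
    shards, current, current_size = [], {}, 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        # An oversized weight therefore ends up alone in its own shard.
        if current and current_size + size > limit:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = size
        current_size += size
    if current:
        shards.append(current)
    return shards


sizes = {"a": 2_000_000, "b": 2_000_000, "c": 8_000_000, "d": 1_000_000}
shard_names = [list(shard) for shard in shard_weights(sizes, "5MB")]
```

Here weight `c` is larger than the 5 MB limit, so it lands in a shard of its own that exceeds the limit.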
save_state
( output_dir: str = None, **save_model_func_kwargs )
Parameters
output_dir (str or os.PathLike): The name of the folder to save all relevant weights and states.
save_model_func_kwargs (dict, optional): Additional keyword arguments for saving the model which can be passed to the underlying save function, such as optional arguments for DeepSpeed's save_checkpoint function.
Saves the current states of the model, optimizer, scaler, RNG generators, and registered objects to a folder.
If a ProjectConfiguration was passed to the Accelerator object with automatic_checkpoint_naming enabled then checkpoints will be saved to self.project_dir/checkpoints. If the number of current saves is greater than total_limit then the oldest save is deleted. Each checkpoint is saved in a separate folder named checkpoint_<iteration>.
Otherwise they are just saved to output_dir.
Should only be used when wanting to save a checkpoint during training and restoring the state in the same environment.
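The rotation of automatically named checkpoints can be sketched as simple list bookkeeping. `rotate_checkpoints` is a hypothetical helper that works on folder names only and touches no files; it assumes the existing folders are listed oldest first and that room is made before the new checkpoint is written.

```python
def rotate_checkpoints(existing, iteration, total_limit=None):
    """Return (kept_folders, deleted_folders) after saving a new checkpoint."""
    folders = list(existing)
    deleted = []
    if total_limit is not None:
        # Remove the oldest folders until the new one fits within the limit.
        while folders and len(folders) + 1 > total_limit:
            deleted.append(folders.pop(0))
    folders.append(f"checkpoint_{iteration}")
    return folders, deleted


kept, removed = rotate_checkpoints(
    ["checkpoint_0", "checkpoint_1"], iteration=2, total_limit=2
)
kept_all, removed_none = rotate_checkpoints(["checkpoint_0"], iteration=1)
```

With a limit of two, saving the third checkpoint removes checkpoint_0 in this model; without a limit, nothing is deleted.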
set_trigger
( )
Sets the internal trigger tensor to 1 on the current process. A later call to check_trigger() should follow, which will check across all processes.
Note: Does not require wait_for_everyone()
skip_first_batches
( dataloader, num_batches: int = 0 )
Parameters
dataloader (torch.utils.data.DataLoader): The data loader in which to skip batches.
num_batches (int, optional, defaults to 0): The number of batches to skip.
Creates a new torch.utils.data.DataLoader that will efficiently skip the first num_batches.
split_between_processes
( inputs: list | tuple | dict | torch.Tensor, apply_padding: bool = False )
Parameters
inputs (list, tuple, torch.Tensor, or dict of list/tuple/torch.Tensor): The input to split between processes.
apply_padding (bool, optional, defaults to False): Whether to apply padding by repeating the last element of the input so that all processes have the same number of elements. Useful when trying to perform actions such as Accelerator.gather() on the outputs or passing in fewer inputs than there are processes. If so, just remember to drop the padded elements afterwards.
Splits inputs between self.num_processes quickly; the resulting portion can then be used on that process. Useful when doing distributed inference, such as with different prompts.
Note that when using a dict, all keys need to have the same number of elements.
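One way to picture the split is the index arithmetic below, written for a plain list. It is a sketch under assumptions rather than the library's code: `split_between_processes_sketch` is a hypothetical function that takes the process count and index explicitly, gives the first processes one extra element when the input does not divide evenly, and pads by repeating the last input element.

```python
def split_between_processes_sketch(inputs, num_processes, process_index,
                                   apply_padding=False):
    """Return the slice of `inputs` handled by `process_index`."""
    per_process, extras = divmod(len(inputs), num_processes)
    # The first `extras` processes each take one additional element.
    start = process_index * per_process + min(process_index, extras)
    end = start + per_process + (1 if process_index < extras else 0)
    chunk = list(inputs[start:end])
    if apply_padding and inputs:
        # Repeat the last input so every process holds the same count.
        target = per_process + (1 if extras else 0)
        chunk += [inputs[-1]] * (target - len(chunk))
    return chunk


prompts = ["a", "b", "c", "d", "e"]
first = split_between_processes_sketch(prompts, 2, 0)
second = split_between_processes_sketch(prompts, 2, 1)
second_padded = split_between_processes_sketch(prompts, 2, 1, apply_padding=True)
```

When padding is applied, the repeated element has to be dropped again after gathering the outputs, as the parameter description notes.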
trigger_sync_in_backward
( model )
Parameters
model (torch.nn.Module): The model for which to trigger the gradient synchronization.
Trigger the sync of the gradients in the next backward pass of the model after multiple forward passes under Accelerator.no_sync (only applicable in multi-GPU scenarios).
If the script is not launched in distributed mode, this method does nothing.
unscale_gradients
( optimizer = None )
Parameters
optimizer (torch.optim.Optimizer or list[torch.optim.Optimizer], optional): The optimizer(s) for which to unscale gradients. If not set, will unscale gradients on all optimizers that were passed to prepare().
Unscale the gradients in mixed precision training with AMP. This is a noop in all other settings.
Likely should be called through Accelerator.clip_grad_norm_() or Accelerator.clip_grad_value_().
unwrap_model
( model, keep_fp32_wrapper: bool = True ) -> torch.nn.Module
Parameters
model (torch.nn.Module): The model to unwrap.
keep_fp32_wrapper (bool, optional, defaults to True): Whether to not remove the mixed precision hook if it was added.
Returns
torch.nn.Module: The unwrapped model.
Unwraps the model from the additional layer possibly added by prepare(). Useful before saving the model.
verify_device_map
( model: torch.nn.Module )
Verifies that model has not been prepared with big model inference with a device-map resembling auto.
wait_for_everyone
( )
Will stop the execution of the current process until every other process has reached that point (so this does nothing when the script is only run in one process). Useful to do before saving a model.