Stateful configuration classes

Last updated 1 year ago

Stateful Classes

Below are variations of a singleton class, in the sense that all instances share the same state, which is initialized on the first instantiation.

These classes are immutable and store information about certain configurations or states.
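The shared-state behavior can be illustrated with a minimal pure-Python sketch. This is not Accelerate's actual implementation: `SharedStateSketch` and its `label` attribute are hypothetical names, used only to show how distinct instances can share one state that the first instantiation initializes.

```python
class SharedStateSketch:
    """Hypothetical class: every instance shares one attribute dictionary."""

    _shared_state = {}  # class-level storage shared by all instances

    def __init__(self, label=None):
        # Point this instance's attributes at the shared dictionary.
        self.__dict__ = self._shared_state
        # Only the first instantiation initializes the state.
        if "label" not in self._shared_state:
            self.label = label


first = SharedStateSketch(label="set by the first instance")
second = SharedStateSketch(label="ignored, state already initialized")

print(second.label)     # set by the first instance
print(first is second)  # False: distinct objects, one shared state
```

Later instantiations see whatever the first one stored, which is why the order of construction matters for classes of this kind.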

class accelerate.PartialState

( cpu: bool = False, **kwargs )

Singleton class that has information about the current training environment and functions to help with process control. Designed to be used when only process control and device execution states are needed. Does not need to be initialized from Accelerator.

Available attributes:

  • device (torch.device): The device to use.

  • distributed_type (DistributedType): The type of distributed environment currently in use.

  • local_process_index (int): The index of the current process on the current server.

  • mixed_precision (str): Whether or not the current script will use mixed precision, and if so the type of mixed precision being performed. (Choose from 'no', 'fp16', 'bf16', or 'fp8'.)

  • num_processes (int): The number of processes currently launched in parallel.

  • process_index (int): The index of the current process.

  • is_last_process (bool): Whether or not the current process is the last one.

  • is_main_process (bool): Whether or not the current process is the main one.

  • is_local_main_process (bool): Whether or not the current process is the main one on the local node.

  • debug (bool): Whether or not the current script is being run in debug mode.
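The relationship between the index attributes and the boolean flags can be sketched as follows. `ProcessInfoSketch` is a hypothetical stand-in, not part of Accelerate, and it assumes the common convention that index 0 is the main process; some backends may define the main process differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfoSketch:
    """Hypothetical stand-in for the index attributes listed above."""

    process_index: int        # index across all processes
    local_process_index: int  # index on the current server
    num_processes: int        # total number of launched processes

    @property
    def is_main_process(self) -> bool:
        # Assumption: the main process is the one with global index 0.
        return self.process_index == 0

    @property
    def is_local_main_process(self) -> bool:
        # Assumption: the local main process has local index 0.
        return self.local_process_index == 0

    @property
    def is_last_process(self) -> bool:
        return self.process_index == self.num_processes - 1


# Two servers with 4 processes each: global index 4 is local index 0 on server 2.
info = ProcessInfoSketch(process_index=4, local_process_index=0, num_processes=8)
print(info.is_main_process, info.is_local_main_process, info.is_last_process)
# False True False
```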

local_main_process_first

( )

Lets the local main process go first inside a with block.

The other processes will enter the with block after the local main process exits.

Example:

>>> from accelerate.state import PartialState

>>> state = PartialState()
>>> with state.local_main_process_first():
...     # This will be printed first by local process 0 then in a seemingly
...     # random order by the other processes.
...     print(f"This will be printed by process {state.local_process_index}")

main_process_first

( )

Lets the main process go first inside a with block.

The other processes will enter the with block after the main process exits.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> with accelerator.main_process_first():
...     # This will be printed first by process 0 then in a seemingly
...     # random order by the other processes.
...     print(f"This will be printed by process {accelerator.process_index}")
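The "main goes first" ordering can be reproduced without any distributed setup. The sketch below is an illustration under stated assumptions, not the library's code: threads stand in for processes, and `main_first_sketch`, `NUM_WORKERS`, and `worker` are hypothetical names. Non-main workers wait at a barrier before the block, and the main worker reaches the same barrier only after leaving the block.

```python
import threading
from contextlib import contextmanager

NUM_WORKERS = 3  # hypothetical number of simulated processes
barrier = threading.Barrier(NUM_WORKERS)
order = []  # records the order in which workers run the block
order_lock = threading.Lock()


@contextmanager
def main_first_sketch(is_main: bool):
    """Hypothetical helper: the main worker runs the block before the others."""
    if not is_main:
        barrier.wait()  # non-main workers wait until main reaches the barrier
    yield
    if is_main:
        barrier.wait()  # main releases the others only after leaving the block


def worker(index: int) -> None:
    with main_first_sketch(is_main=(index == 0)):
        with order_lock:
            order.append(index)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(order[0])  # 0
```

Only the first entry is deterministic; the remaining workers run in whatever order the scheduler picks, which matches the "seemingly random order" noted in the example above.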

on_last_process

( function: Callable[..., Any] )

Parameters

  • function (Callable): The function to decorate.

Decorator that only runs the decorated function on the last process.

Example:

# Assume we have 4 processes.
from accelerate.state import PartialState

state = PartialState()


@state.on_last_process
def print_something():
    print(f"Printed on process {state.process_index}")


print_something()
"Printed on process 3"

on_local_main_process

( function: Callable[..., Any] = None )

Parameters

  • function (Callable): The function to decorate.

Decorator that only runs the decorated function on the local main process.

Example:

# Assume we have 2 servers with 4 processes each.
from accelerate.state import PartialState

state = PartialState()


@state.on_local_main_process
def print_something():
    print("This will be printed by process 0 only on each server.")


print_something()
# On server 1:
"This will be printed by process 0 only on each server."
# On server 2:
"This will be printed by process 0 only on each server."

on_local_process

( function: Callable[..., Any] = None, local_process_index: int = None )

Parameters

  • function (Callable, optional): The function to decorate.

  • local_process_index (int, optional): The index of the local process on which to run the function.

Decorator that only runs the decorated function on the process with the given index on the current node.

Example:

# Assume we have 2 servers with 4 processes each.
from accelerate import Accelerator

accelerator = Accelerator()


@accelerator.on_local_process(local_process_index=2)
def print_something():
    print(f"Printed on process {accelerator.local_process_index}")


print_something()
# On server 1:
"Printed on process 2"
# On server 2:
"Printed on process 2"

on_main_process

( function: Callable[..., Any] = None )

Parameters

  • function (Callable): The function to decorate.

Decorator that only runs the decorated function on the main process.

Example:

>>> from accelerate.state import PartialState

>>> state = PartialState()


>>> @state.on_main_process
... def print_something():
...     print("This will be printed by process 0 only.")


>>> print_something()
"This will be printed by process 0 only."

on_process

( function: Callable[..., Any] = None, process_index: int = None )

Parameters

  • function (Callable, optional): The function to decorate.

  • process_index (int, optional): The index of the process on which to run the function.

Decorator that only runs the decorated function on the process with the given index.
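How such a decorator can gate execution is sketched below in plain Python. `on_process_sketch` and `make_reporter` are hypothetical helpers and not Accelerate APIs; the current process index is passed in explicitly so that four "processes" can be simulated in a single interpreter. The sketch assumes that skipped calls simply return None.

```python
from functools import wraps
from typing import Any, Callable, Optional


def on_process_sketch(current_index: int, process_index: int) -> Callable:
    """Hypothetical decorator factory: run the function only when indices match."""

    def decorator(function: Callable[..., Any]) -> Callable[..., Optional[Any]]:
        @wraps(function)
        def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
            if current_index == process_index:
                return function(*args, **kwargs)
            return None  # every other process skips the call

        return wrapper

    return decorator


def make_reporter(current_index: int) -> Callable[[], Optional[str]]:
    # Build the decorated function as process `current_index` would see it.
    @on_process_sketch(current_index=current_index, process_index=2)
    def report() -> str:
        return f"Ran on process {current_index}"

    return report


# Simulate 4 processes calling the same decorated function.
results = [make_reporter(i)() for i in range(4)]
print(results)  # [None, None, 'Ran on process 2', None]
```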

Example:

# Assume we have 4 processes.
from accelerate.state import PartialState

state = PartialState()


@state.on_process(process_index=2)
def print_something():
    print(f"Printed on process {state.process_index}")


print_something()
"Printed on process 2"

split_between_processes

( inputs: list | tuple | dict | torch.Tensor, apply_padding: bool = False )

Parameters

  • inputs (list, tuple, torch.Tensor, or dict of list/tuple/torch.Tensor): The input to split between processes.

  • apply_padding (bool, optional, defaults to False): Whether to apply padding by repeating the last element of the input so that all processes have the same number of elements. Useful when performing actions such as gather() on the outputs, or when passing in fewer inputs than there are processes. If padding is applied, remember to drop the padded elements afterwards.

Splits the input between self.num_processes processes and yields the portion assigned to the current process. Useful when doing distributed inference, such as with different prompts.

Note that when using a dict, all keys need to have the same number of elements.
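The slicing arithmetic for a list input can be sketched in pure Python. `split_sketch` is a hypothetical helper and not the library's implementation; it assumes that earlier processes receive the extra items when the input does not divide evenly, and that padding repeats the last element of the input up to the largest share.

```python
from typing import List, Sequence


def split_sketch(
    inputs: Sequence, num_processes: int, process_index: int, apply_padding: bool = False
) -> List:
    """Hypothetical helper: return the slice of `inputs` for one process."""
    # Base share per process, plus one extra item for the first `extras` processes.
    per_process, extras = divmod(len(inputs), num_processes)
    start = process_index * per_process + min(process_index, extras)
    end = start + per_process + (1 if process_index < extras else 0)
    chunk = list(inputs[start:end])

    if apply_padding and inputs:
        # Pad by repeating the last element up to the largest share.
        target = per_process + (1 if extras else 0)
        filler = chunk[-1] if chunk else inputs[-1]
        chunk += [filler] * (target - len(chunk))
    return chunk


prompts = ["A", "B", "C"]
print([split_sketch(prompts, 2, i) for i in range(2)])
# [['A', 'B'], ['C']]
print([split_sketch(prompts, 2, i, apply_padding=True) for i in range(2)])
# [['A', 'B'], ['C', 'C']]
```

With padding enabled every process ends up with the same number of elements, which is what collective operations such as gather() expect; the duplicated trailing items should be dropped once the results are collected.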

Example:

# Assume there are two processes
from accelerate import PartialState

state = PartialState()
with state.split_between_processes(["A", "B", "C"]) as inputs:
    print(inputs)
# Process 0
["A", "B"]
# Process 1
["C"]

with state.split_between_processes(["A", "B", "C"], apply_padding=True) as inputs:
    print(inputs)
# Process 0
["A", "B"]
# Process 1
["C", "C"]

wait_for_everyone

( )

Will stop the execution of the current process until every other process has reached that point (so this does nothing when the script is only run in one process). Useful to do before saving a model.
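The barrier semantics can be demonstrated locally with threads standing in for processes. This is only an analogy under stated assumptions: `NUM_WORKERS`, `worker`, and `events` are hypothetical names, and Python's `threading.Barrier` plays the role of the synchronization point.

```python
import threading

NUM_WORKERS = 2  # hypothetical: two threads stand in for two processes
barrier = threading.Barrier(NUM_WORKERS)
events = []  # records what each worker did, in order
events_lock = threading.Lock()


def worker(index: int) -> None:
    with events_lock:
        events.append(("before", index))
    barrier.wait()  # nobody continues until every worker has arrived
    with events_lock:
        events.append(("after", index))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for phase, _ in events]
print(phases)  # ['before', 'before', 'after', 'after']
```

No worker records "after" until all workers have recorded "before", which is the guarantee that makes a barrier useful ahead of steps such as saving a model.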

Example:

>>> # Assuming two GPU processes
>>> import time
>>> from accelerate.state import PartialState

>>> state = PartialState()
>>> if state.is_main_process:
...     time.sleep(2)
... else:
...     print("I'm waiting for the main process to finish its sleep...")
>>> state.wait_for_everyone()
>>> # Should print on every process at the same time
>>> print("Everyone is here")

class accelerate.state.AcceleratorState

( mixed_precision: str = None, cpu: bool = False, dynamo_plugin = None, deepspeed_plugin = None, fsdp_plugin = None, megatron_lm_plugin = None, _from_accelerator: bool = False, **kwargs )

Singleton class that has information about the current training environment.

Available attributes:

  • device (torch.device): The device to use.

  • initialized (bool): Whether or not the AcceleratorState has been initialized from Accelerator.

  • local_process_index (int): The index of the current process on the current server.

  • mixed_precision (str): Whether or not the current script will use mixed precision, and if so the type of mixed precision being performed. (Choose from 'no', 'fp16', 'bf16', or 'fp8'.)

  • num_processes (int): The number of processes currently launched in parallel.

  • process_index (int): The index of the current process.

  • is_last_process (bool): Whether or not the current process is the last one.

  • is_main_process (bool): Whether or not the current process is the main one.

  • is_local_main_process (bool): Whether or not the current process is the main one on the local node.

  • debug (bool): Whether or not the current script is being run in debug mode.

local_main_process_first

( )

Lets the local main process go first inside a with block.

The other processes will enter the with block after the local main process exits.

main_process_first

( )

Lets the main process go first inside a with block.

The other processes will enter the with block after the main process exits.

split_between_processes

( inputs: list | tuple | dict | torch.Tensor, apply_padding: bool = False )

Parameters

  • inputs (list, tuple, torch.Tensor, or dict of list/tuple/torch.Tensor): The input to split between processes.

  • apply_padding (bool, optional, defaults to False): Whether to apply padding by repeating the last element of the input so that all processes have the same number of elements. Useful when performing actions such as gather() on the outputs, or when passing in fewer inputs than there are processes. If padding is applied, remember to drop the padded elements afterwards.

Splits the input between self.num_processes processes and yields the portion assigned to the current process. Useful when doing distributed inference, such as with different prompts.

Note that when using a dict, all keys need to have the same number of elements.

Example:

# Assume there are two processes
from accelerate.state import AcceleratorState

state = AcceleratorState()
with state.split_between_processes(["A", "B", "C"]) as inputs:
    print(inputs)
# Process 0
["A", "B"]
# Process 1
["C"]

with state.split_between_processes(["A", "B", "C"], apply_padding=True) as inputs:
    print(inputs)
# Process 0
["A", "B"]
# Process 1
["C", "C"]

class accelerate.state.GradientState

( gradient_accumulation_plugin: Optional[GradientAccumulationPlugin] = None )

Singleton class that has information related to gradient synchronization for gradient accumulation.

Available attributes:

  • end_of_dataloader (bool): Whether we have reached the end of the current dataloader.

  • remainder (int): The number of extra samples that were added from padding the dataloader.

  • sync_gradients (bool): Whether the gradients should be synced across all devices.

  • active_dataloader (Optional[DataLoader]): The dataloader that is currently being iterated over.

  • dataloader_references (List[Optional[DataLoader]]): A list of references to the dataloaders that are being iterated over.

  • num_steps (int): The number of steps to accumulate over.

  • adjust_scheduler (bool): Whether the scheduler should be adjusted to account for the gradient accumulation.

  • sync_with_dataloader (bool): Whether the gradients should be synced at the end of the dataloader iteration and the number of total steps reset.
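How num_steps, end_of_dataloader, sync_with_dataloader, and sync_gradients might interact over one pass through a dataloader is sketched below. `sync_schedule_sketch` is a hypothetical helper, and the rule it encodes (sync every num_steps batches, plus a forced sync with a step reset on the final batch when sync_with_dataloader is set) is an assumption drawn from the attribute descriptions above, not a copy of the library's logic.

```python
from typing import List


def sync_schedule_sketch(
    num_batches: int, num_steps: int, sync_with_dataloader: bool = True
) -> List[bool]:
    """Hypothetical helper: for each batch, report whether gradients would sync."""
    schedule = []
    step = 0
    for batch_index in range(num_batches):
        end_of_dataloader = batch_index == num_batches - 1
        if sync_with_dataloader and end_of_dataloader:
            step = 0  # reset the step count at the end of the dataloader
            sync_gradients = True
        else:
            step += 1
            sync_gradients = step % num_steps == 0
        schedule.append(sync_gradients)
    return schedule


print(sync_schedule_sketch(num_batches=5, num_steps=2))
# [False, True, False, True, True]
print(sync_schedule_sketch(num_batches=5, num_steps=2, sync_with_dataloader=False))
# [False, True, False, True, False]
```

In the second call the final batch leaves an unsynced partial accumulation, which is the situation sync_with_dataloader is meant to avoid.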
