
Kwargs Handlers

The following objects can be passed to the main Accelerator to customize how some PyTorch objects related to distributed training or mixed precision are created.
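All handlers are passed together through the single kwargs_handlers argument of Accelerator. As a minimal sketch (not part of the original page; the handler values are purely illustrative, not recommended settings), several handlers can be combined like this:

from datetime import timedelta

from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs, InitProcessGroupKwargs

# Each handler customizes one underlying PyTorch object; all of them are
# passed together through the single `kwargs_handlers` argument.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
pg_kwargs = InitProcessGroupKwargs(timeout=timedelta(seconds=3600))

accelerator = Accelerator(kwargs_handlers=[ddp_kwargs, pg_kwargs])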

AutocastKwargs

class accelerate.AutocastKwargs

( enabled: bool = True, cache_enabled: bool = None )

Use this object in your Accelerator to customize how torch.autocast behaves. Please refer to the documentation of this context manager for more information on each argument.

Example:


from accelerate import Accelerator
from accelerate.utils import AutocastKwargs

kwargs = AutocastKwargs(cache_enabled=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])
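A sketch of where these settings take effect (this is not from the original page and assumes a CUDA GPU, since fp16 mixed precision requires one): the autocast context that Accelerate applies, for example through accelerator.autocast().

import torch
from accelerate import Accelerator
from accelerate.utils import AutocastKwargs

kwargs = AutocastKwargs(cache_enabled=True)
accelerator = Accelerator(mixed_precision="fp16", kwargs_handlers=[kwargs])

model = accelerator.prepare(torch.nn.Linear(4, 2))
batch = torch.randn(8, 4, device=accelerator.device)

# The forward pass below runs under the torch.autocast context customized
# by the AutocastKwargs handler above.
with accelerator.autocast():
    out = model(batch)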

DistributedDataParallelKwargs

class accelerate.DistributedDataParallelKwargs

( dim: int = 0, broadcast_buffers: bool = True, bucket_cap_mb: int = 25, find_unused_parameters: bool = False, check_reduction: bool = False, gradient_as_bucket_view: bool = False, static_graph: bool = False )

Use this object in your Accelerator to customize how your model is wrapped in a torch.nn.parallel.DistributedDataParallel. Please refer to the documentation of this wrapper for more information on each argument.

gradient_as_bucket_view is only available in PyTorch 1.7.0 and later versions.

static_graph is only available in PyTorch 1.11.0 and later versions.

Example:


from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])
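As a sketch of where this handler is used (assuming the script is started with accelerate launch across several processes, so that prepare() actually wraps the model in DDP):

import torch
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])

model = torch.nn.Linear(10, 10)
# In a multi-process run, prepare() wraps the model in DistributedDataParallel
# using the arguments from the handler above; in a single-process run the
# model is returned unwrapped.
model = accelerator.prepare(model)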

FP8RecipeKwargs

class accelerate.utils.FP8RecipeKwargs

( margin: int = 0, interval: int = 1, fp8_format: str = 'E4M3', amax_history_len: int = 1, amax_compute_algo: str = 'most_recent', override_linear_precision: typing.Tuple[bool, bool, bool] = (False, False, False) )

Use this object in your Accelerator to customize the initialization of the recipe for FP8 mixed precision training. Please refer to the documentation of this class for more information on each argument.

Example:

from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

kwargs = FP8RecipeKwargs(fp8_format="HYBRID")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[kwargs])

GradScalerKwargs

class accelerate.GradScalerKwargs

( init_scale: float = 65536.0, growth_factor: float = 2.0, backoff_factor: float = 0.5, growth_interval: int = 2000, enabled: bool = True )

Use this object in your Accelerator to customize the behavior of mixed precision, specifically how the torch.cuda.amp.GradScaler used is created. Please refer to the documentation of this scaler for more information on each argument.

GradScaler is only available in PyTorch 1.5.0 and later versions.

Example:


from accelerate import Accelerator
from accelerate.utils import GradScalerKwargs

kwargs = GradScalerKwargs(backoff_factor=0.25)
accelerator = Accelerator(kwargs_handlers=[kwargs])
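A sketch of where the customized scaler comes into play (not from the original page; it assumes a CUDA GPU, since fp16 mixed precision requires one): accelerator.backward() scales the loss with that GradScaler before the optimizer step.

import torch
from accelerate import Accelerator
from accelerate.utils import GradScalerKwargs

kwargs = GradScalerKwargs(backoff_factor=0.25)
accelerator = Accelerator(mixed_precision="fp16", kwargs_handlers=[kwargs])

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = accelerator.prepare(model, optimizer)

batch = torch.randn(8, 4, device=accelerator.device)
loss = model(batch).mean()
# The loss is scaled by the GradScaler configured above; the prepared
# optimizer then unscales gradients and updates the scale on step().
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()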

InitProcessGroupKwargs

class accelerate.InitProcessGroupKwargs

( backend: typing.Optional[str] = 'nccl', init_method: typing.Optional[str] = None, timeout: timedelta = datetime.timedelta(seconds=1800) )

Use this object in your Accelerator to customize the initialization of the distributed processes. Please refer to the documentation of this method for more information on each argument.

Example:

from datetime import timedelta
from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

kwargs = InitProcessGroupKwargs(timeout=timedelta(seconds=800))
accelerator = Accelerator(kwargs_handlers=[kwargs])
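A further sketch (assuming a CPU-only multi-process run, e.g. launched with accelerate launch --cpu) where the gloo backend is used instead of the default nccl:

from datetime import timedelta

from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

# gloo is the usual backend for CPU-only process groups; the shorter timeout
# makes a hung rendezvous fail faster.
kwargs = InitProcessGroupKwargs(backend="gloo", timeout=timedelta(seconds=600))
accelerator = Accelerator(cpu=True, kwargs_handlers=[kwargs])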

