# Kwargs handlers

The following objects can be passed to the main [Accelerator](https://huggingface.co/docs/accelerate/v0.24.0/en/package_reference/accelerator#accelerate.Accelerator) to customize how some PyTorch objects related to distributed training or mixed precision are created.
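
All handlers are passed through the same `kwargs_handlers` list, and each one only affects the object it corresponds to, so several can be combined. A minimal sketch (the particular keyword values are illustrative):

```
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs, GradScalerKwargs

# Each handler customizes a different underlying object, so they can be combined freely.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
scaler_kwargs = GradScalerKwargs(growth_interval=1000)
accelerator = Accelerator(mixed_precision="fp16", kwargs_handlers=[ddp_kwargs, scaler_kwargs])
```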

### AutocastKwargs

#### class accelerate.AutocastKwargs

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/dataclasses.py#L61)

( enabled: bool = True, cache_enabled: bool = None )

Use this object in your [Accelerator](https://huggingface.co/docs/accelerate/v0.24.0/en/package_reference/accelerator#accelerate.Accelerator) to customize how `torch.autocast` behaves. Please refer to the documentation of this [context manager](https://pytorch.org/docs/stable/amp.html#torch.autocast) for more information on each argument.

Example:

```
from accelerate import Accelerator
from accelerate.utils import AutocastKwargs

kwargs = AutocastKwargs(cache_enabled=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])
```

### DistributedDataParallelKwargs

#### class accelerate.DistributedDataParallelKwargs

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/dataclasses.py#L83)

( dim: int = 0, broadcast_buffers: bool = True, bucket_cap_mb: int = 25, find_unused_parameters: bool = False, check_reduction: bool = False, gradient_as_bucket_view: bool = False, static_graph: bool = False )

Use this object in your [Accelerator](https://huggingface.co/docs/accelerate/v0.24.0/en/package_reference/accelerator#accelerate.Accelerator) to customize how your model is wrapped in a `torch.nn.parallel.DistributedDataParallel`. Please refer to the documentation of this [wrapper](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) for more information on each argument.

`gradient_as_bucket_view` is only available in PyTorch 1.7.0 and later versions.

`static_graph` is only available in PyTorch 1.11.0 and later versions.

Example:

```
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])
```
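
To enable the newer DDP options mentioned above, the same pattern applies (a sketch, assuming a PyTorch version that supports these flags):

```
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

# static_graph requires PyTorch >= 1.11, gradient_as_bucket_view requires >= 1.7
kwargs = DistributedDataParallelKwargs(static_graph=True, gradient_as_bucket_view=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])
```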

### FP8RecipeKwargs

#### class accelerate.utils.FP8RecipeKwargs

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/dataclasses.py#L173)

( margin: int = 0, interval: int = 1, fp8_format: str = 'E4M3', amax_history_len: int = 1, amax_compute_algo: str = 'most_recent', override_linear_precision: typing.Tuple[bool, bool, bool] = (False, False, False) )

Use this object in your [Accelerator](https://huggingface.co/docs/accelerate/v0.24.0/en/package_reference/accelerator#accelerate.Accelerator) to customize the initialization of the recipe for FP8 mixed precision training. Please refer to the documentation of this [class](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html#transformer_engine.common.recipe.DelayedScaling) for more information on each argument.
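
Note that FP8 training additionally requires NVIDIA's Transformer Engine to be installed and hardware with FP8 support (for example, Hopper-generation GPUs).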

Example:

```
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

kwargs = FP8RecipeKwargs(fp8_format="HYBRID")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[kwargs])
```

### GradScalerKwargs

#### class accelerate.GradScalerKwargs

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/dataclasses.py#L119)

( init_scale: float = 65536.0, growth_factor: float = 2.0, backoff_factor: float = 0.5, growth_interval: int = 2000, enabled: bool = True )

Use this object in your [Accelerator](https://huggingface.co/docs/accelerate/v0.24.0/en/package_reference/accelerator#accelerate.Accelerator) to customize the behavior of mixed precision, specifically how the underlying `torch.cuda.amp.GradScaler` is created. Please refer to the documentation of this [scaler](https://pytorch.org/docs/stable/amp.html?highlight=gradscaler) for more information on each argument.

`GradScaler` is only available in PyTorch 1.5.0 and later versions.

Example:

```
from accelerate import Accelerator
from accelerate.utils import GradScalerKwargs

kwargs = GradScalerKwargs(backoff_factor=0.25)
accelerator = Accelerator(kwargs_handlers=[kwargs])
```
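
The scaler only comes into play when `fp16` mixed precision is enabled, so the handler is typically combined with that setting (a sketch; the values shown are illustrative):

```
from accelerate import Accelerator
from accelerate.utils import GradScalerKwargs

# The GradScaler is only created for fp16 mixed precision.
kwargs = GradScalerKwargs(init_scale=1024.0, growth_interval=1000)
accelerator = Accelerator(mixed_precision="fp16", kwargs_handlers=[kwargs])
```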

### InitProcessGroupKwargs

#### class accelerate.InitProcessGroupKwargs

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/dataclasses.py#L150)

( backend: typing.Optional[str] = 'nccl', init_method: typing.Optional[str] = None, timeout: timedelta = datetime.timedelta(seconds=1800) )

Use this object in your [Accelerator](https://huggingface.co/docs/accelerate/v0.24.0/en/package_reference/accelerator#accelerate.Accelerator) to customize the initialization of the distributed processes. Please refer to the documentation of this [method](https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group) for more information on each argument.

Example:

```
from datetime import timedelta
from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

kwargs = InitProcessGroupKwargs(timeout=timedelta(seconds=800))
accelerator = Accelerator(kwargs_handlers=[kwargs])
```
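
The backend can also be overridden, for example when training on CPU where `nccl` is not available (a sketch, assuming a CPU-only setup):

```
from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

# gloo is the usual backend choice when no CUDA-capable GPUs are present
kwargs = InitProcessGroupKwargs(backend="gloo")
accelerator = Accelerator(cpu=True, kwargs_handlers=[kwargs])
```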
