Fully Sharded Data Parallelism Utilities


Utilities for Fully Sharded Data Parallelism

class accelerate.FullyShardedDataParallelPlugin

(
  sharding_strategy: typing.Any = None,
  backward_prefetch: typing.Any = None,
  mixed_precision_policy: typing.Any = None,
  auto_wrap_policy: typing.Optional[typing.Callable] = None,
  cpu_offload: typing.Any = None,
  ignored_modules: typing.Optional[typing.Iterable[torch.nn.modules.module.Module]] = None,
  state_dict_type: typing.Any = None,
  state_dict_config: typing.Any = None,
  optim_state_dict_config: typing.Any = None,
  limit_all_gathers: bool = False,
  use_orig_params: bool = False,
  param_init_fn: typing.Optional[typing.Callable[[torch.nn.modules.module.Module], NoneType]] = None,
  sync_module_states: bool = True,
  forward_prefetch: bool = False,
  activation_checkpointing: bool = False
)

This plugin is used to enable fully sharded data parallelism.
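
A minimal usage sketch follows, assuming the usual accelerate and PyTorch (>= 2.0) APIs; the particular state-dict settings are illustrative choices rather than defaults. The plugin is constructed and handed to the Accelerator, which applies FSDP when the script runs across multiple processes.

# Minimal sketch: enable FSDP by passing the plugin to the Accelerator.
# The FullStateDictConfig / FullOptimStateDictConfig values are illustrative.
from torch.distributed.fsdp.fully_sharded_data_parallel import (
    FullOptimStateDictConfig,
    FullStateDictConfig,
)

from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=True),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=True),
)

# FSDP takes effect when the script is launched on multiple processes,
# e.g. with `accelerate launch train.py`.
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

Everything else (preparing the model, optimizer, and dataloaders) follows the usual accelerator.prepare(...) flow.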

get_module_class_from_name

( module, name )

Parameters

  • module (torch.nn.Module) - The module to get the class from.

  • name (str) - The name of the class.

Gets a class from a module by its name.
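
As an illustration of how this helper can be used, the sketch below resolves a layer class by name and feeds it to an FSDP auto-wrap policy. The toy Block module is hypothetical, and the helper is called as a static method on the plugin, as this page lists it.

import functools

import torch.nn as nn
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from accelerate import FullyShardedDataParallelPlugin


class Block(nn.Module):
    # Toy transformer-style block, used only for illustration.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)


model = nn.Sequential(Block(), Block())

# Recursively search the model's submodules for a class named "Block".
block_cls = FullyShardedDataParallelPlugin.get_module_class_from_name(model, "Block")

# The resolved class can then back an FSDP auto-wrap policy.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={block_cls}
)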
