Megatron-LM Utilities


Utilities for Megatron-LM

class accelerate.utils.MegatronLMPlugin

( tp_degree: int = None, pp_degree: int = None, num_micro_batches: int = None, gradient_clipping: float = None, sequence_parallelism: bool = None, recompute_activation: bool = None, use_distributed_optimizer: bool = None, pipeline_model_parallel_split_rank: int = None, num_layers_per_virtual_pipeline_stage: int = None, is_train_batch_min: str = True, train_iters: int = None, train_samples: int = None, weight_decay_incr_style: str = 'constant', start_weight_decay: float = None, end_weight_decay: float = None, lr_decay_style: str = 'linear', lr_decay_iters: int = None, lr_decay_samples: int = None, lr_warmup_iters: int = None, lr_warmup_samples: int = None, lr_warmup_fraction: float = None, min_lr: float = 0, consumed_samples: typing.List[int] = None, no_wd_decay_cond: typing.Optional[typing.Callable] = None, scale_lr_cond: typing.Optional[typing.Callable] = None, lr_mult: float = 1.0, megatron_dataset_flag: bool = False, seq_length: int = None, encoder_seq_length: int = None, decoder_seq_length: int = None, tensorboard_dir: str = None, set_all_logging_options: bool = False, eval_iters: int = 100, eval_interval: int = 1000, return_logits: bool = False, custom_train_step_class: typing.Optional[typing.Any] = None, custom_train_step_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, custom_model_provider_function: typing.Optional[typing.Callable] = None, custom_prepare_model_function: typing.Optional[typing.Callable] = None, other_megatron_args: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )

Plugin for Megatron-LM that enables tensor, pipeline, sequence, and data parallelism, as well as selective activation recomputation and optimized fused kernels.
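A minimal usage sketch, assuming the script is launched with accelerate launch and the Megatron-LM integration is enabled; the parallelism degrees and flags below are illustrative values, not recommendations:

    from accelerate import Accelerator
    from accelerate.utils import MegatronLMPlugin

    # Illustrative configuration: 2-way tensor parallel, 2-way pipeline
    # parallel, with sequence parallelism and selective activation
    # recomputation enabled.
    megatron_lm_plugin = MegatronLMPlugin(
        tp_degree=2,
        pp_degree=2,
        num_micro_batches=4,
        gradient_clipping=1.0,
        sequence_parallelism=True,
        recompute_activation=True,
        use_distributed_optimizer=True,
    )

    accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)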

class accelerate.utils.MegatronLMDummyScheduler

( optimizer, total_num_steps = None, warmup_num_steps = 0, **kwargs )

Parameters

  • optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.

  • total_num_steps (int) — Total number of steps.

  • warmup_num_steps (int) — Number of steps for warmup.

  • **kwargs — Other arguments.

Dummy scheduler that stands in for a real one; it is primarily used to keep the conventional training loop (passing model, optimizer, and scheduler to prepare) when the actual learning-rate schedule is created and managed by Megatron-LM.
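A sketch of the intended usage; optimizer, model, accelerator, max_train_steps, and num_warmup_steps are assumed to be defined earlier in the training script:

    from accelerate.utils import MegatronLMDummyScheduler

    # `optimizer`, `max_train_steps`, and `num_warmup_steps` are assumed
    # to exist in the surrounding training script.
    lr_scheduler = MegatronLMDummyScheduler(
        optimizer=optimizer,
        total_num_steps=max_train_steps,
        warmup_num_steps=num_warmup_steps,
    )

    model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)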

class accelerate.utils.MegatronLMDummyDataLoader

( **dataset_kwargs )

Dummy dataloader that stands in for a real one; it is primarily used to keep the conventional training loop when the actual dataloaders are built by Megatron-LM from the given dataset kwargs.
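A sketch of the dataset-kwargs flow; the data path, split string, and batch settings are placeholders for a dataset preprocessed into Megatron-LM's indexed format, and the plugin's megatron_dataset_flag is assumed to be set to True:

    from accelerate.utils import MegatronLMDummyDataLoader

    # Placeholder kwargs describing a hypothetical preprocessed dataset.
    megatron_dataloader_config = {
        "data_path": ["my-gpt2_text_document"],  # hypothetical dataset prefix
        "splits_string": "949,50,1",
        "seq_length": 1024,
        "micro_batch_size": 2,
    }
    megatron_dataloader = MegatronLMDummyDataLoader(**megatron_dataloader_config)

    # One dummy dataloader is passed per split; Megatron-LM builds the real
    # train/validation/test dataloaders from the kwargs behind the scenes.
    train_dataloader, eval_dataloader, test_dataloader = accelerator.prepare(
        megatron_dataloader, megatron_dataloader, megatron_dataloader
    )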

class accelerate.utils.AbstractTrainStep

( name )

Abstract class defining the batching, forward pass, and loss handling of a single train step.

class accelerate.utils.GPTTrainStep

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

GPT train step class.
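The built-in steps can be subclassed and wired back in through MegatronLMPlugin's custom_train_step_class and custom_train_step_kwargs. A hedged sketch of that pattern; the loss computation and the loss_func signature shown here are illustrative, not a fixed contract:

    from accelerate.utils import (
        GPTTrainStep,
        MegatronLMPlugin,
        avg_losses_across_data_parallel_group,
    )

    class GPTTrainStepWithCustomLoss(GPTTrainStep):
        # `megatron_args` is the argparse.Namespace of Megatron-LM arguments;
        # extra options arrive via `custom_train_step_kwargs`.
        def __init__(self, megatron_args, **kwargs):
            super().__init__(megatron_args)
            self.kwargs = kwargs

        def get_loss_func(self):
            def loss_func(inputs, loss_mask, output_tensor):
                # Illustrative: a plain masked mean over the per-token losses.
                losses = output_tensor.float()
                loss_mask = loss_mask.view(-1).float()
                loss = (losses.view(-1) * loss_mask).sum() / loss_mask.sum()
                # Report the loss averaged across the data-parallel group.
                averaged_loss = avg_losses_across_data_parallel_group([loss])
                return loss, {"lm loss": averaged_loss[0]}

            return loss_func

    megatron_lm_plugin = MegatronLMPlugin(
        custom_train_step_class=GPTTrainStepWithCustomLoss,
        custom_train_step_kwargs={},  # extra kwargs for the step, if any
    )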

class accelerate.utils.BertTrainStep

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

BERT train step class.

class accelerate.utils.T5TrainStep

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

T5 train step class.

accelerate.utils.avg_losses_across_data_parallel_group

( losses )

Parameters

  • losses (List[Tensor]) — List of losses to average across data parallel group.

Average losses across data parallel group.
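In practice this helper is called on each data-parallel rank inside a train step's loss function (as in the GPTTrainStep sketch above), so the loss each rank reports reflects the average over the whole data-parallel group.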
