Megatron-LM Utilities

Utilities for Megatron-LM

class accelerate.utils.MegatronLMPlugin

( tp_degree: int = None, pp_degree: int = None, num_micro_batches: int = None, gradient_clipping: float = None, sequence_parallelism: bool = None, recompute_activation: bool = None, use_distributed_optimizer: bool = None, pipeline_model_parallel_split_rank: int = None, num_layers_per_virtual_pipeline_stage: int = None, is_train_batch_min: str = True, train_iters: int = None, train_samples: int = None, weight_decay_incr_style: str = 'constant', start_weight_decay: float = None, end_weight_decay: float = None, lr_decay_style: str = 'linear', lr_decay_iters: int = None, lr_decay_samples: int = None, lr_warmup_iters: int = None, lr_warmup_samples: int = None, lr_warmup_fraction: float = None, min_lr: float = 0, consumed_samples: typing.List[int] = None, no_wd_decay_cond: typing.Optional[typing.Callable] = None, scale_lr_cond: typing.Optional[typing.Callable] = None, lr_mult: float = 1.0, megatron_dataset_flag: bool = False, seq_length: int = None, encoder_seq_length: int = None, decoder_seq_length: int = None, tensorboard_dir: str = None, set_all_logging_options: bool = False, eval_iters: int = 100, eval_interval: int = 1000, return_logits: bool = False, custom_train_step_class: typing.Optional[typing.Any] = None, custom_train_step_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, custom_model_provider_function: typing.Optional[typing.Callable] = None, custom_prepare_model_function: typing.Optional[typing.Callable] = None, other_megatron_args: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )

Plugin for Megatron-LM that enables tensor, pipeline, sequence, and data parallelism, as well as selective activation recomputation and optimized fused kernels.
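
As an illustration, a plugin configured for 2-way tensor and 2-way pipeline parallelism can be passed to the Accelerator as in the sketch below; the degrees, micro-batch count, and clipping value are illustrative placeholders, not recommendations.

```python
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

# Illustrative values: 2-way tensor parallel, 2-way pipeline parallel,
# 4 micro-batches per global batch, gradient clipping at 1.0.
megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,
    pp_degree=2,
    num_micro_batches=4,
    gradient_clipping=1.0,
    sequence_parallelism=True,
)
accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```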

class accelerate.utils.MegatronLMDummyScheduler

( optimizer, total_num_steps = None, warmup_num_steps = 0, **kwargs )

Parameters

  • optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.

  • total_num_steps (int) — Total number of steps.

  • warmup_num_steps (int) — Number of steps for warmup.

  • **kwargs — Other arguments.

Dummy scheduler that holds the optimizer and scheduling arguments; it is primarily used to preserve the conventional training-loop structure when the actual learning-rate scheduler is created and managed by Megatron-LM.
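
In a training script, the dummy scheduler stands in for a real scheduler and is later passed to accelerator.prepare. A minimal sketch, assuming optimizer, max_train_steps, and num_warmup_steps are defined earlier in the script:

```python
from accelerate.utils import MegatronLMDummyScheduler

# `optimizer`, `max_train_steps`, and `num_warmup_steps` are assumed
# to be defined earlier in the training script.
lr_scheduler = MegatronLMDummyScheduler(
    optimizer=optimizer,
    total_num_steps=max_train_steps,
    warmup_num_steps=num_warmup_steps,
)
```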

class accelerate.utils.MegatronLMDummyDataLoader

( **dataset_kwargs )

Dummy dataloader that holds the Megatron-LM dataset arguments; it is primarily used to preserve the conventional training-loop structure when the actual dataloaders are built by Megatron-LM from those arguments.
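
A minimal sketch of constructing the dummy dataloader from Megatron-LM dataset arguments; the data path, split string, and sizes below are placeholders:

```python
from accelerate.utils import MegatronLMDummyDataLoader

# Placeholder arguments: `data_path` points to a dataset preprocessed with
# Megatron-LM's tools; `splits_string` gives train/validation/test splits.
megatron_dataloader = MegatronLMDummyDataLoader(
    data_path=["my-gpt2_text_document"],
    splits_string="949,50,1",
    seq_length=1024,
    micro_batch_size=2,
)
# The same dummy dataloader is passed once per split (train/eval/test)
# to `accelerator.prepare`.
```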

class accelerate.utils.AbstractTrainStep

( name )

Abstract class defining the batching, forward-pass, and loss functions of a train step.
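
Subclasses supply these three pieces through get_batch_func, get_loss_func, and get_forward_step_func, and a custom subclass can be plugged in via the plugin's custom_train_step_class and custom_train_step_kwargs. A minimal sketch, with placeholder bodies rather than a working train step:

```python
from accelerate.utils import AbstractTrainStep, MegatronLMPlugin

class MyTrainStep(AbstractTrainStep):
    """Sketch of a custom train step; the function bodies are placeholders."""

    def __init__(self, args, **kwargs):
        super().__init__("MyTrainStep")
        self.get_batch = self.get_batch_func()
        self.loss_func = self.get_loss_func()
        self.forward_step = self.get_forward_step_func()

    def get_batch_func(self):
        def get_batch(data_iterator):
            # Build and return the input tensors for one micro-batch.
            raise NotImplementedError
        return get_batch

    def get_loss_func(self):
        def loss_func(output_tensor):
            # Reduce the model output to a scalar loss (plus logged metrics).
            raise NotImplementedError
        return loss_func

    def get_forward_step_func(self):
        def forward_step(data_iterator, model):
            # Run the model on one batch and return its output together
            # with the loss function to apply to it.
            raise NotImplementedError
        return forward_step

# Hand the custom step to the Megatron-LM integration through the plugin.
plugin = MegatronLMPlugin(custom_train_step_class=MyTrainStep)
```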

class accelerate.utils.GPTTrainStep

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

GPT train step class.

class accelerate.utils.BertTrainStep

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

BERT train step class.

class accelerate.utils.T5TrainStep

( args )

Parameters

  • args (argparse.Namespace) — Megatron-LM arguments.

T5 train step class.

accelerate.utils.avg_losses_across_data_parallel_group

( losses )

Parameters

  • losses (List[Tensor]) — List of losses to average across data parallel group.

Average losses across data parallel group.
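
For example, to average the current step's loss over all data-parallel ranks before logging (a sketch, assuming loss is the scalar loss tensor from the surrounding training loop):

```python
from accelerate.utils import avg_losses_across_data_parallel_group

# `loss` is assumed to be the scalar loss tensor of the current step.
avg_loss = avg_losses_across_data_parallel_group([loss])
```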
