Megatron-LM Utilities
Utilities for Megatron-LM
class accelerate.utils.MegatronLMPlugin
( tp_degree: int = None, pp_degree: int = None, num_micro_batches: int = None, gradient_clipping: float = None, sequence_parallelism: bool = None, recompute_activation: bool = None, use_distributed_optimizer: bool = None, pipeline_model_parallel_split_rank: int = None, num_layers_per_virtual_pipeline_stage: int = None, is_train_batch_min: str = True, train_iters: int = None, train_samples: int = None, weight_decay_incr_style: str = 'constant', start_weight_decay: float = None, end_weight_decay: float = None, lr_decay_style: str = 'linear', lr_decay_iters: int = None, lr_decay_samples: int = None, lr_warmup_iters: int = None, lr_warmup_samples: int = None, lr_warmup_fraction: float = None, min_lr: float = 0, consumed_samples: typing.List[int] = None, no_wd_decay_cond: typing.Optional[typing.Callable] = None, scale_lr_cond: typing.Optional[typing.Callable] = None, lr_mult: float = 1.0, megatron_dataset_flag: bool = False, seq_length: int = None, encoder_seq_length: int = None, decoder_seq_length: int = None, tensorboard_dir: str = None, set_all_logging_options: bool = False, eval_iters: int = 100, eval_interval: int = 1000, return_logits: bool = False, custom_train_step_class: typing.Optional[typing.Any] = None, custom_train_step_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, custom_model_provider_function: typing.Optional[typing.Callable] = None, custom_prepare_model_function: typing.Optional[typing.Callable] = None, other_megatron_args: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )
Plugin for Megatron-LM to enable tensor, pipeline, sequence, and data parallelism, as well as selective activation recomputation and optimized fused kernels.
class accelerate.utils.MegatronLMDummyScheduler
( optimizer, total_num_steps = None, warmup_num_steps = 0, **kwargs )
Parameters
optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.
total_num_steps (int) — Total number of steps.
warmup_num_steps (int) — Number of steps for warmup.
**kwargs — Other arguments.
Dummy scheduler that wraps the optimizer and records the scheduler arguments; it is primarily used to keep a conventional training loop when the actual learning-rate scheduler is created and managed by Megatron-LM.
class accelerate.utils.MegatronLMDummyDataLoader
( **dataset_kwargs )
Dummy dataloader that records the Megatron-LM dataset arguments; it is primarily used to keep a conventional training loop when the actual dataloaders are built by Megatron-LM.
class accelerate.utils.AbstractTrainStep
( name )
Abstract class for the batching, forward-pass, and loss-handling logic of a train step.
class accelerate.utils.GPTTrainStep
( args )
Parameters
args (argparse.Namespace) — Megatron-LM arguments.
GPT train step class.
class accelerate.utils.BertTrainStep
( args )
Parameters
args (argparse.Namespace) — Megatron-LM arguments.
BERT train step class.
class accelerate.utils.T5TrainStep
( args )
Parameters
args (argparse.Namespace) — Megatron-LM arguments.
T5 train step class.
accelerate.utils.avg_losses_across_data_parallel_group
( losses )
Parameters
losses (List[Tensor]) — List of losses to average across the data parallel group.
Average losses across data parallel group.