Megatron-LM Utilities
accelerate.utils.MegatronLMPlugin
( tp_degree: int = None, pp_degree: int = None, num_micro_batches: int = None, gradient_clipping: float = None, sequence_parallelism: bool = None, recompute_activation: bool = None, use_distributed_optimizer: bool = None, pipeline_model_parallel_split_rank: int = None, num_layers_per_virtual_pipeline_stage: int = None, is_train_batch_min: str = True, train_iters: int = None, train_samples: int = None, weight_decay_incr_style: str = 'constant', start_weight_decay: float = None, end_weight_decay: float = None, lr_decay_style: str = 'linear', lr_decay_iters: int = None, lr_decay_samples: int = None, lr_warmup_iters: int = None, lr_warmup_samples: int = None, lr_warmup_fraction: float = None, min_lr: float = 0, consumed_samples: typing.List[int] = None, no_wd_decay_cond: typing.Optional[typing.Callable] = None, scale_lr_cond: typing.Optional[typing.Callable] = None, lr_mult: float = 1.0, megatron_dataset_flag: bool = False, seq_length: int = None, encoder_seq_length: int = None, decoder_seq_length: int = None, tensorboard_dir: str = None, set_all_logging_options: bool = False, eval_iters: int = 100, eval_interval: int = 1000, return_logits: bool = False, custom_train_step_class: typing.Optional[typing.Any] = None, custom_train_step_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None, custom_model_provider_function: typing.Optional[typing.Callable] = None, custom_prepare_model_function: typing.Optional[typing.Callable] = None, other_megatron_args: typing.Optional[typing.Dict[str, typing.Any]] = None )
Plugin for Megatron-LM that enables tensor, pipeline, sequence, and data parallelism, as well as selective activation recomputation and optimized fused kernels.
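For example, a minimal sketch of passing the plugin to an Accelerator; the parallelism degrees and flags below are illustrative values rather than recommendations:

```python
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

# Illustrative settings: 2-way tensor parallelism, 2-way pipeline
# parallelism, and 4 micro-batches per global batch.
megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,
    pp_degree=2,
    num_micro_batches=4,
    gradient_clipping=1.0,
    sequence_parallelism=True,
    recompute_activation=True,       # selective activation recomputation
    use_distributed_optimizer=True,
)

accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```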
accelerate.utils.MegatronLMDummyScheduler
( optimizer, total_num_steps = None, warmup_num_steps = 0, **kwargs )
Parameters
optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.
total_num_steps (int) — Total number of steps.
warmup_num_steps (int) — Number of steps for warmup.
**kwargs — Other arguments.
Dummy scheduler that presents model parameters or param groups. It is primarily used to preserve the conventional training-loop structure when the learning-rate schedule is actually computed by Megatron-LM.
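A usage sketch following the pattern from the Accelerate Megatron-LM guide; `model`, the learning rate, and the step counts are stand-ins for values from the surrounding training script:

```python
import torch
from accelerate.utils import MegatronLMDummyScheduler

# `model` comes from the surrounding training script.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Megatron-LM computes the actual schedule internally; this dummy object
# simply carries the step counts through accelerator.prepare().
lr_scheduler = MegatronLMDummyScheduler(
    optimizer=optimizer,
    total_num_steps=10_000,  # illustrative
    warmup_num_steps=500,    # illustrative
)
```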
accelerate.utils.MegatronLMDummyDataLoader
( **dataset_kwargs )
Dummy dataloader that stands in for a real dataloader. It is primarily used to preserve the conventional training-loop structure when the datasets and dataloaders are actually built by Megatron-LM.
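A usage sketch assuming a dataset already preprocessed with Megatron-LM's data tools; the data path and split values are hypothetical:

```python
from accelerate.utils import MegatronLMDummyDataLoader

# Hypothetical Megatron-LM data arguments for a preprocessed dataset.
megatron_dataloader = MegatronLMDummyDataLoader(
    data_path=["my-gpt2_text_document"],  # hypothetical dataset prefix
    splits_string="949,50,1",
    seq_length=1024,
    micro_batch_size=4,
)

# Tell the plugin that Megatron-LM should build its own datasets;
# `accelerator` comes from the surrounding script.
accelerator.state.megatron_lm_plugin.megatron_dataset_flag = True
```

The same dummy dataloader is then passed to accelerator.prepare() once per split (train/eval/test) to obtain the actual Megatron-LM dataloaders.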
accelerate.utils.AbstractTrainStep
( name )
Abstract class defining the batching, forward-step, and loss handlers for a train step.
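The concrete subclasses below (GPT, BERT, T5) implement this interface through three factory methods that return the batch, forward-step, and loss callables. A hypothetical skeleton of that shape:

```python
from accelerate.utils import AbstractTrainStep

class MyTrainStep(AbstractTrainStep):
    """Hypothetical skeleton of a custom train step."""

    def __init__(self, args):
        super().__init__("MyTrainStep")

    def get_batch_func(self):
        def get_batch(data_iterator):
            ...  # turn one micro-batch from the iterator into model inputs
        return get_batch

    def get_loss_func(self):
        def loss_func(output_tensor):
            ...  # return (loss, {"metric name": value}) for logging
        return loss_func

    def get_forward_step_func(self):
        def forward_step(data_iterator, model):
            ...  # run the model and return (output_tensor, loss_func)
        return forward_step
```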
accelerate.utils.GPTTrainStep
( args )
Parameters
args (argparse.Namespace) — Megatron-LM arguments.
GPT train step class.
accelerate.utils.BertTrainStep
( args )
Parameters
args (argparse.Namespace) — Megatron-LM arguments.
BERT train step class.
accelerate.utils.T5TrainStep
( args )
Parameters
args (argparse.Namespace) — Megatron-LM arguments.
T5 train step class.
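Any of these classes can be subclassed and passed back through MegatronLMPlugin's custom_train_step_class and custom_train_step_kwargs arguments. A sketch modeled on the custom-loss example in the Accelerate docs; the pad_token_id kwarg and its value are assumptions of this sketch:

```python
from functools import partial

from accelerate.utils import GPTTrainStep, MegatronLMPlugin, avg_losses_across_data_parallel_group

class GPTTrainStepWithCustomLoss(GPTTrainStep):
    """Hypothetical subclass that masks padding tokens out of the LM loss."""

    def __init__(self, megatron_args, **kwargs):
        super().__init__(megatron_args)
        self.kwargs = kwargs

    def get_loss_func(self):
        def loss_func(labels, output_tensor):
            losses = output_tensor.float()
            # Ignore padding positions when averaging the per-token loss.
            loss_mask = (labels != self.kwargs["pad_token_id"]).view(-1).float()
            loss = (losses.view(-1) * loss_mask).sum() / loss_mask.sum()
            # Reduce across the data-parallel group for logging.
            averaged_loss = avg_losses_across_data_parallel_group([loss])
            return loss, {"lm loss": averaged_loss[0]}

        return loss_func

    def get_forward_step_func(self):
        def forward_step(data_iterator, model):
            tokens, labels, loss_mask, attention_mask, position_ids = self.get_batch(data_iterator)
            output_tensor = model(tokens, position_ids, attention_mask)
            # Bind the labels so loss_func receives them next to the output.
            return output_tensor, partial(self.loss_func, labels)

        return forward_step

megatron_lm_plugin = MegatronLMPlugin(
    custom_train_step_class=GPTTrainStepWithCustomLoss,
    custom_train_step_kwargs={"pad_token_id": 0},  # hypothetical pad id
)
```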
accelerate.utils.avg_losses_across_data_parallel_group
( losses )
Parameters
losses (List[Tensor]) — List of losses to average across data parallel group.
Average losses across data parallel group.
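A minimal usage sketch, assuming `loss` is a scalar tensor computed on the current data-parallel rank:

```python
from accelerate.utils import avg_losses_across_data_parallel_group

# `loss` is a scalar tensor computed on the current data-parallel rank.
averaged = avg_losses_across_data_parallel_group([loss])
print(f"loss averaged over the data-parallel group: {averaged[0]}")
```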