# Megatron-LM Utilities

#### class accelerate.utils.MegatronLMPlugin

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/dataclasses.py#L1018)

( tp\_degree: int = None, pp\_degree: int = None, num\_micro\_batches: int = None, gradient\_clipping: float = None, sequence\_parallelism: bool = None, recompute\_activation: bool = None, use\_distributed\_optimizer: bool = None, pipeline\_model\_parallel\_split\_rank: int = None, num\_layers\_per\_virtual\_pipeline\_stage: int = None, is\_train\_batch\_min: str = True, train\_iters: int = None, train\_samples: int = None, weight\_decay\_incr\_style: str = 'constant', start\_weight\_decay: float = None, end\_weight\_decay: float = None, lr\_decay\_style: str = 'linear', lr\_decay\_iters: int = None, lr\_decay\_samples: int = None, lr\_warmup\_iters: int = None, lr\_warmup\_samples: int = None, lr\_warmup\_fraction: float = None, min\_lr: float = 0, consumed\_samples: typing.List\[int] = None, no\_wd\_decay\_cond: typing.Optional\[typing.Callable] = None, scale\_lr\_cond: typing.Optional\[typing.Callable] = None, lr\_mult: float = 1.0, megatron\_dataset\_flag: bool = False, seq\_length: int = None, encoder\_seq\_length: int = None, decoder\_seq\_length: int = None, tensorboard\_dir: str = None, set\_all\_logging\_options: bool = False, eval\_iters: int = 100, eval\_interval: int = 1000, return\_logits: bool = False, custom\_train\_step\_class: typing.Optional\[typing.Any] = None, custom\_train\_step\_kwargs: typing.Union\[typing.Dict\[str, typing.Any], NoneType] = None, custom\_model\_provider\_function: typing.Optional\[typing.Callable] = None, custom\_prepare\_model\_function: typing.Optional\[typing.Callable] = None, other\_megatron\_args: typing.Union\[typing.Dict\[str, typing.Any], NoneType] = None )

Plugin for Megatron-LM that enables tensor, pipeline, sequence, and data parallelism, as well as selective activation recomputation and optimized fused kernels.
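
These options are normally filled in through `accelerate config`, but the plugin can also be constructed in code and handed to the `Accelerator`. A minimal sketch with illustrative values (not recommendations); a Megatron-LM-enabled launch configuration is still required:

```python
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

# Illustrative values only; tune the parallel degrees to your cluster topology.
megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,                 # tensor parallel degree
    pp_degree=2,                 # pipeline parallel degree
    num_micro_batches=4,         # micro-batches used by pipeline parallelism
    gradient_clipping=1.0,       # max global gradient norm
    sequence_parallelism=True,   # shard activations along the sequence dimension
    recompute_activation=True,   # selective activation recomputation
    use_distributed_optimizer=True,
)

accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```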

#### class accelerate.utils.MegatronLMDummyScheduler

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L414)

( optimizer, total\_num\_steps = None, warmup\_num\_steps = 0, \*\*kwargs )

Parameters

* **optimizer** (`torch.optim.optimizer.Optimizer`) — The optimizer to wrap.
* **total\_num\_steps** (int) — Total number of training steps.
* **warmup\_num\_steps** (int) — Number of warmup steps.
* **\*\*kwargs** — Other arguments.

Dummy scheduler that acts as a placeholder for the real scheduler. It is primarily used to keep the conventional training loop structure intact when the actual learning rate scheduler is created and managed by Megatron-LM.
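
A minimal usage sketch, following the pattern from the Accelerate Megatron-LM guide; `accelerator`, `model`, `optimizer`, `max_train_steps`, and `num_warmup_steps` are assumed to be defined elsewhere in the script:

```python
from accelerate.utils import MegatronLMDummyScheduler

# Placeholder only: Megatron-LM builds the real scheduler inside
# `accelerator.prepare` from the values recorded here.
lr_scheduler = MegatronLMDummyScheduler(
    optimizer=optimizer,
    total_num_steps=max_train_steps,
    warmup_num_steps=num_warmup_steps,
)
model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)
```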

#### class accelerate.utils.MegatronLMDummyDataLoader

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L144)

( \*\*dataset\_kwargs )

Dummy dataloader that acts as a placeholder for the real dataloaders. It is primarily used to keep the conventional training loop structure intact when the actual dataloaders are built by Megatron-LM from the given dataset arguments.
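
The keyword arguments mirror Megatron-LM's dataset arguments and are forwarded when `accelerator.prepare` builds the real dataloaders. A sketch adapted from the Accelerate Megatron-LM guide; the path and sizes below are placeholders:

```python
from accelerate.utils import MegatronLMDummyDataLoader

# Placeholder dataset arguments, forwarded to Megatron-LM's dataset builders.
megatron_dataloader = MegatronLMDummyDataLoader(
    data_path=["my-gpt2_text_document"],  # prefix of a preprocessed Megatron dataset
    splits_string="949,50,1",             # train/validation/test split proportions
    seq_length=1024,
    micro_batch_size=4,
)
accelerator.state.megatron_lm_plugin.megatron_dataset_flag = True

# The same dummy dataloader is passed once per split; prepare() returns
# the real train/validation/test dataloaders.
model, optimizer, lr_scheduler, train_dl, eval_dl, test_dl = accelerator.prepare(
    model, optimizer, lr_scheduler, megatron_dataloader, megatron_dataloader, megatron_dataloader
)
```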

#### class accelerate.utils.AbstractTrainStep

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L451)

( name )

Abstract class for the batching, forward-pass, and loss-handling logic of a train step.
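
Concrete train steps override three factory methods, `get_batch_func`, `get_forward_step_func`, and `get_loss_func`, each returning a callable that Megatron-LM's training loop consumes. A skeleton sketch of a custom subclass (the bodies are placeholders, not working code):

```python
from accelerate.utils import AbstractTrainStep

class MyTrainStep(AbstractTrainStep):
    """Skeleton only; the ellipses mark logic you would supply."""

    def __init__(self, args):
        super().__init__("MyTrainStep")

    def get_batch_func(self):
        def get_batch(data_iterator):
            ...  # turn the Megatron-LM data iterator into model input tensors
        return get_batch

    def get_forward_step_func(self):
        def forward_step(data_iterator, model):
            ...  # run the forward pass; return the output and a loss callable
        return forward_step

    def get_loss_func(self):
        def loss_func(output_tensor):
            ...  # compute the scalar loss plus any metrics to log
        return loss_func
```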

#### class accelerate.utils.GPTTrainStep

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L597)

( args )

Parameters

* **args** (`argparse.Namespace`) — Megatron-LM arguments.

GPT train step class.
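
A built-in step can be subclassed and registered through the plugin's `custom_train_step_class` and `custom_train_step_kwargs`. The sketch below, adapted from the Accelerate Megatron-LM guide, up-weights samples that contain given key tokens; `accelerator` and `keytoken_ids` are assumed to be defined elsewhere:

```python
import torch

from accelerate.utils import GPTTrainStep, avg_losses_across_data_parallel_group

class GPTTrainStepWithCustomLoss(GPTTrainStep):
    def __init__(self, megatron_args, **kwargs):
        super().__init__(megatron_args)
        self.kwargs = kwargs

    def get_loss_func(self):
        def loss_func(inputs, loss_mask, output_tensor):
            batch_size, seq_length = output_tensor.shape
            losses = output_tensor.float()
            loss_mask = loss_mask.view(-1).float()
            loss = losses.view(-1) * loss_mask

            # Average the masked loss per sample
            loss_per_sample = loss.view(batch_size, seq_length).sum(dim=1)
            loss_per_sample = loss_per_sample / loss_mask.view(batch_size, seq_length).sum(dim=1)

            # Up-weight samples containing the key tokens
            weights = torch.stack(
                [(inputs == kt).float() for kt in self.kwargs["keytoken_ids"]]
            ).sum(dim=[0, 2])
            weights = 1.0 + self.kwargs["alpha"] * weights
            weighted_loss = (loss_per_sample * weights).mean()

            # Reduce across the data parallel group so every rank logs the same value
            averaged_loss = avg_losses_across_data_parallel_group([weighted_loss])
            return weighted_loss, {"lm loss": averaged_loss[0]}

        return loss_func

# Register the custom step before calling `accelerator.prepare`
accelerator.state.megatron_lm_plugin.custom_train_step_class = GPTTrainStepWithCustomLoss
accelerator.state.megatron_lm_plugin.custom_train_step_kwargs = {"keytoken_ids": keytoken_ids, "alpha": 0.25}
```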

#### class accelerate.utils.BertTrainStep

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L468)

( args )

Parameters

* **args** (`argparse.Namespace`) — Megatron-LM arguments.

BERT train step class.

#### class accelerate.utils.T5TrainStep

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L703)

( args )

Parameters

* **args** (`argparse.Namespace`) — Megatron-LM arguments.

T5 train step class.

#### accelerate.utils.avg\_losses\_across\_data\_parallel\_group

[\<source>](https://github.com/huggingface/accelerate/blob/v0.24.0/src/accelerate/utils/megatron_lm.py#L1408)

( losses )

Parameters

* **losses** (List\[Tensor]) — List of losses to average across data parallel group.

Average losses across data parallel group.
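
Typically called inside a custom loss function so the logged loss is identical on every rank of the data parallel group (see the `GPTTrainStep` sketch above). A minimal sketch, assuming `loss` is a scalar tensor on the current rank:

```python
from accelerate.utils import avg_losses_across_data_parallel_group

# Returns the input losses averaged over all ranks in the data parallel group.
averaged_loss = avg_losses_across_data_parallel_group([loss])
```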

