# Trainer

## Trainer

### ORTTrainer

#### class optimum.onnxruntime.ORTTrainer

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer.py#L136)

( model: typing.Union\[transformers.modeling\_utils.PreTrainedModel, torch.nn.modules.module.Module] = None, args: ORTTrainingArguments = None, data\_collator: typing.Optional\[DataCollator] = None, train\_dataset: typing.Optional\[torch.utils.data.dataset.Dataset] = None, eval\_dataset: typing.Union\[torch.utils.data.dataset.Dataset, typing.Dict\[str, torch.utils.data.dataset.Dataset], NoneType] = None, tokenizer: typing.Optional\[transformers.tokenization\_utils\_base.PreTrainedTokenizerBase] = None, model\_init: typing.Union\[typing.Callable\[\[], transformers.modeling\_utils.PreTrainedModel], NoneType] = None, compute\_metrics: typing.Union\[typing.Callable\[\[transformers.trainer\_utils.EvalPrediction], typing.Dict], NoneType] = None, callbacks: typing.Optional\[typing.List\[transformers.trainer\_callback.TrainerCallback]] = None, optimizers: typing.Tuple\[torch.optim.optimizer.Optimizer, torch.optim.lr\_scheduler.LambdaLR] = (None, None), preprocess\_logits\_for\_metrics: typing.Union\[typing.Callable\[\[torch.Tensor, torch.Tensor], torch.Tensor], NoneType] = None )

Parameters

* **model** ([PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel) or `torch.nn.Module`, *optional*) — The model to train, evaluate or use for predictions. If not provided, a `model_init` must be passed.

  `ORTTrainer` is optimized to work with the [PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel) provided by the transformers library. You can still use your own models defined as `torch.nn.Module` for training with ONNX Runtime backend and inference with PyTorch backend as long as they work the same way as the 🌍 Transformers models.
* **args** (`ORTTrainingArguments`, *optional*) — The arguments to tweak for training. Will default to a basic instance of `ORTTrainingArguments` with the `output_dir` set to a directory named *tmp\_trainer* in the current directory if not provided.
* **data\_collator** (`DataCollator`, *optional*) — The function to use to form a batch from a list of elements of `train_dataset` or `eval_dataset`. Will default to [default\_data\_collator](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.default_data_collator) if no `tokenizer` is provided, an instance of [DataCollatorWithPadding](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.DataCollatorWithPadding) otherwise.
* **train\_dataset** (`torch.utils.data.Dataset` or `torch.utils.data.IterableDataset`, *optional*) — The dataset to use for training. If it is a [Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset), columns not accepted by the `model.forward()` method are automatically removed. Note that if it’s a `torch.utils.data.IterableDataset` with some randomization and you are training in a distributed fashion, your iterable dataset should either use an internal attribute `generator` that is a `torch.Generator` for the randomization that must be identical on all processes (and the ORTTrainer will manually set the seed of this `generator` at each epoch) or have a `set_epoch()` method that internally sets the seed of the RNGs used.
* **eval\_dataset** (Union\[`torch.utils.data.Dataset`, Dict\[str, `torch.utils.data.Dataset`]], *optional*) — The dataset to use for evaluation. If it is a [Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset), columns not accepted by the `model.forward()` method are automatically removed. If it is a dictionary, it will evaluate on each dataset prepending the dictionary key to the metric name.
* **tokenizer** ([PreTrainedTokenizerBase](https://huggingface.co/docs/transformers/main/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase), *optional*) — The tokenizer used to preprocess the data. If provided, it will be used to automatically pad the inputs to the maximum length when batching inputs, and it will be saved along with the model to make it easier to rerun an interrupted training or reuse the fine-tuned model.
* **model\_init** (`Callable[[], PreTrainedModel]`, *optional*) — A function that instantiates the model to be used. If provided, each call to `ORTTrainer.train` will start from a new instance of the model as given by this function. The function may have zero arguments, or a single one containing the optuna/Ray Tune/SigOpt trial object, to be able to choose different architectures according to hyperparameters (such as layer count, sizes of inner layers, dropout probabilities, etc.).
* **compute\_metrics** (`Callable[[EvalPrediction], Dict]`, *optional*) — The function that will be used to compute metrics at evaluation. Must take an `EvalPrediction` and return a dictionary mapping metric names to metric values.
* **callbacks** (List of `TrainerCallback`, *optional*) — A list of callbacks to customize the training loop. Will add those to the list of default callbacks detailed [here](https://huggingface.co/docs/optimum/onnxruntime/package_reference/callback). If you want to remove one of the default callbacks used, use the `ORTTrainer.remove_callback` method.
* **optimizers** (`Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR]`, *optional*) — A tuple containing the optimizer and the scheduler to use. Will default to an instance of `AdamW` on your model and a scheduler given by `get_linear_schedule_with_warmup` controlled by `args`.
* **preprocess\_logits\_for\_metrics** (`Callable[[torch.Tensor, torch.Tensor], torch.Tensor]`, *optional*) — A function that preprocesses the logits right before caching them at each evaluation step. Must take two tensors, the logits and the labels, and return the logits once processed as desired. The modifications made by this function will be reflected in the predictions received by `compute_metrics`. Note that the labels (second parameter) will be `None` if the dataset does not have them.

ORTTrainer is a simple but feature-complete training and eval loop for ONNX Runtime, optimized for 🌍 Transformers.

Important attributes:

* **model** — Always points to the core model. If using a transformers model, it will be a [PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel) subclass.
* **model\_wrapped** — Always points to the most external model in case one or more other modules wrap the original model. This is the model that should be used for the forward pass. For example, under `DeepSpeed`, the inner model is first wrapped in `ORTModule` and then in `DeepSpeed` and then again in `torch.nn.DistributedDataParallel`. If the inner model hasn’t been wrapped, then `self.model_wrapped` is the same as `self.model`.
* **is\_model\_parallel** — Whether or not a model has been switched to a model parallel mode (different from data parallelism, this means some of the model layers are split on different GPUs).
* **place\_model\_on\_device** — Whether or not to automatically place the model on the device. It will be set to `False` if model parallelism or DeepSpeed is used, or if the default `ORTTrainingArguments.place_model_on_device` is overridden to return `False`.
* **is\_in\_train** — Whether or not the model is currently running `train` (e.g. when `evaluate` is called while in `train`).
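As a minimal usage sketch (assuming `optimum` is installed with its `onnxruntime-training` extras, and using a hypothetical text-classification setup with pre-tokenized datasets), the class is a drop-in replacement for the vanilla `Trainer`:

```python
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical setup: the model name is a placeholder, and train_dataset /
# eval_dataset are assumed to be pre-tokenized datasets.Dataset objects.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

args = ORTTrainingArguments(
    output_dir="tmp_trainer",          # checkpoints and logs go here
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = ORTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,       # assumed pre-tokenized
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()                        # runs the forward/backward with ONNX Runtime
```

Training with the ONNX Runtime backend requires a GPU environment with `onnxruntime-training` available.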

**create\_optimizer**

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer.py#L969)

( )

Set up the optimizer.

We provide a reasonable default that works well. If you want to use something else, you can pass a tuple through the `optimizers` argument of the ORTTrainer’s init, or override this method in a subclass.
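The default scheduler is the linear warmup/decay schedule from `get_linear_schedule_with_warmup`. As a plain-Python sketch of the learning-rate multiplier it applies at each step (the function name here is illustrative, not optimum’s internals):

```python
def linear_schedule_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Learning-rate multiplier: linear warmup to 1.0, then linear decay to 0.0."""
    if step < num_warmup_steps:
        # Ramp up from 0 to 1 over the warmup phase
        return step / max(1, num_warmup_steps)
    # Decay linearly back to 0 over the remaining steps
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))
```

The actual learning rate at a step is `learning_rate * linear_schedule_factor(step, ...)`, so it peaks at `args.learning_rate` exactly when warmup ends.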

**get\_ort\_optimizer\_cls\_and\_kwargs**

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer.py#L1019)

( args: ORTTrainingArguments )

Parameters

* **args** (`ORTTrainingArguments`) — The training arguments for the training session.

Returns the optimizer class and optimizer parameters implemented in ONNX Runtime based on `ORTTrainingArguments`.
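Conceptually, this method maps the `optim` value from the training arguments to an optimizer class plus its constructor kwargs. A hypothetical sketch of that dispatch (class names are given as strings for illustration; the real method returns actual classes):

```python
def select_optimizer_cls_and_kwargs(optim: str, learning_rate: float,
                                    adam_beta1: float = 0.9, adam_beta2: float = 0.999,
                                    adam_epsilon: float = 1e-8):
    """Hypothetical dispatch table mirroring the name-to-optimizer lookup."""
    registry = {
        # fused AdamW kernel implemented by ONNX Runtime
        "adamw_ort_fused": "onnxruntime.training.optim.FusedAdam",
        # PyTorch's AdamW, as exposed through Transformers
        "adamw_torch": "torch.optim.AdamW",
    }
    if optim not in registry:
        raise ValueError(f"Unsupported optimizer: {optim}")
    kwargs = {"lr": learning_rate, "betas": (adam_beta1, adam_beta2), "eps": adam_epsilon}
    return registry[optim], kwargs
```

The betas and epsilon come straight from `adam_beta1`, `adam_beta2`, and `adam_epsilon` in `ORTTrainingArguments`.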

**train**

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer.py#L287)

( resume\_from\_checkpoint: typing.Union\[str, bool, NoneType] = None, trial: typing.Union\[ForwardRef('optuna.Trial'), typing.Dict\[str, typing.Any]] = None, ignore\_keys\_for\_eval: typing.Optional\[typing.List\[str]] = None, \*\*kwargs )

Parameters

* **resume\_from\_checkpoint** (`str` or `bool`, *optional*) — If a `str`, local path to a saved checkpoint as saved by a previous instance of `ORTTrainer`. If a `bool` and equals `True`, load the last checkpoint in *args.output\_dir* as saved by a previous instance of `ORTTrainer`. If present, training will resume from the model/optimizer/scheduler states loaded here.
* **trial** (`optuna.Trial` or `Dict[str, Any]`, *optional*) — The trial run or the hyperparameter dictionary for hyperparameter search.
* **ignore\_keys\_for\_eval** (`List[str]`, *optional*) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions for evaluation during the training.
* **kwargs** (`Dict[str, Any]`, *optional*) — Additional keyword arguments used to hide deprecated arguments.

Main entry point for training with ONNX Runtime accelerator.
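When `resume_from_checkpoint=True`, the latest `checkpoint-<step>` folder under `args.output_dir` is resolved, along the lines of `transformers.trainer_utils.get_last_checkpoint`. A minimal sketch of that resolution:

```python
import os
import re

def get_last_checkpoint(output_dir: str):
    """Return the checkpoint-<step> subfolder with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = [
        name for name in os.listdir(output_dir)
        if pattern.match(name) and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not candidates:
        return None
    # Pick the checkpoint with the largest step number, not lexicographic order
    latest = max(candidates, key=lambda name: int(pattern.match(name).group(1)))
    return os.path.join(output_dir, latest)
```

Passing a string instead of `True` skips this lookup and resumes from that exact path.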

### ORTSeq2SeqTrainer

#### class optimum.onnxruntime.ORTSeq2SeqTrainer

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer_seq2seq.py#L39)

( model: typing.Union\[transformers.modeling\_utils.PreTrainedModel, torch.nn.modules.module.Module] = None, args: ORTTrainingArguments = None, data\_collator: typing.Optional\[DataCollator] = None, train\_dataset: typing.Optional\[torch.utils.data.dataset.Dataset] = None, eval\_dataset: typing.Union\[torch.utils.data.dataset.Dataset, typing.Dict\[str, torch.utils.data.dataset.Dataset], NoneType] = None, tokenizer: typing.Optional\[transformers.tokenization\_utils\_base.PreTrainedTokenizerBase] = None, model\_init: typing.Union\[typing.Callable\[\[], transformers.modeling\_utils.PreTrainedModel], NoneType] = None, compute\_metrics: typing.Union\[typing.Callable\[\[transformers.trainer\_utils.EvalPrediction], typing.Dict], NoneType] = None, callbacks: typing.Optional\[typing.List\[transformers.trainer\_callback.TrainerCallback]] = None, optimizers: typing.Tuple\[torch.optim.optimizer.Optimizer, torch.optim.lr\_scheduler.LambdaLR] = (None, None), preprocess\_logits\_for\_metrics: typing.Union\[typing.Callable\[\[torch.Tensor, torch.Tensor], torch.Tensor], NoneType] = None )

**evaluate**

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer_seq2seq.py#L40)

( eval\_dataset: typing.Optional\[torch.utils.data.dataset.Dataset] = None, ignore\_keys: typing.Optional\[typing.List\[str]] = None, metric\_key\_prefix: str = 'eval', \*\*gen\_kwargs )

Parameters

* **eval\_dataset** (`Dataset`, *optional*) — Pass a dataset if you wish to override `self.eval_dataset`. If it is a [Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset), columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
* **ignore\_keys** (`List[str]`, *optional*) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
* **metric\_key\_prefix** (`str`, *optional*, defaults to `"eval"`) — An optional prefix to be used as the metrics key prefix. For example, the metric “bleu” will be named “eval\_bleu” if the prefix is `"eval"` (default).
* **max\_length** (`int`, *optional*) — The maximum target length to use when predicting with the generate method.
* **num\_beams** (`int`, *optional*) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
* **gen\_kwargs** — Additional `generate`-specific kwargs.

Run evaluation and returns metrics.

The calling script will be responsible for providing a method to compute metrics, as they are task-dependent (pass it to the init `compute_metrics` argument).

You can also subclass and override this method to inject custom behavior.
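`max_length`, `num_beams`, and any other `gen_kwargs` are forwarded to `generate` during evaluation, falling back to the generation settings configured in the training arguments when not passed explicitly. A small illustrative sketch of that precedence (function and parameter names are hypothetical, not optimum’s internals):

```python
def resolve_generation_kwargs(args_max_length, args_num_beams, **gen_kwargs):
    """Explicit gen_kwargs win; otherwise fall back to the training arguments."""
    if gen_kwargs.get("max_length") is None:
        gen_kwargs["max_length"] = args_max_length
    if gen_kwargs.get("num_beams") is None:
        gen_kwargs["num_beams"] = args_num_beams
    return gen_kwargs
```

So `trainer.evaluate(num_beams=4)` beam-searches with 4 beams even if the arguments configured a different default.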

**predict**

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/trainer_seq2seq.py#L95)

( test\_dataset: Dataset, ignore\_keys: typing.Optional\[typing.List\[str]] = None, metric\_key\_prefix: str = 'test', \*\*gen\_kwargs )

Parameters

* **test\_dataset** (`Dataset`) — Dataset to run the predictions on. If it is a [Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset), columns not accepted by the `model.forward()` method are automatically removed. Has to implement the method `__len__`.
* **ignore\_keys** (`List[str]`, *optional*) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
* **metric\_key\_prefix** (`str`, *optional*, defaults to `"test"`) — An optional prefix to be used as the metrics key prefix. For example, the metric “bleu” will be named “test\_bleu” if the prefix is `"test"` (default).
* **max\_length** (`int`, *optional*) — The maximum target length to use when predicting with the generate method.
* **num\_beams** (`int`, *optional*) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
* **gen\_kwargs** — Additional `generate`-specific kwargs.

Run prediction and returns predictions and potential metrics.

Depending on the dataset and your use case, your test dataset may contain labels. In that case, this method will also return metrics, like in `evaluate()`.

If your predictions or labels have different sequence lengths (for instance because you’re doing dynamic padding in a token classification task) the predictions will be padded (on the right) to allow for concatenation into one array. The padding index is -100.

Returns: *NamedTuple* A namedtuple with the following keys:

* predictions (`np.ndarray`): The predictions on `test_dataset`.
* label\_ids (`np.ndarray`, *optional*): The labels (if the dataset contained some).
* metrics (`Dict[str, float]`, *optional*): The potential dictionary of metrics (if the dataset contained labels).
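The padding behaviour described above can be sketched in plain Python (a hypothetical helper, not optimum’s implementation): predictions of unequal length are right-padded with -100 so they can be stacked into one array.

```python
def pad_and_stack(sequences, pad_index=-100):
    """Right-pad variable-length sequences to a common length with pad_index."""
    max_len = max(len(seq) for seq in sequences)
    # -100 matches the ignore_index convention used for labels in loss computation
    return [list(seq) + [pad_index] * (max_len - len(seq)) for seq in sequences]
```

When computing metrics on these arrays, positions equal to -100 should be masked out first.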

### ORTTrainingArguments

#### class optimum.onnxruntime.ORTTrainingArguments

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/training_args.py#L61)

( output\_dir: str, overwrite\_output\_dir: bool = False, do\_train: bool = False, do\_eval: bool = False, do\_predict: bool = False, evaluation\_strategy: typing.Union\[transformers.trainer\_utils.IntervalStrategy, str] = 'no', prediction\_loss\_only: bool = False, per\_device\_train\_batch\_size: int = 8, per\_device\_eval\_batch\_size: int = 8, per\_gpu\_train\_batch\_size: typing.Optional\[int] = None, per\_gpu\_eval\_batch\_size: typing.Optional\[int] = None, gradient\_accumulation\_steps: int = 1, eval\_accumulation\_steps: typing.Optional\[int] = None, eval\_delay: typing.Optional\[float] = 0, learning\_rate: float = 5e-05, weight\_decay: float = 0.0, adam\_beta1: float = 0.9, adam\_beta2: float = 0.999, adam\_epsilon: float = 1e-08, max\_grad\_norm: float = 1.0, num\_train\_epochs: float = 3.0, max\_steps: int = -1, lr\_scheduler\_type: typing.Union\[transformers.trainer\_utils.SchedulerType, str] = 'linear', warmup\_ratio: float = 0.0, warmup\_steps: int = 0, log\_level: typing.Optional\[str] = 'passive', log\_level\_replica: typing.Optional\[str] = 'warning', log\_on\_each\_node: bool = True, logging\_dir: typing.Optional\[str] = None, logging\_strategy: typing.Union\[transformers.trainer\_utils.IntervalStrategy, str] = 'steps', logging\_first\_step: bool = False, logging\_steps: float = 500, logging\_nan\_inf\_filter: bool = True, save\_strategy: typing.Union\[transformers.trainer\_utils.IntervalStrategy, str] = 'steps', save\_steps: float = 500, save\_total\_limit: typing.Optional\[int] = None, save\_safetensors: typing.Optional\[bool] = False, save\_on\_each\_node: bool = False, no\_cuda: bool = False, use\_cpu: bool = False, use\_mps\_device: bool = False, seed: int = 42, data\_seed: typing.Optional\[int] = None, jit\_mode\_eval: bool = False, use\_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16\_opt\_level: str = 'O1', half\_precision\_backend: str = 'auto', bf16\_full\_eval: bool = False, fp16\_full\_eval: bool = False, tf32: typing.Optional\[bool] = None, local\_rank: int = -1, ddp\_backend: typing.Optional\[str] = None, tpu\_num\_cores: typing.Optional\[int] = None, tpu\_metrics\_debug: bool = False, debug: typing.Union\[str, typing.List\[transformers.debug\_utils.DebugOption]] = '', dataloader\_drop\_last: bool = False, eval\_steps: typing.Optional\[float] = None, dataloader\_num\_workers: int = 0, past\_index: int = -1, run\_name: typing.Optional\[str] = None, disable\_tqdm: typing.Optional\[bool] = None, remove\_unused\_columns: typing.Optional\[bool] = True, label\_names: typing.Optional\[typing.List\[str]] = None, load\_best\_model\_at\_end: typing.Optional\[bool] = False, metric\_for\_best\_model: typing.Optional\[str] = None, greater\_is\_better: typing.Optional\[bool] = None, ignore\_data\_skip: bool = False, sharded\_ddp: typing.Union\[typing.List\[transformers.trainer\_utils.ShardedDDPOption], str, NoneType] = '', fsdp: typing.Union\[typing.List\[transformers.trainer\_utils.FSDPOption], str, NoneType] = '', fsdp\_min\_num\_params: int = 0, fsdp\_config: typing.Optional\[str] = None, fsdp\_transformer\_layer\_cls\_to\_wrap: typing.Optional\[str] = None, deepspeed: typing.Optional\[str] = None, label\_smoothing\_factor: float = 0.0, optim: typing.Optional\[str] = 'adamw\_ba', optim\_args: typing.Optional\[str] = None, adafactor: bool = False, group\_by\_length: bool = False, length\_column\_name: typing.Optional\[str] = 'length', report\_to: typing.Optional\[typing.List\[str]] = None, ddp\_find\_unused\_parameters: typing.Optional\[bool] = None, ddp\_bucket\_cap\_mb: typing.Optional\[int] = None, ddp\_broadcast\_buffers: typing.Optional\[bool] = None, dataloader\_pin\_memory: bool = True, skip\_memory\_metrics: bool = True, use\_legacy\_prediction\_loop: bool = False, push\_to\_hub: bool = False, resume\_from\_checkpoint: typing.Optional\[str] = None, hub\_model\_id: typing.Optional\[str] = None, hub\_strategy: typing.Union\[transformers.trainer\_utils.HubStrategy, str] = 'every\_save', hub\_token: typing.Optional\[str] = None, hub\_private\_repo: bool = False, hub\_always\_push: bool = False, gradient\_checkpointing: bool = False, include\_inputs\_for\_metrics: bool = False, fp16\_backend: str = 'auto', push\_to\_hub\_model\_id: typing.Optional\[str] = None, push\_to\_hub\_organization: typing.Optional\[str] = None, push\_to\_hub\_token: typing.Optional\[str] = None, mp\_parameters: str = '', auto\_find\_batch\_size: bool = False, full\_determinism: bool = False, torchdynamo: typing.Optional\[str] = None, ray\_scope: typing.Optional\[str] = 'last', ddp\_timeout: typing.Optional\[int] = 1800, torch\_compile: bool = False, torch\_compile\_backend: typing.Optional\[str] = None, torch\_compile\_mode: typing.Optional\[str] = None, dispatch\_batches: typing.Optional\[bool] = None, include\_tokens\_per\_second: typing.Optional\[bool] = False, use\_module\_with\_loss: typing.Optional\[bool] = False )

Parameters

* **optim** (`str` or `training_args.ORTOptimizerNames` or `transformers.training_args.OptimizerNames`, *optional*, defaults to `"adamw_ba"`) — The optimizer to use. This can be one of the optimizers from Transformers (adamw\_ba, adamw\_torch, adamw\_apex\_fused, or adafactor) or the optimizer implemented by ONNX Runtime (adamw\_ort\_fused).
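For example, to select ONNX Runtime’s fused AdamW implementation, pass `optim="adamw_ort_fused"`. A configuration sketch (it assumes `optimum` with the `onnxruntime-training` extras installed):

```python
from optimum.onnxruntime import ORTTrainingArguments

args = ORTTrainingArguments(
    output_dir="tmp_trainer",
    optim="adamw_ort_fused",   # fused AdamW kernel implemented by ONNX Runtime
    learning_rate=5e-5,
)
```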

### ORTSeq2SeqTrainingArguments

#### class optimum.onnxruntime.ORTSeq2SeqTrainingArguments

[\<source>](https://github.com/huggingface/optimum/blob/main/optimum/onnxruntime/training_args_seq2seq.py#L24)

( output\_dir: str, overwrite\_output\_dir: bool = False, do\_train: bool = False, do\_eval: bool = False, do\_predict: bool = False, evaluation\_strategy: typing.Union\[transformers.trainer\_utils.IntervalStrategy, str] = 'no', prediction\_loss\_only: bool = False, per\_device\_train\_batch\_size: int = 8, per\_device\_eval\_batch\_size: int = 8, per\_gpu\_train\_batch\_size: typing.Optional\[int] = None, per\_gpu\_eval\_batch\_size: typing.Optional\[int] = None, gradient\_accumulation\_steps: int = 1, eval\_accumulation\_steps: typing.Optional\[int] = None, eval\_delay: typing.Optional\[float] = 0, learning\_rate: float = 5e-05, weight\_decay: float = 0.0, adam\_beta1: float = 0.9, adam\_beta2: float = 0.999, adam\_epsilon: float = 1e-08, max\_grad\_norm: float = 1.0, num\_train\_epochs: float = 3.0, max\_steps: int = -1, lr\_scheduler\_type: typing.Union\[transformers.trainer\_utils.SchedulerType, str] = 'linear', warmup\_ratio: float = 0.0, warmup\_steps: int = 0, log\_level: typing.Optional\[str] = 'passive', log\_level\_replica: typing.Optional\[str] = 'warning', log\_on\_each\_node: bool = True, logging\_dir: typing.Optional\[str] = None, logging\_strategy: typing.Union\[transformers.trainer\_utils.IntervalStrategy, str] = 'steps', logging\_first\_step: bool = False, logging\_steps: float = 500, logging\_nan\_inf\_filter: bool = True, save\_strategy: typing.Union\[transformers.trainer\_utils.IntervalStrategy, str] = 'steps', save\_steps: float = 500, save\_total\_limit: typing.Optional\[int] = None, save\_safetensors: typing.Optional\[bool] = False, save\_on\_each\_node: bool = False, no\_cuda: bool = False, use\_cpu: bool = False, use\_mps\_device: bool = False, seed: int = 42, data\_seed: typing.Optional\[int] = None, jit\_mode\_eval: bool = False, use\_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16\_opt\_level: str = 'O1', half\_precision\_backend: str = 'auto', bf16\_full\_eval: bool = False, fp16\_full\_eval: bool = False, tf32: typing.Optional\[bool] = None, local\_rank: int = -1, ddp\_backend: typing.Optional\[str] = None, tpu\_num\_cores: typing.Optional\[int] = None, tpu\_metrics\_debug: bool = False, debug: typing.Union\[str, typing.List\[transformers.debug\_utils.DebugOption]] = '', dataloader\_drop\_last: bool = False, eval\_steps: typing.Optional\[float] = None, dataloader\_num\_workers: int = 0, past\_index: int = -1, run\_name: typing.Optional\[str] = None, disable\_tqdm: typing.Optional\[bool] = None, remove\_unused\_columns: typing.Optional\[bool] = True, label\_names: typing.Optional\[typing.List\[str]] = None, load\_best\_model\_at\_end: typing.Optional\[bool] = False, metric\_for\_best\_model: typing.Optional\[str] = None, greater\_is\_better: typing.Optional\[bool] = None, ignore\_data\_skip: bool = False, sharded\_ddp: typing.Union\[typing.List\[transformers.trainer\_utils.ShardedDDPOption], str, NoneType] = '', fsdp: typing.Union\[typing.List\[transformers.trainer\_utils.FSDPOption], str, NoneType] = '', fsdp\_min\_num\_params: int = 0, fsdp\_config: typing.Optional\[str] = None, fsdp\_transformer\_layer\_cls\_to\_wrap: typing.Optional\[str] = None, deepspeed: typing.Optional\[str] = None, label\_smoothing\_factor: float = 0.0, optim: typing.Optional\[str] = 'adamw\_ba', optim\_args: typing.Optional\[str] = None, adafactor: bool = False, group\_by\_length: bool = False, length\_column\_name: typing.Optional\[str] = 'length', report\_to: typing.Optional\[typing.List\[str]] = None, ddp\_find\_unused\_parameters: typing.Optional\[bool] = None, ddp\_bucket\_cap\_mb: typing.Optional\[int] = None, ddp\_broadcast\_buffers: typing.Optional\[bool] = None, dataloader\_pin\_memory: bool = True, skip\_memory\_metrics: bool = True, use\_legacy\_prediction\_loop: bool = False, push\_to\_hub: bool = False, resume\_from\_checkpoint: typing.Optional\[str] = None, hub\_model\_id: typing.Optional\[str] = None, hub\_strategy: typing.Union\[transformers.trainer\_utils.HubStrategy, str] = 'every\_save', hub\_token: typing.Optional\[str] = None, hub\_private\_repo: bool = False, hub\_always\_push: bool = False, gradient\_checkpointing: bool = False, include\_inputs\_for\_metrics: bool = False, fp16\_backend: str = 'auto', push\_to\_hub\_model\_id: typing.Optional\[str] = None, push\_to\_hub\_organization: typing.Optional\[str] = None, push\_to\_hub\_token: typing.Optional\[str] = None, mp\_parameters: str = '', auto\_find\_batch\_size: bool = False, full\_determinism: bool = False, torchdynamo: typing.Optional\[str] = None, ray\_scope: typing.Optional\[str] = 'last', ddp\_timeout: typing.Optional\[int] = 1800, torch\_compile: bool = False, torch\_compile\_backend: typing.Optional\[str] = None, torch\_compile\_mode: typing.Optional\[str] = None, dispatch\_batches: typing.Optional\[bool] = None, include\_tokens\_per\_second: typing.Optional\[bool] = False, use\_module\_with\_loss: typing.Optional\[bool] = False, sortish\_sampler: bool = False, predict\_with\_generate: bool = False, generation\_max\_length: typing.Optional\[int] = None, generation\_num\_beams: typing.Optional\[int] = None, generation\_config: typing.Union\[str, pathlib.Path, transformers.generation.configuration\_utils.GenerationConfig, NoneType] = None )

Parameters

* **optim** (`str` or `training_args.ORTOptimizerNames` or `transformers.training_args.OptimizerNames`, *optional*, defaults to `"adamw_ba"`) — The optimizer to use. This can be one of the optimizers from Transformers (adamw\_ba, adamw\_torch, adamw\_apex\_fused, or adafactor) or the optimizer implemented by ONNX Runtime (adamw\_ort\_fused).
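A configuration sketch for generation-based evaluation (again assuming `optimum` with the `onnxruntime-training` extras installed): `predict_with_generate` makes `evaluate`/`predict` call `generate`, using `generation_max_length` and `generation_num_beams` as the defaults for `max_length` and `num_beams`.

```python
from optimum.onnxruntime import ORTSeq2SeqTrainingArguments

args = ORTSeq2SeqTrainingArguments(
    output_dir="tmp_trainer",
    predict_with_generate=True,    # use generate() during eval/predict
    generation_max_length=128,     # default max_length for generate()
    generation_num_beams=4,        # default num_beams for generate()
)
```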
