Hyperparameter Search using Trainer API

🌍 Transformers provides a Trainer class optimized for training 🌍 Transformers models, making it easier to start training without manually writing your own training loop. The Trainer also provides an API for hyperparameter search. This doc shows how to enable it with an example.

Hyperparameter Search backend

Trainer currently supports four hyperparameter search backends: optuna, sigopt, raytune, and wandb.

Install the backend you want to use before selecting it as the hyperparameter search backend:


pip install optuna/sigopt/wandb/ray[tune] 

How to enable Hyperparameter search in example

Define the hyperparameter search space; each backend expects a different format.

For sigopt, see the sigopt object_parameter documentation; the space looks like the following:


>>> def sigopt_hp_space(trial):
...     return [
...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
...         {
...             "categorical_values": ["16", "32", "64", "128"],
...             "name": "per_device_train_batch_size",
...             "type": "categorical",
...         },
...     ]

For optuna, see the optuna object_parameter documentation; the space looks like the following:


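For example, a sketch of an Optuna search space built from `trial.suggest_*` calls (the exact ranges and hyperparameter names here are illustrative):

```python
def optuna_hp_space(trial):
    # `trial` is an optuna.Trial; each suggest_* call declares one hyperparameter
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32, 64, 128]
        ),
    }
```

The keys match TrainingArguments field names, so the sampled values can be applied directly to each trial's run.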
Optuna provides multi-objective HPO. You can pass a list of directions in hyperparameter_search and define your own compute_objective to return multiple objective values. The Pareto front (List[BestRun]) is then returned by hyperparameter_search; refer to the test case TrainerHyperParameterMultiObjectOptunaIntegrationTest in test_trainer. It looks like the following:


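As a sketch, assuming the evaluation produces metrics named `eval_loss` and `eval_accuracy` (the names are illustrative), a multi-objective compute_objective could be:

```python
def compute_objective(metrics):
    # Return one value per objective; their order matches the list of
    # directions passed to hyperparameter_search, e.g. ["minimize", "maximize"].
    return metrics["eval_loss"], metrics["eval_accuracy"]
```

You would then pass `direction=["minimize", "maximize"]` together with `compute_objective` to hyperparameter_search and receive the Pareto front as a list of BestRun objects.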
For raytune, see the raytune object_parameter documentation; the space looks like the following:


For wandb, see the wandb object_parameter documentation; the space looks like the following:


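For example, a sketch of a wandb sweep configuration; unlike the other backends, wandb takes a full sweep config dict, where `objective` is the metric value reported by the Trainer:

```python
def wandb_hp_space(trial):
    # wandb expects a complete sweep configuration rather than a plain
    # parameter dict; `trial` is unused by this backend.
    return {
        "method": "random",
        "metric": {"name": "objective", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
            "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
        },
    }
```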
Define a model_init function and pass it to the Trainer, for example:


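A minimal sketch; the checkpoint name and label count below are placeholders for your own task:

```python
from transformers import AutoModelForSequenceClassification


def model_init(trial):
    # Re-instantiated for every trial so each run starts from fresh weights.
    return AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )
```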
Create a Trainer with your model_init function, training arguments, training and test datasets, and evaluation function:


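As a sketch, the wiring can be wrapped in a small helper; `training_args`, the datasets, and `compute_metrics` are assumed to come from your own preprocessing, and `model` stays `None` because `model_init` builds a fresh model for each trial:

```python
from transformers import Trainer


def build_trainer(training_args, train_dataset, eval_dataset, compute_metrics, model_init):
    # model=None: the Trainer calls model_init to create the model instead.
    return Trainer(
        model=None,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        model_init=model_init,
    )
```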
Call hyperparameter_search to get the best trial parameters. backend can be "optuna", "sigopt", "wandb", or "ray". direction can be "minimize" or "maximize", indicating whether a lower or a greater objective is better.

You can define your own compute_objective function; if you don't, the default compute_objective is called, and the sum of evaluation metrics such as f1 is returned as the objective value.


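As a sketch, a small helper that runs the search against the optuna backend (the backend choice and trial count are illustrative):

```python
def run_hp_search(trainer, hp_space, n_trials=20):
    # direction="maximize" because the default objective (summed eval metrics,
    # e.g. f1) is better when larger; use "minimize" for loss-like objectives.
    return trainer.hyperparameter_search(
        direction="maximize",
        backend="optuna",
        hp_space=hp_space,
        n_trials=n_trials,
    )
```

For a single objective, the return value is a BestRun holding the best trial's run id, objective value, and hyperparameters.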
Hyperparameter search for DDP finetuning

Currently, hyperparameter search for DDP is enabled for optuna and sigopt. Only the rank-zero process generates the search trial and passes the arguments to the other ranks.
