How to use DeepSpeed
DeepSpeed enables you to fit and train larger models on HPUs thanks to the various optimizations described in the ZeRO paper. In particular, the two following ZeRO configurations have been validated to be fully functioning with Gaudi:
ZeRO-1: partitions the optimizer states across processes.
ZeRO-2: partitions the optimizer states + gradients across processes.
These configurations are fully compatible with Habana Mixed Precision and can thus be used to train your model in bf16 precision.
You can find more information about the DeepSpeed Gaudi integration in Habana's DeepSpeed documentation.
To use DeepSpeed on Gaudi, you need to install Optimum Habana and Habana's fork of DeepSpeed.
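A sketch of the installation, assuming the fork is hosted in the HabanaAI GitHub organization; the version tag is illustrative, so pin the release matching your SynapseAI version:

```bash
# Install Optimum Habana
pip install optimum[habana]
# Install Habana's DeepSpeed fork (tag is illustrative; use the release matching your SynapseAI version)
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.11.0
```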
The GaudiTrainer allows using DeepSpeed as easily as the Transformers Trainer. This can be done in 3 steps:
1. A DeepSpeed configuration has to be defined.
2. The `deepspeed` training argument lets you specify the path to the DeepSpeed configuration.
3. The `deepspeed` launcher must be used to run your script.

These steps are detailed below. A comprehensive guide about how to use DeepSpeed with the Transformers Trainer is also available in the Transformers documentation.
The DeepSpeed configuration to use is passed through a JSON file and enables you to choose the optimizations to apply. Here is an example for applying ZeRO-2 optimizations and bf16 precision:
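The following sketch uses standard DeepSpeed configuration keys; the specific values (batch sizes, gradient clipping, ZeRO flags) are illustrative and should be tuned to your workload:

```json
{
    "steps_per_print": 64,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
    }
}
```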
The special value `"auto"` enables you to automatically get the correct or most efficient value. You can also specify the values yourself, but if you do so you should be careful not to have values that conflict with your training arguments. It is strongly advised to read the dedicated section in the Transformers documentation to completely understand how this works.

Other examples of configurations for HPUs are proposed by Habana. The Transformers documentation explains how to write a configuration from scratch very well, and a more complete description of all configuration possibilities is available in the DeepSpeed documentation.

To use DeepSpeed, you must specify `deepspeed=path_to_my_deepspeed_configuration` in your `GaudiTrainingArguments` instance:
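A minimal sketch, assuming the usual Gaudi-specific arguments from Optimum Habana (`use_habana`, `use_lazy_mode`, `gaudi_config_name`); `output_dir` and the configuration paths are placeholders:

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    # your usual training arguments...
    output_dir="./output",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="path_to_my_gaudi_config",
    deepspeed="path_to_my_deepspeed_config.json",
)
```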
This argument both indicates that DeepSpeed should be used and points to your DeepSpeed configuration.
Finally, there are two possible ways to launch your script:
1. Using the gaudi_spawn.py script:
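A sketch of the launch command, assuming the gaudi_spawn.py script from the Optimum Habana repository and 8 available HPUs:

```bash
python gaudi_spawn.py \
    --world_size 8 --use_deepspeed \
    your_script.py --arg1 --arg2 ... --argX \
    --deepspeed path_to_my_deepspeed_config.json
```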
where `--argX` is an argument of the script to run with DeepSpeed.
2. Using the DistributedRunner directly in code:
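A sketch, assuming `DistributedRunner` from `optimum.habana.distributed` with a `run()` method, as in recent Optimum Habana releases; the training command is a placeholder:

```python
from optimum.habana.distributed import DistributedRunner

world_size = 8  # number of HPUs to use (1 or 8)

# The runner spawns one process per HPU and injects the DeepSpeed launcher.
distributed_runner = DistributedRunner(
    command_list=["your_script.py --arg1 --arg2 ... --argX --deepspeed path_to_my_deepspeed_config.json"],
    world_size=world_size,
    use_deepspeed=True,
)
distributed_runner.run()
```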
You should set `"use_fused_adam": false` in your Gaudi configuration because it is not compatible with DeepSpeed yet.
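For instance, a minimal Gaudi configuration JSON compatible with DeepSpeed might look like the sketch below; `use_fused_clip_norm` is shown only as a typical companion setting:

```json
{
    "use_fused_adam": false,
    "use_fused_clip_norm": true
}
```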