Distributed training with Optimum Habana

As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude.

All the PyTorch examples and the GaudiTrainer script work out of the box with distributed training. There are two ways of launching them:

  1. Using the gaudi_spawn.py script:

python gaudi_spawn.py \
    --world_size number_of_hpu_you_have --use_mpi \
    path_to_script.py --args1 --args2 ... --argsN

where --argX is an argument of the script to run in a distributed way. Examples are given for question answering here and text classification here. A concrete launch command is sketched after this list.

  2. Using the DistributedRunner directly in code:

from optimum.habana.distributed import DistributedRunner
from optimum.utils import logging

world_size = 8  # Number of HPUs to use (1 or 8)

# define distributed runner
distributed_runner = DistributedRunner(
    command_list=["scripts/train.py --args1 --args2 ... --argsN"],
    world_size=world_size,
    use_mpi=True,
)

# start job
ret_code = distributed_runner.run()
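
For example, to launch the question-answering example mentioned above on 8 HPUs with gaudi_spawn.py, the command could look like the sketch below. The script name (run_qa.py) and its arguments (--model_name_or_path, --gaudi_config_name, --dataset_name, etc.) are assumptions based on the usual layout of the PyTorch examples, not taken from this page, so adapt them to the example you actually run.

# Hypothetical launch of the question-answering example on 8 HPUs with MPI
# (script name and its arguments are assumed; only the gaudi_spawn.py flags come from this page)
python gaudi_spawn.py \
    --world_size 8 --use_mpi \
    run_qa.py \
    --model_name_or_path bert-base-uncased \
    --gaudi_config_name Habana/bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --output_dir /tmp/squad_output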

You can set the training argument --distribution_strategy fast_ddp for simpler and usually faster distributed training management. More information here.
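
Since --distribution_strategy is a training argument, one way to use it is to append it to the arguments of the script you launch. The sketch below reuses the generic placeholders from above; only the fast_ddp flag is new, and whether your script accepts the other placeholder arguments is for you to check.

# Same generic launch as above, with Fast DDP requested as a training argument
python gaudi_spawn.py \
    --world_size number_of_hpu_you_have --use_mpi \
    path_to_script.py --args1 --args2 ... --argsN \
    --distribution_strategy fast_ddp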

To go further, we invite you to read our guides about DeepSpeed and multi-node training.
