Distributed Runner
class DistributedRunner(command_list: typing.List = [], world_size: int = 1, hostfile: typing.Union[str, pathlib.Path] = None, use_mpi: bool = False, use_deepspeed: bool = False, use_env: bool = False, map_by: bool = 'socket', multi_hls = None)
Set up training/inference hardware configurations and run distributed commands.
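The setup methods listed below suggest a simple dispatch on the constructor arguments: one card, one node with several cards (mpirun or DeepSpeed), or several nodes (DeepSpeed). The sketch below models that choice under stated assumptions. `RunnerConfig`, `select_setup` and the `multi_node` flag are hypothetical names (the real constructor takes `hostfile` and `multi_hls` instead), and the error raised when no launcher is chosen is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class RunnerConfig:
    """Hypothetical stand-in for a subset of the DistributedRunner arguments."""

    world_size: int = 1
    use_mpi: bool = False
    use_deepspeed: bool = False
    multi_node: bool = False  # hypothetical flag: True when several nodes are requested


def select_setup(cfg: RunnerConfig) -> str:
    """Return the name of the setup method this configuration would plausibly use."""
    if cfg.multi_node:
        # Multi-node runs are documented as DeepSpeed-only.
        return "create_multi_node_setup"
    if cfg.world_size > 1:
        # One node, several cards: the launcher decides which setup applies.
        if cfg.use_mpi:
            return "create_single_node_setup_mpirun"
        if cfg.use_deepspeed:
            return "create_single_node_setup_deepspeed"
        # Assumption: refuse to guess a launcher when neither flag is set.
        raise ValueError("a multi-card run needs use_mpi or use_deepspeed")
    return "create_single_card_setup"
```

For example, `select_setup(RunnerConfig(world_size=8, use_mpi=True))` picks the mpirun path, while the default configuration falls through to the single-card setup.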
create_multi_node_setup()
Multi-node configuration setup for DeepSpeed.
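A multi-node DeepSpeed launch is typically a `deepspeed` command line that points at a hostfile and a master address. The sketch below builds such a prefix using flags of the DeepSpeed launcher (`--hostfile`, `--master_addr`, `--master_port`, `--no_local_rank`). The helper name `build_multi_node_command` is hypothetical, and it is an assumption that `create_multi_node_setup` assembles its command this way; 29500 is DeepSpeed's default master port.

```python
import shlex


def build_multi_node_command(
    command: str, hostfile: str, master_addr: str, master_port: int = 29500
) -> str:
    """Hypothetical helper: prefix a script and its arguments with a multi-node DeepSpeed launch."""
    launcher = [
        "deepspeed",
        "--hostfile", hostfile,            # one "hostname slots=N" entry per node
        "--master_addr", master_addr,      # e.g. the address process_hostfile returns
        "--master_port", str(master_port),
        "--no_local_rank",                 # do not pass --local_rank to the script
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(launcher) + " " + command
```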
create_single_card_setup(use_deepspeed=False)
Single-card setup.
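The `use_deepspeed` switch on the single-card setup can be pictured as choosing between a plain interpreter launch and a one-card DeepSpeed launch. This is a minimal sketch under that assumption; `build_single_card_command` is a hypothetical name, and the exact command the library produces may differ.

```python
import shlex
import sys


def build_single_card_command(command: str, use_deepspeed: bool = False) -> str:
    """Hypothetical helper mirroring the use_deepspeed switch of create_single_card_setup."""
    if use_deepspeed:
        # One process on one card, still driven by the DeepSpeed launcher.
        return f"deepspeed --num_nodes 1 --num_gpus 1 --no_local_rank {command}"
    # Plain run: start the script with the current Python interpreter.
    return f"{shlex.quote(sys.executable)} {command}"
```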
create_single_node_setup()
Single-node multi-card configuration setup.
create_single_node_setup_deepspeed()
Single-node multi-card configuration setup for DeepSpeed.
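On a single node, DeepSpeed only needs to know how many cards to use. The sketch below returns an argument vector (suitable for `subprocess.run` without a shell) that starts one process per card. `deepspeed_single_node_argv` is a hypothetical name, the `world_size >= 2` check is an assumption, and it is also an assumption that `--num_gpus` counts Gaudi cards in this setting.

```python
import shlex
from typing import List


def deepspeed_single_node_argv(command: str, world_size: int) -> List[str]:
    """Hypothetical helper: argv for a one-node, multi-card DeepSpeed launch."""
    if world_size < 2:
        # Assumption: a multi-card setup is meaningless with fewer than two cards.
        raise ValueError("a multi-card setup needs world_size >= 2")
    return [
        "deepspeed",
        "--num_nodes", "1",              # single node
        "--num_gpus", str(world_size),   # one process per card
        "--no_local_rank",               # the script does not receive --local_rank
        *shlex.split(command),           # user script followed by its arguments
    ]
```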
create_single_node_setup_mpirun()
Single-node multi-card configuration setup for mpirun.
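With mpirun, one rank is started per card and the `map_by` argument (default `'socket'`) maps naturally onto Open MPI's `--map-by` option. The sketch below is a hypothetical command builder using standard Open MPI flags (`-n`, `--bind-to`, `--map-by`); `build_mpirun_command` is not part of the library, and the real setup may add further options such as processing-element counts.

```python
import shlex
import sys


def build_mpirun_command(command: str, world_size: int, map_by: str = "socket") -> str:
    """Hypothetical helper: launch `world_size` ranks of a script on one node with mpirun."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    launcher = [
        "mpirun",
        "-n", str(world_size),  # one rank per card
        "--bind-to", "core",    # pin each rank to CPU cores
        "--map-by", map_by,     # mirrors DistributedRunner's map_by default, 'socket'
        sys.executable,         # each rank runs the script with this interpreter
    ]
    return shlex.join(launcher) + " " + command
```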
process_hostfile() → str
Returns: str, the address of the master node.
run()
Runs the desired command with configuration specified by the user.
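Once a launcher prefix has been prepared, running amounts to executing the resulting command lines. The sketch below assumes the commands are full shell command lines executed in order and that the first non-zero exit code is reported; `run_commands` is a hypothetical name, and this is not the library's implementation of `run`.

```python
import subprocess
from typing import List


def run_commands(command_list: List[str]) -> int:
    """Hypothetical sketch: run shell command lines in order, stopping at the first failure."""
    for command in command_list:
        # shell=True because each entry is a complete command line with a launcher prefix;
        # only pass trusted strings here.
        completed = subprocess.run(command, shell=True)
        if completed.returncode != 0:
            return completed.returncode
    return 0
```

Stopping at the first failure avoids launching later steps on top of a broken run; returning the exit code lets a caller propagate it with `sys.exit`.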
process_hostfile: returns the master address to use for multi-node runs with DeepSpeed. Directly inspired from .
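A DeepSpeed hostfile lists one node per line in the form `hostname slots=N`. A minimal sketch of what `process_hostfile` does, assuming the master is simply the first listed host: the helper names below are hypothetical, comments (`#`) and blank lines are skipped, and the result is the host name as written (DeepSpeed itself may resolve it to an IP address).

```python
from pathlib import Path
from typing import Union


def master_addr_from_hostfile_text(text: str) -> str:
    """Hypothetical parser: return the first host listed in DeepSpeed-style hostfile text."""
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        # Each entry looks like "hostname slots=N"; the host name is the first token.
        return line.split()[0]
    raise ValueError("no hosts found in hostfile")


def master_addr_from_hostfile(hostfile: Union[str, Path]) -> str:
    """Read a hostfile from disk and return its first host."""
    return master_addr_from_hostfile_text(Path(hostfile).read_text())
```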