
Distributed launchers



Launchers

Functions for launching training on distributed processes.

accelerate.notebook_launcher

( function, args = (), num_processes = None, mixed_precision = 'no', use_port = '29500', master_addr = '127.0.0.1', node_rank = 0, num_nodes = 1 )

Parameters

  • function (Callable) — The training function to execute. If it accepts arguments, the first argument should be the index of the process it is run on.

  • args (Tuple) — Tuple of arguments to pass to the function (it will receive *args).

  • num_processes (int, optional) — The number of processes to use for training. Defaults to 8 in Colab/Kaggle if a TPU is available, otherwise to the number of available GPUs.

  • mixed_precision (str, optional, defaults to "no") — If "fp16" or "bf16", mixed precision training will be used on multi-GPU.

  • use_port (str, optional, defaults to "29500") — The port used for communication between processes when launching a multi-GPU training.

  • master_addr (str, optional, defaults to "127.0.0.1") — The address used for communication between processes.

  • node_rank (int, optional, defaults to 0) — The rank of the current node.

  • num_nodes (int, optional, defaults to 1) — The number of nodes to use for training.

Launches a training function, using several processes or multiple nodes if possible in the current environment (for instance, a TPU with multiple cores).

To use this function, absolutely no calls to a CUDA device may have been made in the notebook session before it is invoked. If any have been made, you will need to restart the notebook kernel and make sure no cells use any CUDA capability.

Setting ACCELERATE_DEBUG_MODE="1" in your environment will run a test before actually launching to ensure that none of those calls have been made.
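For instance, the flag can be set from inside the notebook before launching (a minimal sketch; only the environment variable name comes from the note above):

import os

# Enable Accelerate's pre-launch check for prior CUDA calls in this session
os.environ["ACCELERATE_DEBUG_MODE"] = "1"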

Example:


# Assume this is defined in a Jupyter Notebook on an instance with two GPUs
from accelerate import notebook_launcher


def train(*args):
    # Your training function here
    ...


arg1, arg2 = "foo", 42  # placeholder arguments; replace with your own
notebook_launcher(train, args=(arg1, arg2), num_processes=2, mixed_precision="fp16")
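The launcher can also span several machines via master_addr, node_rank, and num_nodes. The sketch below is an illustration only: the address is hypothetical, and the exact per-node process count and network setup depend on your cluster.

# Hypothetical two-node run: execute this cell on both machines,
# changing node_rank to 1 on the second node
notebook_launcher(
    train,
    args=(arg1, arg2),
    num_processes=2,             # processes launched on each node (assumption)
    master_addr="192.168.1.10",  # hypothetical address of the rank-0 node
    use_port="29500",
    node_rank=0,                 # 0 on the first node, 1 on the second
    num_nodes=2,
)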

accelerate.debug_launcher

( function, args = (), num_processes = 2 )

Parameters

  • function (Callable) — The training function to execute.

  • args (Tuple) — Tuple of arguments to pass to the function (it will receive *args).

  • num_processes (int, optional, defaults to 2) — The number of processes to use for training.

Launches a training function using several processes on CPU for debugging purposes.

This function is provided for internal testing and debugging; it is not intended for real training runs and will only use the CPU.
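Example (a minimal sketch; the printed message is illustrative only):

from accelerate import debug_launcher


def train(*args):
    # A trivial body; real training logic would go here
    print("Running in one of the CPU debug processes")


debug_launcher(train, args=(), num_processes=2)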
