Distributed inference with multiple GPUs

On distributed setups, you can run inference across multiple GPUs with 🌍 Accelerate or PyTorch Distributed, which is useful for generating with multiple prompts in parallel.

This guide will show you how to use 🌍 Accelerate and PyTorch Distributed for distributed inference.

🌍 Accelerate

🌍 Accelerate is a library designed to make it easy to train or run inference across distributed setups. It simplifies the process of setting up the distributed environment, allowing you to focus on your PyTorch code.

To begin, create a Python file and initialize an accelerate.PartialState to create a distributed environment; your setup is automatically detected, so you don't need to explicitly define the rank or world_size. Move the DiffusionPipeline to distributed_state.device to assign a GPU to each process.

Now use the split_between_processes utility as a context manager to automatically distribute the prompts between the number of processes.

import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)
# PartialState detects the distributed setup automatically.
distributed_state = PartialState()
pipeline.to(distributed_state.device)

with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt:
    result = pipeline(prompt).images[0]
    result.save(f"result_{distributed_state.process_index}.png")

Use the --num_processes argument to specify the number of GPUs to use, and call accelerate launch to run the script:

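For example, assuming the script above is saved as run_distributed.py (the filename is only illustrative), you could launch it on 2 GPUs with:

accelerate launch run_distributed.py --num_processes=2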

To learn more, take a look at the Distributed Inference with 🌍 Accelerate guide.

PyTorch Distributed

PyTorch supports DistributedDataParallel, which enables data parallelism.

To start, create a Python file and import torch.distributed and torch.multiprocessing to set up the distributed process group and to spawn the processes for inference on each GPU. You should also initialize a DiffusionPipeline:

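A minimal sketch of this setup, using the same checkpoint as the example above (adjust to your needs):

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)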

You'll want to create a function to run inference; init_process_group creates the distributed environment, taking the backend to use, the rank of the current process, and the world_size (the number of participating processes). If you're running inference in parallel over 2 GPUs, then the world_size is 2.

Move the DiffusionPipeline to rank and use get_rank to assign a GPU to each process, where each process handles a different prompt:

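A possible implementation, assuming the sd pipeline from the previous snippet and hardcoding one prompt per rank:

def run_inference(rank, world_size):
    # Create the distributed environment; "nccl" is the recommended backend for GPUs.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Assign the GPU matching this process's rank to the pipeline.
    sd.to(rank)

    # Give each process a different prompt.
    if dist.get_rank() == 0:
        prompt = "a dog"
    elif dist.get_rank() == 1:
        prompt = "a cat"

    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")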

To run the distributed inference, call mp.spawn to run the run_inference function on the number of GPUs defined in world_size:

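A sketch of the entry point, assuming 2 GPUs:

def main():
    world_size = 2
    # Spawn one process per GPU; each process calls run_inference(rank, world_size).
    mp.spawn(run_inference, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()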

Once you’ve completed the inference script, use the --nproc_per_node argument to specify the number of GPUs to use and call torchrun to run the script:

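For example, assuming the script is saved as run_distributed.py (again, an illustrative filename):

torchrun --nproc_per_node=2 run_distributed.py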
