Overview

Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🧨 Diffusers' goals is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware.

This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory consumption. You can also learn how to speed up your PyTorch code with torch.compile or ONNX Runtime, and enable memory-efficient attention with xFormers. There are also guides for running inference on specific hardware like Apple Silicon, and Intel or Habana processors.
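For a taste of what these guides cover, here is a minimal sketch combining two of the techniques above, half-precision weights and sliced attention. It assumes a CUDA-capable GPU, and the checkpoint name is only an example:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pipeline with half-precision (fp16) weights to roughly halve memory use.
# "runwayml/stable-diffusion-v1-5" is just an example checkpoint.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Compute attention in slices instead of all at once to lower peak memory,
# at a small cost in speed.
pipe.enable_attention_slicing()

# On PyTorch 2.0+, compiling the UNet can further speed up repeated inference:
# pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("An astronaut riding a horse on Mars").images[0]
image.save("astronaut.png")
```

Attention slicing trades a little speed for a lower peak memory footprint, which can be the difference between fitting and not fitting a pipeline on a consumer GPU; the dedicated guides in this section go into each technique in more depth.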
