Overview
Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🧨 Diffusers' goals is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware.
This section covers tips and tricks - such as half-precision weights and sliced attention - for optimizing inference speed and reducing memory consumption. You'll also learn how to speed up your PyTorch code with torch.compile or ONNX Runtime, and how to enable memory-efficient attention with xFormers. There are also guides for running inference on specific hardware such as Apple Silicon, and Intel or Habana processors.
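As a quick taste of what the guides cover, here is a minimal sketch that combines half-precision weights, sliced attention, and torch.compile in a single pipeline. It assumes a CUDA GPU with PyTorch 2.0+; the checkpoint and prompt are only examples, and each technique is explained in more detail in its own guide.

```py
import torch
from diffusers import StableDiffusionPipeline

# Load the weights in half precision (fp16) to roughly halve memory use
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

# Compute attention in slices to trade a bit of speed for lower peak memory
pipeline.enable_attention_slicing()

# Optionally compile the UNet with torch.compile for faster inference
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

image = pipeline("a photo of an astronaut riding a horse").images[0]
```

Whether these options help, and by how much, depends on your hardware; the following guides walk through when to use each one.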