Habana Gaudi

How to use Stable Diffusion on Habana Gaudi

🤗 Diffusers is compatible with Habana Gaudi through 🤗 Optimum Habana.

Requirements

Optimum Habana 1.6 or later (see the Optimum Habana documentation for installation instructions).
SynapseAI 1.10.
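
One quick way to confirm the first requirement from Python is to read the installed package version. The following is a minimal sketch, not an official check: it assumes the package is installed under the distribution name optimum-habana, and the helper names (parse_version, meets_minimum, check_optimum_habana) are illustrative and not part of Optimum Habana. SynapseAI is installed separately and is not checked here.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_OPTIMUM_HABANA = (1, 6)  # minimum version listed in the requirements above


def parse_version(text):
    """Turn a version string such as '1.6.1' or '1.7.0.dev0' into (major, minor)."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def meets_minimum(text, minimum=MIN_OPTIMUM_HABANA):
    """Return True if the version string is at least `minimum`."""
    return parse_version(text) >= minimum


def check_optimum_habana():
    """Return a human-readable status for the installed optimum-habana package."""
    try:
        installed = version("optimum-habana")  # assumed distribution name
    except PackageNotFoundError:
        return "optimum-habana is not installed"
    status = "OK" if meets_minimum(installed) else "too old"
    return f"optimum-habana {installed}: {status}"


if __name__ == "__main__":
    print(check_optimum_habana())
```

Comparing (major, minor) tuples rather than raw strings avoids the common mistake of treating "1.10" as older than "1.6".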

Inference Pipeline

To generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate two objects:

A pipeline with GaudiStableDiffusionPipeline. This pipeline supports text-to-image generation.
A scheduler with GaudiDDIMScheduler. This scheduler has been optimized for Habana Gaudi.

When initializing the pipeline, you have to specify use_habana=True to deploy it on HPUs. Furthermore, in order to get the fastest possible generations you should enable HPU graphs with use_hpu_graphs=True. Finally, you will need to specify a Gaudi configuration which can be downloaded from the Hugging Face Hub.

from optimum.habana import GaudiConfig
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "stabilityai/stable-diffusion-2-base"
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion-2",
)

You can then call the pipeline to generate images in batches from one or several prompts:

outputs = pipeline(
    prompt=[
        "High quality photo of an astronaut riding a horse in space",
        "Face of a yellow cat, high resolution, sitting on a park bench",
    ],
    num_images_per_prompt=10,
    batch_size=4,
)
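
The call above requests 2 prompts with 10 images each, so 20 images in total, generated 4 at a time. The sketch below works through that arithmetic. It assumes the pipeline simply splits the total number of requested images into chunks of batch_size; the helper name count_batches is hypothetical and not part of Optimum Habana.

```python
import math


def count_batches(num_prompts, num_images_per_prompt, batch_size):
    """Return (total_images, num_batches) for a batched generation call."""
    total_images = num_prompts * num_images_per_prompt
    # The last batch may be only partially filled, hence the ceiling.
    num_batches = math.ceil(total_images / batch_size)
    return total_images, num_batches


total, batches = count_batches(num_prompts=2, num_images_per_prompt=10, batch_size=4)
print(total, batches)  # 20 images in 5 batches
```

With these settings every batch is full; a total that is not a multiple of batch_size would leave a partially filled final batch.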

For more information, check out Optimum Habana’s documentation and the example provided in the official GitHub repository.

Benchmark

Here are the latencies and throughputs for Habana first-generation Gaudi and Gaudi2 with the Habana/stable-diffusion and Habana/stable-diffusion-2 Gaudi configurations (mixed precision bf16/fp32):

Stable Diffusion v1.5 (512x512 resolution):

| Device | Latency (batch size = 1) | Throughput (batch size = 8) |
| --- | --- | --- |
| first-generation Gaudi | 3.80s | 0.308 images/s |
| Gaudi2 | 1.33s | 1.081 images/s |

Stable Diffusion v2.1 (768x768 resolution):

| Device | Latency (batch size = 1) | Throughput |
| --- | --- | --- |
| first-generation Gaudi | 10.2s | 0.108 images/s (batch size = 4) |
| Gaudi2 | 3.17s | 0.379 images/s (batch size = 8) |
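
To put the figures above in perspective, the following sketch computes the Gaudi2-to-first-generation ratios directly from the reported numbers. The dictionary layout and the speedups helper are illustrative only. Note that the v2.1 throughput figures use different batch sizes (4 and 8), so that ratio is not a like-for-like comparison.

```python
# Values copied from the benchmark above: latency in seconds, throughput in images/s.
BENCHMARKS = {
    "v1.5 (512x512)": {
        "gaudi1": {"latency": 3.80, "throughput": 0.308},
        "gaudi2": {"latency": 1.33, "throughput": 1.081},
    },
    "v2.1 (768x768)": {
        "gaudi1": {"latency": 10.2, "throughput": 0.108},
        "gaudi2": {"latency": 3.17, "throughput": 0.379},
    },
}


def speedups(entry):
    """Return (latency_speedup, throughput_speedup) of Gaudi2 over first-generation Gaudi."""
    # Lower latency is better, so divide the old value by the new one.
    latency = entry["gaudi1"]["latency"] / entry["gaudi2"]["latency"]
    # Higher throughput is better, so divide the new value by the old one.
    throughput = entry["gaudi2"]["throughput"] / entry["gaudi1"]["throughput"]
    return latency, throughput


for model, entry in BENCHMARKS.items():
    latency, throughput = speedups(entry)
    print(f"{model}: latency {latency:.2f}x, throughput {throughput:.2f}x")
```

On these numbers, Gaudi2 cuts single-image latency by a factor of roughly 2.9 (v1.5) and 3.2 (v2.1), and reports about 3.5 times the throughput in both cases.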
