Accelerated PyTorch 2.0 support in Diffusers
Installation
```bash
pip install --upgrade torch diffusers
```

Using accelerated transformers and torch.compile
With PyTorch 2.0 installed, Diffusers uses the accelerated scaled dot-product attention by default, so no code changes are needed:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

To opt into the accelerated attention processor explicitly, set `AttnProcessor2_0` on the UNet:

```diff
  import torch
  from diffusers import DiffusionPipeline
+ from diffusers.models.attention_processor import AttnProcessor2_0

  pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
+ pipe.unet.set_attn_processor(AttnProcessor2_0())

  prompt = "a photo of an astronaut riding a horse on mars"
  image = pipe(prompt).images[0]
```

To revert to the vanilla attention processor, call `set_default_attn_processor`:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipe.unet.set_default_attn_processor()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

For an additional speed-up, wrap the UNet with `torch.compile`:

```python
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

steps, batch_size = 25, 4  # example values
images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images
```
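`AttnProcessor2_0` relies on `torch.nn.functional.scaled_dot_product_attention`, which ships with PyTorch 2.0 and later. As a hypothetical guard (the helper name and version-parsing logic below are assumptions for illustration, not part of Diffusers), you could gate the explicit opt-in on the installed version string:

```python
def supports_sdpa(torch_version: str) -> bool:
    """Return True if the given torch version string is 2.0 or newer.

    Hypothetical helper: checks only the leading major component and
    ignores local version suffixes such as "+cu118".
    """
    major = torch_version.split(".")[0].split("+")[0]
    return int(major) >= 2


# supports_sdpa(torch.__version__) could then decide whether to call
# pipe.unet.set_attn_processor(AttnProcessor2_0())
```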
Benchmark
Benchmarking code
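The full benchmarking script is not reproduced here. A minimal timing harness in the same spirit (the function below is a sketch, not the script used to produce the numbers) would average wall-clock time over several runs after a warm-up phase, since `torch.compile` pays a one-time compilation cost on the first call:

```python
import time


def benchmark(fn, warmup: int = 2, runs: int = 5) -> float:
    """Average wall-clock seconds per call of fn.

    Warm-up calls are excluded so one-time costs (e.g. torch.compile
    tracing on the first invocation) don't skew the measurement.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```

For GPU pipelines you would also call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously.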


[Benchmark tables not reproduced in this extraction: results covered A100, V100, T4, RTX 3090, and RTX 4090 GPUs, each at batch sizes 1, 4, and 16.]