Quicktour


Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion-generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.

Whether you're a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about:

  • The DiffusionPipeline is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.

  • Popular pretrained model architectures and modules that can be used as building blocks for creating diffusion systems.

  • Many different schedulers - algorithms that control how noise is added for training, and how to generate denoised images during inference.

The quicktour will show you how to use the DiffusionPipeline for inference, and then walk you through how to combine a model and scheduler to replicate what's happening inside the DiffusionPipeline.

The quicktour is a simplified version of the introductory 🧨 Diffusers notebook to help you get started quickly. If you want to learn more about 🧨 Diffusers' goal, design philosophy, and additional details about its core API, check out the notebook!

Before you begin, make sure you have all the necessary libraries installed:


# uncomment to install the necessary libraries in Colab
#!pip install --upgrade diffusers accelerate transformers
  • Accelerate speeds up model loading for inference and training.

  • Transformers is required to run the most popular diffusion models, such as Stable Diffusion.
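
If you want to confirm everything is installed before moving on, the optional check below is a minimal sketch that only assumes the libraries above plus PyTorch are importable; the printed versions will depend on your environment.

>>> import accelerate, diffusers, torch, transformers

>>> print("diffusers:", diffusers.__version__)
>>> print("transformers:", transformers.__version__)
>>> print("accelerate:", accelerate.__version__)
>>> print("CUDA available:", torch.cuda.is_available())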

DiffusionPipeline

The DiffusionPipeline is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the DiffusionPipeline out-of-the-box for many tasks. Take a look at the table below for some supported tasks, and for a complete list of supported tasks, check out the 🧨 Diffusers Summary table.

Task | Description | Pipeline
Unconditional Image Generation | generate an image from Gaussian noise | unconditional_image_generation
Text-Guided Image Generation | generate an image given a text prompt | conditional_image_generation
Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | img2img
Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | inpaint
Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | depth2img

Start by creating an instance of a DiffusionPipeline and specify which pipeline checkpoint you would like to download. You can use the DiffusionPipeline for any checkpoint stored on the Hugging Face Hub. In this quicktour, you'll load the stable-diffusion-v1-5 checkpoint for text-to-image generation.

For Stable Diffusion models, please carefully read the license first before running the model. 🧨 Diffusers implements a safety_checker to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.

Load the model with the from_pretrained() method:

>>> from diffusers import DiffusionPipeline

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)

The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. You'll see that the Stable Diffusion pipeline is composed of the UNet2DConditionModel and PNDMScheduler among other things:

>>> pipeline
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.13.1",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
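
The components listed in this printout are also available as attributes on the pipeline object, which is useful when you want to inspect or swap out one of them; a brief sketch using the standard Stable Diffusion pipeline attribute names:

>>> pipeline.unet       # the UNet2DConditionModel listed above
>>> pipeline.scheduler  # the PNDMScheduler listed above
>>> pipeline.vae        # the AutoencoderKL that decodes latents into images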

We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters. You can move the generator object to a GPU, just like you would in PyTorch:


>>> pipeline.to("cuda")

Now you can pass a text prompt to the pipeline to generate an image, and then access the denoised image. By default, the image output is wrapped in a PIL.Image object.

>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image

Save the image by calling save:


>>> image.save("image_of_squirrel_painting.png")
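
Each call is stochastic and returns one image by default. If you want several variations of the same prompt at once, the optional sketch below uses the num_images_per_prompt argument supported by Stable Diffusion pipelines (the file names are just examples; reduce the count if you run out of GPU memory):

>>> images = pipeline("An image of a squirrel in Picasso style", num_images_per_prompt=4).images
>>> for i, img in enumerate(images):
...     img.save(f"squirrel_{i}.png")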

Local pipeline

You can also use the pipeline locally. The only difference is you need to download the weights first:


!git lfs install
!git clone https://boincai.com/runwayml/stable-diffusion-v1-5

Then load the saved weights into the pipeline:


>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)

Now you can run the pipeline as you would in the section above.
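
If you prefer not to use git, a pipeline that was already fetched with from_pretrained() can also be written to disk and reloaded from that folder; this is an optional sketch using DiffusionPipeline's save_pretrained() method (the directory name is arbitrary):

>>> pipeline.save_pretrained("./my-stable-diffusion-v1-5")
>>> pipeline = DiffusionPipeline.from_pretrained("./my-stable-diffusion-v1-5")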

Swapping schedulers

Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default PNDMScheduler with the EulerDiscreteScheduler, load it with the from_config() method:

>>> from diffusers import EulerDiscreteScheduler

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
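
Because every run starts from random noise, it is easier to judge the effect of a scheduler swap if you fix the seed. The sketch below is optional and only relies on plain PyTorch's torch.Generator together with the pipeline's generator argument (a CPU generator works regardless of the device the pipeline runs on):

>>> import torch

>>> pipeline.to("cuda")  # move to GPU if one is available, as before
>>> generator = torch.Generator().manual_seed(42)
>>> image = pipeline("An image of a squirrel in Picasso style", generator=generator).images[0]
>>> image.save("squirrel_euler.png")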

Try generating an image with the new scheduler and see if you notice a difference!

In the next section, you'll take a closer look at the components - the model and scheduler - that make up the DiffusionPipeline and learn how to use these components to generate an image of a cat.

Models

Most models take a noisy sample, and at each timestep they predict the noise residual - the difference between a less noisy image and the input image (other models learn to predict the previous sample directly, or the velocity, known as v-prediction). You can mix and match models to create other diffusion systems.

Models are initialized with the from_pretrained() method, which also locally caches the model weights so it is faster the next time you load the model. For the quicktour, you'll load the UNet2DModel, a basic unconditional image generation model with a checkpoint trained on cat images:

>>> from diffusers import UNet2DModel

>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)

To access the model parameters, call model.config:


>>> model.config

The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can't be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference.

Some of the most important parameters are:

  • sample_size: the height and width dimension of the input sample.

  • in_channels: the number of input channels of the input sample.

  • down_block_types and up_block_types: the type of down- and upsampling blocks used to create the UNet architecture.

  • block_out_channels: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.

  • layers_per_block: the number of ResNet blocks present in each UNet block.
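
These fields can also be read programmatically from model.config, and the model size can be checked with plain PyTorch; the snippet below is an optional sketch, not part of the original quicktour:

>>> model.config.sample_size, model.config.in_channels
(256, 3)
>>> sum(p.numel() for p in model.parameters())  # total number of parameters in the UNet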

To use the model for inference, create a random Gaussian noise tensor with the shape of the image you want to generate. It needs a batch axis because the model can receive multiple random noise samples, a channel axis corresponding to the number of input channels, and sample_size axes for the height and width of the image:


>>> import torch

>>> torch.manual_seed(0)

>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])

For inference, pass the noisy image and a timestep to the model. The timestep indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. The model returns an output object; access its sample attribute to get the predicted noise residual:


>>> with torch.no_grad():
...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
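
For this checkpoint the predicted residual has the same shape as the input sample, one noise value per channel and pixel, which is exactly what the scheduler in the next section expects; a quick optional check:

>>> noisy_residual.shape
torch.Size([1, 3, 256, 256])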

To generate actual examples though, youโ€™ll need a scheduler to guide the denoising process. In the next section, youโ€™ll learn how to couple a model with a scheduler.

Schedulers

Schedulers manage going from a noisy sample to a less noisy sample given the model output - in this case, it is the noisy_residual.

🧨 Diffusers is a toolbox for building diffusion systems. While the DiffusionPipeline is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.

For the quicktour, you'll instantiate the DDPMScheduler with its from_config() method:

>>> from diffusers import DDPMScheduler

>>> scheduler = DDPMScheduler.from_config(repo_id)
>>> scheduler
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.13.1",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "trained_betas": null,
  "variance_type": "fixed_small"
}

💡 Notice how the scheduler is instantiated from a configuration. Unlike a model, a scheduler does not have trainable weights and is parameter-free!

Some of the most important parameters are:

  • num_train_timesteps: the length of the denoising process or in other words, the number of timesteps required to process random Gaussian noise into a data sample.

  • beta_schedule: the type of noise schedule to use for inference and training.

  • beta_start and beta_end: the start and end noise values for the noise schedule.
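
These values are exposed on the scheduler itself, so you can inspect the noise schedule directly; a small optional sketch (betas is an attribute of DDPMScheduler):

>>> scheduler.config.num_train_timesteps
1000
>>> scheduler.betas.shape
torch.Size([1000])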

To predict a slightly less noisy image, pass the following to the scheduler's step() method: the model output, a timestep, and the current sample.

>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape

The less_noisy_sample can be passed to the next timestep where it'll get even less noisy! Let's bring it all together now and visualize the entire denoising process.

First, create a function that postprocesses and displays the denoised image as a PIL.Image:


>>> import PIL.Image
>>> import numpy as np


>>> def display_sample(sample, i):
...     image_processed = sample.cpu().permute(0, 2, 3, 1)
...     image_processed = (image_processed + 1.0) * 127.5
...     image_processed = image_processed.numpy().astype(np.uint8)

...     image_pil = PIL.Image.fromarray(image_processed[0])
...     display(f"Image at step {i}")
...     display(image_pil)

To speed up the denoising process, move the input and model to a GPU:


>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")

Now create a denoising loop that predicts the residual of the less noisy sample, and computes the less noisy sample with the scheduler:


>>> import tqdm

>>> sample = noisy_sample

>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
...     # 1. predict noise residual
...     with torch.no_grad():
...         residual = model(sample, t).sample

...     # 2. compute less noisy image and set x_t -> x_t-1
...     sample = scheduler.step(residual, t, sample).prev_sample

...     # 3. optionally look at image
...     if (i + 1) % 50 == 0:
...         display_sample(sample, i + 1)

Sit back and watch as a cat is generated from nothing but noise! 😻
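
If you want to keep the final result rather than only displaying it, the same postprocessing used in display_sample can be reused to write the last sample to disk; this is an optional addition (the file name is arbitrary):

>>> final = (sample.cpu().permute(0, 2, 3, 1) + 1.0) * 127.5
>>> final_image = PIL.Image.fromarray(final.numpy().astype(np.uint8)[0])
>>> final_image.save("ddpm_cat.png")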

Next steps

Hopefully you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:

  • Train or finetune a model to generate your own images in the training tutorial.

  • See example official and community training or finetuning scripts for a variety of use cases.

  • Learn more about loading, accessing, changing and comparing schedulers in the Using different Schedulers guide.

  • Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher quality images with the Stable Diffusion guide.

  • Dive deeper into speeding up 🧨 Diffusers with guides on optimized PyTorch on a GPU, and inference guides for running Stable Diffusion on Apple Silicon (M1/M2) and ONNX Runtime.