ControlNet


Last updated 1 year ago

Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet) by Lvmin Zhang and Maneesh Agrawala.

This example is based on the training example in the original ControlNet repository. It trains a ControlNet to fill circles using a small synthetic dataset.

Installing the dependencies

Before running the scripts, make sure to install the library’s training dependencies.

To successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the installation up to date. We update the example scripts frequently and install example-specific requirements.

To do this, execute the following steps in a new virtual environment:

Copied

git clone https://github.com/boincai/diffusers
cd diffusers
pip install -e .

Then navigate into the example folder:

Copied

cd examples/controlnet

Now run:

Copied

pip install -r requirements.txt

And initialize an 🌍 Accelerate environment with:

Copied

accelerate config

Or for a default 🌍 Accelerate configuration without answering questions about your environment:

Copied

accelerate config default

Or if your environment doesn’t support an interactive shell (for example, a notebook):

Copied

from accelerate.utils import write_basic_config

write_basic_config()

Circle filling dataset

The original dataset is hosted in the ControlNet repo, but we re-uploaded it to be compatible with 🌍 Datasets so that it can handle the data loading within the training script.

Training

Our training examples use runwayml/stable-diffusion-v1-5 because that is what the original set of ControlNet models was trained on. However, ControlNet can be trained to augment any compatible Stable Diffusion model (such as CompVis/stable-diffusion-v1-4) or stabilityai/stable-diffusion-2-1.

To use your own dataset, take a look at the Create a dataset for training guide.

Download the following images to condition our training with:

Copied

wget https://boincai.com/datasets/boincai/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png

wget https://boincai.com/datasets/boincai/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
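
In environments without wget, such as some notebooks, the same two files can be fetched from Python. The following is a minimal sketch using only the standard library. The helper names conditioning_image_urls and download_conditioning_images are hypothetical and not part of Diffusers; the URLs are the ones from the wget commands above.

```python
import os
import urllib.request

# Base URL and file names copied from the wget commands in this guide.
BASE_URL = (
    "https://boincai.com/datasets/boincai/documentation-images/"
    "resolve/main/diffusers/controlnet_training/"
)
FILENAMES = ["conditioning_image_1.png", "conditioning_image_2.png"]


def conditioning_image_urls():
    """Return the full URL of each conditioning image."""
    return [BASE_URL + name for name in FILENAMES]


def download_conditioning_images(target_dir="."):
    """Download both images into target_dir and return their local paths."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for url in conditioning_image_urls():
        path = os.path.join(target_dir, os.path.basename(url))
        urllib.request.urlretrieve(url, path)  # performs the network request
        paths.append(path)
    return paths
```

Calling download_conditioning_images() saves both files to the current directory, which matches the ./conditioning_image_1.png and ./conditioning_image_2.png paths passed to --validation_image.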

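Specify the MODEL_DIR environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the --pretrained_model_name_or_path argument.

The training script creates and saves a diffusion_pytorch_model.bin file in your repository.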

Copied

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=4 \
 --push_to_hub

This default configuration requires ~38GB VRAM.

By default, the training script logs outputs to tensorboard. Pass --report_to wandb to use Weights & Biases.

Gradient accumulation with a smaller batch size can be used to reduce training requirements to ~20 GB VRAM.

Copied

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
  --push_to_hub
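
Gradient accumulation works because averaging the gradients of several small batches gives the same update direction as one larger batch. The following is a minimal, framework-free sketch of that equivalence on a toy one-parameter model. The data, the model, and the function name grad_mse are illustrative assumptions and are unrelated to the internals of train_controlnet.py.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ≈ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Toy (x, y) samples and a starting weight, chosen only for illustration.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5

# One batch of 4 samples, as with --train_batch_size=4.
full_batch_grad = grad_mse(w, data)

# Four micro-batches of 1 sample each, as with --train_batch_size=1
# and --gradient_accumulation_steps=4.
accumulation_steps = 4
accumulated_grad = 0.0
for sample in data:
    # Scale each micro-batch gradient so the running sum is the batch mean.
    accumulated_grad += grad_mse(w, [sample]) / accumulation_steps

print(full_batch_grad, accumulated_grad)
```

Only one sample's activations are held in memory at a time in the second variant, which is where the VRAM saving comes from; the price is more forward and backward passes per optimizer step.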

Training with multiple GPUs

accelerate allows for seamless multi-GPU training. Follow the 🌍 Accelerate instructions for running distributed training. Here is an example command:

Copied

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=4 \
 --mixed_precision="fp16" \
 --tracker_project_name="controlnet-demo" \
 --report_to=wandb \
  --push_to_hub
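
With several processes, each GPU consumes its own batch, so the number of samples per optimizer step grows with the GPU count. The following is a minimal sketch of that arithmetic, assuming the usual product of per-device batch size, number of processes, and gradient accumulation steps. The function name and the GPU count of 2 are illustrative assumptions, not values taken from this guide.

```python
def total_batch_size(train_batch_size, num_processes=1, gradient_accumulation_steps=1):
    """Samples contributing to a single optimizer step across all devices."""
    return train_batch_size * num_processes * gradient_accumulation_steps


# Single GPU with --train_batch_size=4.
single_gpu = total_batch_size(4)

# Hypothetical two-GPU run with --train_batch_size=4 on each device.
two_gpus = total_batch_size(4, num_processes=2)

# Single GPU with --train_batch_size=1 and --gradient_accumulation_steps=4.
accumulated = total_batch_size(1, gradient_accumulation_steps=4)

print(single_gpu, two_gpus, accumulated)
```

Keep this product in mind when comparing runs, since the same --train_batch_size value means a different total batch size on a different number of GPUs.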

Example results

After 300 steps with batch size 8:

red circle with blue background

cyan circle with brown floral background

After 6000 steps with batch size 8:

red circle with blue background

cyan circle with brown floral background

Training on a 16 GB GPU

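Enable the following optimizations to train on a 16 GB GPU:

  • Gradient checkpointing

  • bitsandbytes’ 8-bit optimizer (take a look at the installation instructions at https://github.com/TimDettmers/bitsandbytes#requirements--installation if you don’t already have it installed)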

Now you can launch the training script:

Copied

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam \
  --push_to_hub

Training on a 12 GB GPU

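Enable the following optimizations to train on a 12 GB GPU:

  • Gradient checkpointing

  • bitsandbytes’ 8-bit optimizer (take a look at the installation instructions at https://github.com/TimDettmers/bitsandbytes#requirements--installation if you don’t already have it installed)

  • xFormers (take a look at the installation instructions if you don’t already have it installed)

  • set gradients to None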

Copied

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam \
 --enable_xformers_memory_efficient_attention \
 --set_grads_to_none \
  --push_to_hub

When using --enable_xformers_memory_efficient_attention, make sure xformers is installed, for example with pip install xformers.

Training on an 8 GB GPU

We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does save memory, we have not confirmed whether the configuration trains successfully. You will very likely have to make changes to the config to have a successful training run.

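Enable the following optimizations to train on an 8 GB GPU:

  • Gradient checkpointing

  • xFormers (take a look at the installation instructions if you don’t already have it installed)

  • set gradients to None

  • DeepSpeed stage 2 with parameter and optimizer offloading

  • fp16 mixed precision

DeepSpeed can offload tensors from VRAM to either CPU or NVMe. This requires significantly more RAM (about 25 GB).

You’ll have to configure your environment with accelerate config to enable DeepSpeed stage 2.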

The configuration file should look like this:

Copied

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 4
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED

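See the 🌍 Accelerate documentation for more DeepSpeed configuration options.

Changing the default Adam optimizer to DeepSpeed’s Adam, deepspeed.ops.adam.DeepSpeedCPUAdam, gives a substantial speedup, but it requires a CUDA toolchain with the same version as PyTorch. The 8-bit optimizer does not seem to be compatible with DeepSpeed at the moment, which is why the command below omits --use_8bit_adam.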

Copied

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --enable_xformers_memory_efficient_attention \
 --set_grads_to_none \
 --mixed_precision fp16 \
 --push_to_hub
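
The memory tiers in this guide can be summarized as a lookup from available VRAM to the memory-related flags used in the corresponding command. The following is a rough sketch under stated assumptions: the helper memory_flags is hypothetical, the thresholds are the approximate figures quoted above (~38 GB, ~20 GB, 16 GB, 12 GB, 8 GB), and actual memory use depends on your hardware and library versions.

```python
# Flags shared by every tier below the ~38 GB default configuration.
SMALL_BATCH = ["--train_batch_size=1", "--gradient_accumulation_steps=4"]

# (minimum VRAM in GB, memory-related flags from the matching command).
TIERS = [
    (38, ["--train_batch_size=4"]),
    (20, SMALL_BATCH),
    (16, SMALL_BATCH + ["--gradient_checkpointing", "--use_8bit_adam"]),
    (12, SMALL_BATCH + [
        "--gradient_checkpointing",
        "--use_8bit_adam",
        "--enable_xformers_memory_efficient_attention",
        "--set_grads_to_none",
    ]),
    # The 8 GB tier also needs the DeepSpeed stage 2 Accelerate config and
    # has not been exhaustively tested; it drops the 8-bit optimizer.
    (8, SMALL_BATCH + [
        "--gradient_checkpointing",
        "--enable_xformers_memory_efficient_attention",
        "--set_grads_to_none",
        "--mixed_precision", "fp16",
    ]),
]


def memory_flags(vram_gb):
    """Return the flags of the first tier that fits into vram_gb."""
    for minimum_gb, flags in TIERS:
        if vram_gb >= minimum_gb:
            return list(flags)
    raise ValueError(f"No configuration in this guide targets {vram_gb} GB of VRAM")


print(" ".join(memory_flags(16)))
```

The returned flags cover memory settings only; the model, dataset, validation, and output arguments from the commands above still have to be supplied.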

Inference

The trained model can be run with the StableDiffusionControlNetPipeline. Set base_model_path and controlnet_path to the values --pretrained_model_name_or_path and --output_dir were respectively set to in the training script.

Copied

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import torch

base_model_path = "path to model"
controlnet_path = "path to controlnet"

controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16, use_safetensors=True)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet, torch_dtype=torch.float16, use_safetensors=True
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()

pipe.enable_model_cpu_offload()

control_image = load_image("./conditioning_image_1.png")
prompt = "pale golden rod circle with old lace background"

# generate image
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]

image.save("./output.png")

Stable Diffusion XL

Training with Stable Diffusion XL is also supported via the train_controlnet_sdxl.py script. Please refer to the script’s docs for details.