Diffusion Pipeline
The DiffusionPipeline is the quickest way to load any pretrained diffusion pipeline from the Hub for inference.
You shouldn't use the DiffusionPipeline class for training or finetuning a diffusion model. Individual components of diffusion pipelines are usually trained individually, so we suggest working with them directly instead.
The pipeline type of any diffusion pipeline loaded with from_pretrained() is automatically detected, and the pipeline components are loaded and passed to the __init__ function of the pipeline.
Any pipeline object can be saved locally with save_pretrained().
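As a rough illustration of the load-then-save round trip described above, the sketch below loads a pipeline, saves it to a local directory, and reloads it from that directory. The helper name roundtrip and its pipeline_cls parameter are hypothetical conveniences for this sketch, not part of the library.

```python
def roundtrip(repo_id, save_dir, pipeline_cls=None):
    """Load a pipeline, save it locally, and reload it from disk.

    `roundtrip` and `pipeline_cls` are hypothetical names used only here.
    """
    if pipeline_cls is None:
        # Imported lazily so the function can be defined without diffusers.
        from diffusers import DiffusionPipeline

        pipeline_cls = DiffusionPipeline

    # Load the pipeline from a repository id (or a local path).
    pipeline = pipeline_cls.from_pretrained(repo_id)
    # Write all saveable components to `save_dir`.
    pipeline.save_pretrained(save_dir)
    # A saved directory can be passed back to from_pretrained().
    return pipeline_cls.from_pretrained(save_dir)
```

With the library installed, a call such as roundtrip("CompVis/ldm-text2im-large-256", "./my_pipeline_directory/") would download the weights on first use, so it is best tried on a machine with network access and enough disk space.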
DiffusionPipeline
( )
Base class for all pipelines.
DiffusionPipeline stores all components (models, schedulers, and processors) for diffusion pipelines and provides methods for loading, downloading, and saving models. It also includes methods to:
move all PyTorch modules to the device of your choice
enable/disable the progress bar for the denoising iteration
Class attributes:
config_name (str): The configuration filename that stores the class and module names of all the diffusion pipeline's components.
_optional_components (List[str]): List of all optional components that don't have to be passed to the pipeline to function (should be overridden by subclasses).
__call__
( *args, **kwargs )
Call self as a function.
device
( ) → torch.device
Returns
torch.device
The torch device on which the pipeline is located.
to
( torch_device: typing.Union[str, torch.device, NoneType] = None, torch_dtype: typing.Optional[torch.dtype] = None, silence_dtype_warnings: bool = False )
components
( )
The self.components property can be useful to run different pipelines with the same weights and configurations without reallocating additional memory.
Returns (dict): A dictionary containing all the modules needed to initialize the pipeline.
Examples:
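The sharing behaviour described for self.components can be sketched with toy classes. ToyPipeline and OtherToyPipeline below are stand-ins invented for this illustration and are not library classes; the point is only that unpacking the components dictionary into a second pipeline reuses the same objects instead of copying them.

```python
class ToyPipeline:
    """Toy stand-in for a pipeline; not a diffusers class."""

    def __init__(self, unet, scheduler):
        self.unet = unet
        self.scheduler = scheduler

    @property
    def components(self):
        # The modules needed to initialize the pipeline, keyed by the
        # names of the corresponding __init__ arguments.
        return {"unet": self.unet, "scheduler": self.scheduler}


class OtherToyPipeline(ToyPipeline):
    """A second toy pipeline type that accepts the same components."""


first = ToyPipeline(unet=object(), scheduler=object())
# Unpack the components into another pipeline: objects are shared, not copied.
second = OtherToyPipeline(**first.components)
shares_weights = second.unet is first.unet
```

The same pattern should carry over to real pipelines whose constructors accept the same component names: build the second pipeline from the first pipeline's components dictionary.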
disable_attention_slicing
( )
Disable sliced attention computation. If enable_attention_slicing was previously called, attention is computed in one step.
disable_xformers_memory_efficient_attention
( )
download
( pretrained_model_name, **kwargs ) → os.PathLike
Parameters
pretrained_model_name (str or os.PathLike, optional): A string, the repository id (for example CompVis/ldm-text2im-large-256) of a pretrained pipeline hosted on the Hub.
custom_pipeline (str, optional): Can be either:
A string, the repository id (for example CompVis/ldm-text2im-large-256) of a pretrained pipeline hosted on the Hub. The repository must contain a file called pipeline.py that defines the custom pipeline.
A path to a directory (./my_pipeline_directory/) containing a custom pipeline. The directory must contain a file called pipeline.py that defines the custom pipeline.
🧪 This is an experimental feature and may change in the future.
force_download (bool, optional, defaults to False): Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download (bool, optional, defaults to False): Whether or not to resume downloading the model weights and configuration files. If set to False, any incompletely downloaded files are deleted.
proxies (Dict[str, str], optional): A dictionary of proxy servers to use by protocol or endpoint, for example, {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
output_loading_info (bool, optional, defaults to False): Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only (bool, optional, defaults to False): Whether to only load local model weights and configuration files or not. If set to True, the model won't be downloaded from the Hub.
use_auth_token (str or bool, optional): The token to use as HTTP bearer authorization for remote files. If True, the token generated from diffusers-cli login (stored in ~/.boincai) is used.
revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.
custom_revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, or a commit id similar to revision when loading a custom pipeline from the Hub. It can be a Diffusers version when loading a custom pipeline from GitHub, otherwise it defaults to "main" when loading from the Hub.
mirror (str, optional): Mirror source to resolve accessibility issues if you're downloading a model in China. We do not guarantee the timeliness or safety of the source, and you should refer to the mirror site for more information.
variant (str, optional): Load weights from a specified variant filename such as "fp16" or "ema". This is ignored when loading from_flax.
use_safetensors (bool, optional, defaults to None): If set to None, the safetensors weights are downloaded if they're available and if the safetensors library is installed. If set to True, the model is forcibly loaded from safetensors weights. If set to False, safetensors weights are not loaded.
use_onnx (bool, optional, defaults to False): If set to True, ONNX weights will always be downloaded if present. If set to False, ONNX weights will never be downloaded. By default use_onnx defaults to the _is_onnx class attribute which is False for non-ONNX pipelines and True for ONNX pipelines. ONNX weights include both files ending with .onnx and .pb.
Returns
os.PathLike
A path to the downloaded pipeline.
Download and cache a PyTorch diffusion pipeline from pretrained pipeline weights.
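The three-way behaviour of use_safetensors described in the parameter list can be summarised as a small decision function. This is only a sketch of the documented rule, not the library's implementation: the function name, its arguments, the "pickle" label for non-safetensors weights, and the choice to raise an error when safetensors weights are forced but unavailable are all assumptions made for illustration.

```python
def choose_weight_format(use_safetensors, repo_has_safetensors, library_installed):
    """Return "safetensors" or "pickle" following the rule described above.

    All names here are hypothetical; only the None/True/False rule comes
    from the parameter description.
    """
    if use_safetensors is None:
        # Default: use safetensors only when the weights exist and the
        # safetensors library is installed.
        if repo_has_safetensors and library_installed:
            return "safetensors"
        return "pickle"
    if use_safetensors:
        # Forced: this sketch assumes a hard failure rather than a fallback.
        if not (repo_has_safetensors and library_installed):
            raise ValueError("safetensors weights were requested but are unavailable")
        return "safetensors"
    # Explicitly disabled: safetensors weights are not loaded.
    return "pickle"
```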
enable_attention_slicing
( slice_size: typing.Union[str, int, NoneType] = 'auto' )
Parameters
slice_size (str or int, optional, defaults to "auto"): When "auto", halves the input to the attention heads, so attention will be computed in two steps. If "max", maximum amount of memory will be saved by running only one slice at a time. If a number is provided, uses as many slices as attention_head_dim // slice_size. In this case, attention_head_dim must be a multiple of slice_size.
Enable sliced attention computation. When this option is enabled, the attention module splits the input tensor in slices to compute attention in several steps. For more than one attention head, the computation is performed sequentially over each head. This is useful to save some memory in exchange for a small speed decrease.
⚠️ Don't enable attention slicing if you're already using scaled_dot_product_attention (SDPA) from PyTorch 2.0 or xFormers. These attention computations are already very memory efficient, so you won't need to enable this function. If you enable attention slicing with SDPA or xFormers, it can lead to serious slowdowns!
Examples:
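The slice_size rule can be worked through with plain arithmetic. The helper below is a hypothetical illustration of the parameter description (it does not call the library): it resolves "auto", "max", or a number into a concrete slice size and the resulting number of attention steps.

```python
def resolve_slices(attention_head_dim, slice_size="auto"):
    """Return (slice_size, number_of_slices) for a given setting.

    `resolve_slices` is a hypothetical helper that mirrors the documented
    rule; it is not part of the library.
    """
    if slice_size == "auto":
        # Halve the input so attention is computed in two steps.
        size = attention_head_dim // 2
    elif slice_size == "max":
        # One slice at a time: smallest slices, most steps, least memory.
        size = 1
    else:
        size = int(slice_size)

    # attention_head_dim must be a multiple of the slice size.
    if size < 1 or attention_head_dim % size != 0:
        raise ValueError(
            f"attention_head_dim={attention_head_dim} is not a multiple of slice size {size}"
        )
    return size, attention_head_dim // size
```

For attention_head_dim = 8, "auto" gives two slices of size 4, "max" gives eight slices of size 1, and slice_size=2 gives four slices. On a real pipeline object the feature itself is switched on with enable_attention_slicing() and off again with disable_attention_slicing().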
enable_model_cpu_offload
( gpu_id: int = 0, device: typing.Union[torch.device, str] = 'cuda' )
Offloads all models to CPU using Accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload, this method moves one whole model at a time to the GPU when its forward method is called, and the model remains in GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload, but performance is much better due to the iterative execution of the unet.
enable_sequential_cpu_offload
( gpu_id: int = 0, device: typing.Union[torch.device, str] = 'cuda' )
Offloads all models to CPU using Accelerate, significantly reducing memory usage. When called, the state dicts of all torch.nn.Module components (except those in self._exclude_from_cpu_offload) are saved to CPU and then moved to torch.device('meta') and loaded to GPU only when their specific submodule has its forward method called. Offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.
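The trade-off between the two offloading methods can be captured in a small wrapper. The function name enable_offload and its prefer argument are hypothetical; the two methods it calls are the ones documented above, and running them on a real pipeline is expected to need Accelerate and a GPU.

```python
def enable_offload(pipeline, prefer="speed"):
    """Pick an offloading strategy for `pipeline`.

    `enable_offload` and `prefer` are hypothetical names for this sketch.
    """
    if prefer == "speed":
        # Whole models are moved to the GPU one at a time:
        # smaller memory savings, better performance.
        pipeline.enable_model_cpu_offload()
        return "model"
    if prefer == "memory":
        # Submodules are moved to the GPU only while they run:
        # larger memory savings, lower performance.
        pipeline.enable_sequential_cpu_offload()
        return "sequential"
    raise ValueError("prefer must be 'speed' or 'memory'")
```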
enable_xformers_memory_efficient_attention
( attention_op: typing.Optional[typing.Callable] = None )
Parameters
⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
from_pretrained
( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType], **kwargs )
Parameters
pretrained_model_name_or_path (str or os.PathLike, optional): Can be either:
A string, the repo id (for example CompVis/ldm-text2im-large-256) of a pretrained pipeline hosted on the Hub.
torch_dtype (str or torch.dtype, optional): Override the default torch.dtype and load the model with another dtype. If "auto" is passed, the dtype is automatically derived from the model's weights.
custom_pipeline (str, optional):
🧪 This is an experimental feature and may change in the future.
Can be either:
A string, the repo id (for example hf-internal-testing/diffusers-dummy-pipeline) of a custom pipeline hosted on the Hub. The repository must contain a file called pipeline.py that defines the custom pipeline.
A path to a directory (./my_pipeline_directory/) containing a custom pipeline. The directory must contain a file called pipeline.py that defines the custom pipeline.
force_download (bool, optional, defaults to False): Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
cache_dir (Union[str, os.PathLike], optional): Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used.
resume_download (bool, optional, defaults to False): Whether or not to resume downloading the model weights and configuration files. If set to False, any incompletely downloaded files are deleted.
proxies (Dict[str, str], optional): A dictionary of proxy servers to use by protocol or endpoint, for example, {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
output_loading_info (bool, optional, defaults to False): Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only (bool, optional, defaults to False): Whether to only load local model weights and configuration files or not. If set to True, the model won't be downloaded from the Hub.
use_auth_token (str or bool, optional): The token to use as HTTP bearer authorization for remote files. If True, the token generated from diffusers-cli login (stored in ~/.boincai) is used.
revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.
custom_revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, or a commit id similar to revision when loading a custom pipeline from the Hub. It can be a Diffusers version when loading a custom pipeline from GitHub, otherwise it defaults to "main" when loading from the Hub.
mirror (str, optional): Mirror source to resolve accessibility issues if you're downloading a model in China. We do not guarantee the timeliness or safety of the source, and you should refer to the mirror site for more information.
device_map (str or Dict[str, Union[int, str, torch.device]], optional): A map that specifies where each submodule should go. It doesn't need to be defined for each parameter/buffer name; once a given module name is inside, every submodule of it will be sent to the same device.
max_memory (Dict, optional): A dictionary mapping device identifiers to their maximum memory. Will default to the maximum memory available for each GPU and the available CPU RAM if unset.
offload_folder (str or os.PathLike, optional): The path to offload weights if device_map contains the value "disk".
offload_state_dict (bool, optional): If True, temporarily offloads the CPU state dict to the hard drive to avoid running out of CPU RAM if the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to True when there is some disk offload.
low_cpu_mem_usage (bool, optional, defaults to True if torch version >= 1.9.0 else False): Speed up model loading by only loading the pretrained weights and not initializing the weights. This also tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model. Only supported for PyTorch >= 1.9.0. If you are using an older version of PyTorch, setting this argument to True will raise an error.
use_safetensors (bool, optional, defaults to None): If set to None, the safetensors weights are downloaded if they're available and if the safetensors library is installed. If set to True, the model is forcibly loaded from safetensors weights. If set to False, safetensors weights are not loaded.
use_onnx (bool, optional, defaults to None): If set to True, ONNX weights will always be downloaded if present. If set to False, ONNX weights will never be downloaded. By default use_onnx defaults to the _is_onnx class attribute which is False for non-ONNX pipelines and True for ONNX pipelines. ONNX weights include both files ending with .onnx and .pb.
kwargs (remaining dictionary of keyword arguments, optional): Can be used to overwrite load and saveable variables (the pipeline components of the specific pipeline class). The overwritten components are passed directly to the pipeline's __init__ method. See example below for more information.
variant (str, optional): Load weights from a specified variant filename such as "fp16" or "ema". This is ignored when loading from_flax.
Instantiate a PyTorch diffusion pipeline from pretrained pipeline weights.
The pipeline is set in evaluation mode (model.eval()) by default.
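The device_map description says that a module name in the map also covers every submodule below it. The sketch below illustrates that lookup rule in isolation; resolve_device, the module names, and the example map are hypothetical, and this is not how the library itself performs placement.

```python
def resolve_device(name, device_map):
    """Return the device for `name` based on the module prefix that covers it.

    `resolve_device` is a hypothetical helper illustrating the rule that
    every submodule of a listed module goes to the same device.
    """
    for module_name, device in device_map.items():
        # A name is covered if it is the module itself or lives below it.
        if name == module_name or name.startswith(module_name + "."):
            return device
    raise KeyError(f"no device_map entry covers {name!r}")


# Hypothetical map: one module on GPU 0, another kept on the CPU.
example_map = {"unet": 0, "vae": "cpu"}
unet_weight_device = resolve_device("unet.down_blocks.0.weight", example_map)
vae_bias_device = resolve_device("vae.decoder.bias", example_map)
```

Appending "." before the prefix check keeps a module called unet2 from being matched by the unet entry.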
If loading reports that some weights were newly initialized instead of being loaded from the checkpoint, you need to finetune the weights for your downstream task.
Examples:
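A minimal sketch of calling from_pretrained with a few of the keyword arguments listed above, including a component override passed through kwargs. The helper name load_pipeline and its pipeline_cls parameter are hypothetical, and the sketch assumes the target pipeline has a component named scheduler, which may not hold for every pipeline class.

```python
def load_pipeline(repo_id, scheduler=None, pipeline_cls=None):
    """Load a pipeline, optionally replacing its scheduler component.

    `load_pipeline` and `pipeline_cls` are hypothetical names; the
    keyword arguments are the documented from_pretrained() parameters.
    """
    if pipeline_cls is None:
        # Imported lazily so the function can be defined without diffusers.
        from diffusers import DiffusionPipeline

        pipeline_cls = DiffusionPipeline

    kwargs = {
        "revision": "main",       # branch name, tag name, or commit id
        "use_safetensors": None,  # prefer safetensors when available
    }
    if scheduler is not None:
        # Components passed as keyword arguments are handed to the
        # pipeline's __init__ instead of being loaded from the checkpoint.
        kwargs["scheduler"] = scheduler
    return pipeline_cls.from_pretrained(repo_id, **kwargs)
```

With the library installed, load_pipeline("CompVis/ldm-text2im-large-256") would download and cache the pipeline on first use and read it from the cache afterwards.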
maybe_free_model_hooks
( )
TODO: Better doc string
numpy_to_pil
( images )
Convert a NumPy image or a batch of images to a PIL image.
save_pretrained
( save_directory: typing.Union[str, os.PathLike], safe_serialization: bool = True, variant: typing.Optional[str] = None, push_to_hub: bool = False, **kwargs )
Parameters
save_directory (str or os.PathLike): Directory to save a pipeline to. Will be created if it doesn't exist.
safe_serialization (bool, optional, defaults to True): Whether to save the model using safetensors or the traditional PyTorch way with pickle.
variant (str, optional): If specified, weights are saved in the format pytorch_model.<variant>.bin.
push_to_hub (bool, optional, defaults to False): Whether or not to push your model to the BOINC AI model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
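The variant naming pattern pytorch_model.<variant>.bin can be spelled out with a tiny helper. The function weights_filename is hypothetical and only reproduces the format string from the parameter description; actual file names can differ by component and by serialization format.

```python
def weights_filename(variant=None, base="pytorch_model.bin"):
    """Insert `variant` before the extension, e.g. pytorch_model.fp16.bin.

    `weights_filename` is a hypothetical helper; the base name is taken
    from the format string in the parameter description.
    """
    if variant is None:
        return base
    # Split off the extension and place the variant in front of it.
    stem, extension = base.rsplit(".", 1)
    return f"{stem}.{variant}.{extension}"
```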
Disable memory efficient attention from xFormers.
A string, the file name of a community pipeline hosted on GitHub. Valid file names must match the file name and not the pipeline script (clip_guided_stable_diffusion instead of clip_guided_stable_diffusion.py). Community pipelines are always loaded from the current main branch of GitHub.
For more information on how to load and create custom pipelines, refer to the custom pipelines documentation.
To use private models, log in with boincai-cli login.
attention_op (Callable, optional): Override the default None operator for use as op argument to the memory_efficient_attention() function of xFormers.
Enable memory efficient attention from xFormers. When this option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed up during training is not guaranteed.
A path to a directory (for example ./my_pipeline_directory/) containing pipeline weights saved using save_pretrained().
A string, the file name of a community pipeline hosted on GitHub. Valid file names must match the file name and not the pipeline script (clip_guided_stable_diffusion instead of clip_guided_stable_diffusion.py). Community pipelines are always loaded from the current main branch of GitHub.
For more information on how to load and create custom pipelines, refer to the custom pipelines documentation.
Set device_map="auto" to have Accelerate automatically compute the most optimized device_map. For more information about each option, see the Accelerate documentation.
To use private models, log in with boincai-cli login.
kwargs (Dict[str, Any], optional): Additional keyword arguments passed along to the push_to_hub() method.
Save all saveable variables of the pipeline to a directory. A pipeline variable can be saved and loaded if its class implements both a save and loading method. The pipeline is easily reloaded using the from_pretrained() class method.