IF
Overview
DeepFloyd IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. The model is modular, composed of a frozen text encoder and three cascaded pixel diffusion modules:
Stage 1: a base model that generates 64x64 px images based on a text prompt,
Stage 2: a 64x64 px => 256x256 px super-resolution model, and
Stage 3: a 256x256 px => 1024x1024 px super-resolution model.
Stage 1 and Stage 2 utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. Stage 3 is Stability's x4 Upscaling model. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.
Usage
Before you can use IF, you need to accept its usage conditions. To do so:
Make sure to have a BOINC AI account and be logged in
Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0. Accepting the license on the stage I model card will auto accept for the other IF models.
Make sure to log in locally. Install boincai_hub:
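A minimal install command, assuming boincai_hub is distributed through pip in the usual way:

```shell
pip install boincai_hub --upgrade
```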
Then run the login function in a Python shell and enter your BOINC AI Hub access token when prompted:
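A sketch of the login step, assuming boincai_hub mirrors the familiar huggingface_hub API:

```python
from boincai_hub import login

# opens an interactive prompt for your access token
login()
```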
Next, we install diffusers and dependencies:
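For example (the package list is assumed from the typical diffusers setup):

```shell
pip install diffusers accelerate transformers safetensors
```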
The following sections give more detailed examples of how to use IF. Specifically:
Available checkpoints
Text-to-Image Generation
By default, diffusers makes use of model CPU offloading to run the whole IF pipeline with as little as 14 GB of VRAM.
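The three-stage workflow can be sketched as follows. Model IDs and the demo prompt follow the official DeepFloyd IF example; treat this as a sketch that requires a GPU, the model licenses accepted, and the weights downloaded:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil

# stage 1: 64x64 base model
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2: 64x64 -> 256x256 super-resolution (reuses stage 1's text embeddings)
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3: 256x256 -> 1024x1024 upscaler, sharing stage 1's safety modules
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeddings are computed once and reused by stages 1 and 2
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")

image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
    generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
```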
Text Guided Image-to-Image Generation
The same IF model weights can be used for text-guided image-to-image translation or image variation. In this case just make sure to load the weights using the IFImg2ImgPipeline and IFImg2ImgSuperResolutionPipeline pipelines.
Note: You can also directly move the weights of the text-to-image pipelines to the image-to-image pipelines without loading them twice by making use of the ~DiffusionPipeline.components attribute, as explained here.
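A sketch of the image-to-image flow. The input image URL is illustrative; the pipeline classes and model IDs are the ones named above:

```python
import torch
from diffusers import IFImg2ImgPipeline, IFImg2ImgSuperResolutionPipeline
from diffusers.utils import load_image

# illustrative input image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url).resize((768, 512))

pipe = IFImg2ImgPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape in style minecraft"
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

generator = torch.manual_seed(0)
image = pipe(
    image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images

# upscale with the stage 2 image-to-image super-resolution pipeline
super_res_pipe = IFImg2ImgSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
super_res_pipe.enable_model_cpu_offload()

image = super_res_pipe(
    image=image,
    original_image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
).images
image[0].save("./if_img2img.png")
```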
Text Guided Inpainting Generation
The same IF model weights can be used for text-guided inpainting. In this case just make sure to load the weights using the IFInpaintingPipeline and IFInpaintingSuperResolutionPipeline pipelines.
Note: You can also directly move the weights of the text-to-image pipelines to the inpainting pipelines without loading them twice by making use of the ~DiffusionPipeline.components attribute, as explained here.
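A sketch of the inpainting flow. The image and mask URLs are illustrative placeholders; supply any image plus a matching mask:

```python
import torch
from diffusers import IFInpaintingPipeline, IFInpaintingSuperResolutionPipeline
from diffusers.utils import load_image

# illustrative input image and mask
original_image = load_image("https://example.com/person.png")
mask_image = load_image("https://example.com/glasses_mask.png")

pipe = IFInpaintingPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "blue sunglasses"
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

generator = torch.manual_seed(0)
image = pipe(
    image=original_image,
    mask_image=mask_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images

super_res_pipe = IFInpaintingSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
super_res_pipe.enable_model_cpu_offload()

image = super_res_pipe(
    image=image,
    mask_image=mask_image,
    original_image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
).images
image[0].save("./if_inpainting.png")
```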
Converting between different pipelines
In addition to being loaded with from_pretrained, pipelines can also be loaded directly from each other.
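For example, text-to-image pipelines can be turned into image-to-image pipelines by passing their components to the other pipeline's constructor, so the weights are only loaded once (a sketch, using the checkpoints named earlier):

```python
from diffusers import (
    IFPipeline,
    IFSuperResolutionPipeline,
    IFImg2ImgPipeline,
    IFImg2ImgSuperResolutionPipeline,
)

pipe_1 = IFPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0")
pipe_2 = IFSuperResolutionPipeline.from_pretrained("DeepFloyd/IF-II-L-v1.0")

# reuse the already-loaded components in the image-to-image pipelines
pipe_1 = IFImg2ImgPipeline(**pipe_1.components)
pipe_2 = IFImg2ImgSuperResolutionPipeline(**pipe_2.components)
```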
Optimizing for speed
The simplest optimization to run IF faster is to move all model components to the GPU.
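A minimal sketch (requires a GPU with enough VRAM to hold all components at once):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
# move every model component to the GPU up front
pipe.to("cuda")
```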
You can also run the diffusion process for a smaller number of timesteps. This can be done either with the num_inference_steps argument:
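For example (the prompt is illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")

# fewer denoising steps trade image quality for speed
image = pipe("a photo of a red panda", num_inference_steps=30).images[0]
```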
Or with the timesteps argument:
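A sketch using one of the predefined fast timestep schedules shipped with the DeepFloyd IF pipelines (the fast27_timesteps import path is assumed from the diffusers deepfloyd_if module):

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.pipelines.deepfloyd_if import fast27_timesteps

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")

# an explicit, descending list of timesteps replaces num_inference_steps
image = pipe("a photo of a red panda", timesteps=fast27_timesteps).images[0]
```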
When doing image variation or inpainting, you can also decrease the number of timesteps with the strength argument. The strength argument is the amount of noise to add to the input image which also determines how many steps to run in the denoising process. A smaller number will vary the image less but run faster.
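For example (the input URL and prompt are illustrative):

```python
import torch
from diffusers import IFImg2ImgPipeline
from diffusers.utils import load_image

pipe = IFImg2ImgPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")

original_image = load_image("https://example.com/input.png")  # illustrative URL

# lower strength = less noise added, fewer denoising steps, output closer to the input
image = pipe(image=original_image, prompt="anime style", strength=0.3).images[0]
```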
You can also use torch.compile. Note that we have not exhaustively tested torch.compile with IF and it might not give expected results.
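A sketch compiling the two heaviest modules (requires PyTorch 2.x; the first call is slow while kernels are traced):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")

# compile the text encoder and unet; subsequent calls reuse the compiled graphs
pipe.text_encoder = torch.compile(pipe.text_encoder)
pipe.unet = torch.compile(pipe.unet)
```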
Optimizing for memory
When optimizing for GPU memory, we can use the standard diffusers cpu offloading APIs.
Either the model-based CPU offloading:
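For example:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
# moves one whole model (text encoder, unet, ...) to the GPU at a time
pipe.enable_model_cpu_offload()
```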
or the more aggressive layer-based CPU offloading:
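For example:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
# moves individual submodules to the GPU only while their forward pass runs
pipe.enable_sequential_cpu_offload()
```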
Additionally, the T5 text encoder can be loaded in 8-bit precision:
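A sketch of 8-bit text-encoder loading via transformers (assumes bitsandbytes and accelerate are installed; the "8bit" variant name follows the DeepFloyd checkpoint layout):

```python
from transformers import T5EncoderModel
from diffusers import DiffusionPipeline

# load only the text encoder, quantized to 8-bit
text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    subfolder="text_encoder",
    device_map="auto",
    load_in_8bit=True,
    variant="8bit",
)

# build a pipeline around the 8-bit encoder, skipping the unet entirely
pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", text_encoder=text_encoder, unet=None, device_map="auto"
)
prompt_embeds, negative_embeds = pipe.encode_prompt("a photo of a red panda")
```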
For machines with constrained CPU RAM, such as the free tier of Google Colab, where we can't load all model components to the CPU at once, we can load the pipeline with only the text encoder or only the UNet, instantiating each model component just when it is needed.
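The two-phase loading described above can be sketched as follows (a sketch assuming bitsandbytes is available; the text encoder is freed before the UNet is loaded):

```python
import gc
import torch
from transformers import T5EncoderModel
from diffusers import DiffusionPipeline, IFPipeline
from diffusers.utils import pt_to_pil

# 1) load only the text encoder (in 8-bit) and compute the prompt embeddings
text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", subfolder="text_encoder", device_map="auto", load_in_8bit=True, variant="8bit"
)
pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", text_encoder=text_encoder, unet=None, device_map="auto"
)
prompt_embeds, negative_embeds = pipe.encode_prompt("a photo of a red panda")

# 2) free the text encoder before loading the unet
del text_encoder, pipe
gc.collect()
torch.cuda.empty_cache()

# 3) load only the unet and denoise from the precomputed embeddings
pipe = IFPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16, device_map="auto"
)
generator = torch.Generator().manual_seed(0)
image = pipe(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, output_type="pt", generator=generator
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
```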
Available Pipelines:
IFPipeline
class diffusers.IFPipeline
( tokenizer: T5Tokenizer, text_encoder: T5EncoderModel, unet: UNet2DConditionModel, scheduler: DDPMScheduler, safety_checker: typing.Optional[diffusers.pipelines.deepfloyd_if.safety_checker.IFSafetyChecker], feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor], watermarker: typing.Optional[diffusers.pipelines.deepfloyd_if.watermark.IFWatermarker], requires_safety_checker: bool = True )
__call__
( prompt: typing.Union[str, typing.List[str]] = None, num_inference_steps: int = 100, timesteps: typing.List[int] = None, guidance_scale: float = 7.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, height: typing.Optional[int] = None, width: typing.Optional[int] = None, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, clean_caption: bool = True, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None ) → ~pipelines.stable_diffusion.IFPipelineOutput or tuple
Parameters
prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, equally spaced num_inference_steps timesteps are used. Must be in descending order.
guidance_scale (float, optional, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
height (int, optional, defaults to self.unet.config.sample_size) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size) — The width in pixels of the generated image.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the DDIMScheduler; ignored for other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL (PIL.Image.Image) or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.stable_diffusion.IFPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
clean_caption (bool, optional, defaults to True) — Whether or not to clean the caption before creating embeddings. Requires beautifulsoup4 and ftfy to be installed. If the dependencies are not installed, the embeddings will be created from the raw prompt.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.cross_attention.
Returns
~pipelines.stable_diffusion.IFPipelineOutput
or tuple
~pipelines.stable_diffusion.IFPipelineOutput
if return_dict
is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of
bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) or watermarked content, according to the
safety_checker.
Function invoked when calling the pipeline for generation.
enable_model_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload
, this method moves one whole model at a time to the GPU when its forward
method is called, and the model remains in GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload
, but performance is much better due to the iterative execution of the unet
.
enable_sequential_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models have their state dicts saved to CPU and then are moved to a torch.device('meta') and loaded to GPU only when their specific submodule has its forward method called.
encode_prompt
( prompt, do_classifier_free_guidance = True, num_images_per_prompt = 1, device = None, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, clean_caption: bool = False )
Parameters
prompt (str or List[str], optional) — prompt to be encoded
device (torch.device, optional) — torch device to place the resulting embeddings on
num_images_per_prompt (int, optional, defaults to 1) — number of images that should be generated per prompt
do_classifier_free_guidance (bool, optional, defaults to True) — whether to use classifier-free guidance or not
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
Encodes the prompt into text encoder hidden states.
IFSuperResolutionPipeline
class diffusers.IFSuperResolutionPipeline
( tokenizer: T5Tokenizer, text_encoder: T5EncoderModel, unet: UNet2DConditionModel, scheduler: DDPMScheduler, image_noising_scheduler: DDPMScheduler, safety_checker: typing.Optional[diffusers.pipelines.deepfloyd_if.safety_checker.IFSafetyChecker], feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor], watermarker: typing.Optional[diffusers.pipelines.deepfloyd_if.watermark.IFWatermarker], requires_safety_checker: bool = True )
__call__
( prompt: typing.Union[str, typing.List[str]] = None, height: int = None, width: int = None, image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.FloatTensor] = None, num_inference_steps: int = 50, timesteps: typing.List[int] = None, guidance_scale: float = 4.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, noise_level: int = 250, clean_caption: bool = True ) → ~pipelines.stable_diffusion.IFPipelineOutput or tuple
Parameters
prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
height (int, optional, defaults to self.unet.config.sample_size) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size) — The width in pixels of the generated image.
image (PIL.Image.Image, np.ndarray, or torch.FloatTensor) — The image to be upscaled.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, equally spaced num_inference_steps timesteps are used. Must be in descending order.
guidance_scale (float, optional, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the DDIMScheduler; ignored for other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL (PIL.Image.Image) or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.stable_diffusion.IFPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.cross_attention.
noise_level (int, optional, defaults to 250) — The amount of noise to add to the upscaled image. Must be in the range [0, 1000).
clean_caption (bool, optional, defaults to True) — Whether or not to clean the caption before creating embeddings. Requires beautifulsoup4 and ftfy to be installed. If the dependencies are not installed, the embeddings will be created from the raw prompt.
Returns
~pipelines.stable_diffusion.IFPipelineOutput
or tuple
~pipelines.stable_diffusion.IFPipelineOutput
if return_dict
is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of
bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) or watermarked content, according to the
safety_checker.
Function invoked when calling the pipeline for generation.
enable_model_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload
, this method moves one whole model at a time to the GPU when its forward
method is called, and the model remains in GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload
, but performance is much better due to the iterative execution of the unet
.
enable_sequential_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models have their state dicts saved to CPU and then are moved to a torch.device('meta') and loaded to GPU only when their specific submodule has its forward method called.
encode_prompt
( prompt, do_classifier_free_guidance = True, num_images_per_prompt = 1, device = None, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, clean_caption: bool = False )
Parameters
prompt (str or List[str], optional) — prompt to be encoded
device (torch.device, optional) — torch device to place the resulting embeddings on
num_images_per_prompt (int, optional, defaults to 1) — number of images that should be generated per prompt
do_classifier_free_guidance (bool, optional, defaults to True) — whether to use classifier-free guidance or not
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
Encodes the prompt into text encoder hidden states.
IFImg2ImgPipeline
class diffusers.IFImg2ImgPipeline
( tokenizer: T5Tokenizer, text_encoder: T5EncoderModel, unet: UNet2DConditionModel, scheduler: DDPMScheduler, safety_checker: typing.Optional[diffusers.pipelines.deepfloyd_if.safety_checker.IFSafetyChecker], feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor], watermarker: typing.Optional[diffusers.pipelines.deepfloyd_if.watermark.IFWatermarker], requires_safety_checker: bool = True )
__call__
( prompt: typing.Union[str, typing.List[str]] = None, image: typing.Union[PIL.Image.Image, torch.Tensor, numpy.ndarray, typing.List[PIL.Image.Image], typing.List[torch.Tensor], typing.List[numpy.ndarray]] = None, strength: float = 0.7, num_inference_steps: int = 80, timesteps: typing.List[int] = None, guidance_scale: float = 10.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, clean_caption: bool = True, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None ) → ~pipelines.stable_diffusion.IFPipelineOutput or tuple
Parameters
prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
image (torch.FloatTensor or PIL.Image.Image) — Image, or tensor representing an image batch, that will be used as the starting point for the process.
strength (float, optional, defaults to 0.8) — Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image will be used as a starting point, adding more noise to it the larger the strength. The number of denoising steps depends on the amount of noise initially added. When strength is 1, added noise will be maximal and the denoising process will run for the full number of iterations specified in num_inference_steps. A value of 1 therefore essentially ignores image.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, equally spaced num_inference_steps timesteps are used. Must be in descending order.
guidance_scale (float, optional, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the DDIMScheduler; ignored for other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL (PIL.Image.Image) or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.stable_diffusion.IFPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
clean_caption (bool, optional, defaults to True) — Whether or not to clean the caption before creating embeddings. Requires beautifulsoup4 and ftfy to be installed. If the dependencies are not installed, the embeddings will be created from the raw prompt.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.cross_attention.
Returns
~pipelines.stable_diffusion.IFPipelineOutput
or tuple
~pipelines.stable_diffusion.IFPipelineOutput
if return_dict
is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of
bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) or watermarked content, according to the
safety_checker.
Function invoked when calling the pipeline for generation.
enable_model_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload
, this method moves one whole model at a time to the GPU when its forward
method is called, and the model remains in GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload
, but performance is much better due to the iterative execution of the unet
.
enable_sequential_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models have their state dicts saved to CPU and then are moved to a torch.device('meta') and loaded to GPU only when their specific submodule has its forward method called.
encode_prompt
( prompt, do_classifier_free_guidance = True, num_images_per_prompt = 1, device = None, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, clean_caption: bool = False )
Parameters
prompt (str or List[str], optional) — prompt to be encoded
device (torch.device, optional) — torch device to place the resulting embeddings on
num_images_per_prompt (int, optional, defaults to 1) — number of images that should be generated per prompt
do_classifier_free_guidance (bool, optional, defaults to True) — whether to use classifier-free guidance or not
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
Encodes the prompt into text encoder hidden states.
IFImg2ImgSuperResolutionPipeline
class diffusers.IFImg2ImgSuperResolutionPipeline
( tokenizer: T5Tokenizer, text_encoder: T5EncoderModel, unet: UNet2DConditionModel, scheduler: DDPMScheduler, image_noising_scheduler: DDPMScheduler, safety_checker: typing.Optional[diffusers.pipelines.deepfloyd_if.safety_checker.IFSafetyChecker], feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor], watermarker: typing.Optional[diffusers.pipelines.deepfloyd_if.watermark.IFWatermarker], requires_safety_checker: bool = True )
__call__
( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.FloatTensor], original_image: typing.Union[PIL.Image.Image, torch.Tensor, numpy.ndarray, typing.List[PIL.Image.Image], typing.List[torch.Tensor], typing.List[numpy.ndarray]] = None, strength: float = 0.8, prompt: typing.Union[str, typing.List[str]] = None, num_inference_steps: int = 50, timesteps: typing.List[int] = None, guidance_scale: float = 4.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, noise_level: int = 250, clean_caption: bool = True ) → ~pipelines.stable_diffusion.IFPipelineOutput or tuple
Parameters
image (torch.FloatTensor or PIL.Image.Image) — Image, or tensor representing an image batch, that will be used as the starting point for the process.
original_image (torch.FloatTensor or PIL.Image.Image) — The original image that image was varied from.
strength (float, optional, defaults to 0.8) — Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image will be used as a starting point, adding more noise to it the larger the strength. The number of denoising steps depends on the amount of noise initially added. When strength is 1, added noise will be maximal and the denoising process will run for the full number of iterations specified in num_inference_steps. A value of 1 therefore essentially ignores image.
prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, equally spaced num_inference_steps timesteps are used. Must be in descending order.
guidance_scale (float, optional, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the DDIMScheduler; ignored for other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL (PIL.Image.Image) or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.stable_diffusion.IFPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.cross_attention.
noise_level (int, optional, defaults to 250) — The amount of noise to add to the upscaled image. Must be in the range [0, 1000).
clean_caption (bool, optional, defaults to True) — Whether or not to clean the caption before creating embeddings. Requires beautifulsoup4 and ftfy to be installed. If the dependencies are not installed, the embeddings will be created from the raw prompt.
Returns
`~pipelines.stable_diffusion.IFPipelineOutput` or `tuple`
`~pipelines.stable_diffusion.IFPipelineOutput` if `return_dict` is `True`, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) or watermarked content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
Examples:
Copied
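The relationship between `strength` and the number of denoising steps described above can be sketched in plain Python. This is a minimal illustration of the documented behavior, not diffusers internals; `steps_to_run` is a hypothetical helper.

```python
def steps_to_run(num_inference_steps: int, strength: float) -> int:
    """Number of denoising iterations actually executed for a given strength.

    Only the final `strength` fraction of the schedule runs: at strength 1.0
    the full schedule runs (the input image is essentially ignored), while at
    strength 0.0 no denoising happens and the image passes through unchanged.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

print(steps_to_run(50, 1.0))  # full 50-step schedule
print(steps_to_run(50, 0.5))  # only the last 25 steps
```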
enable_model_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to `enable_sequential_cpu_offload`, this method moves one whole model at a time to the GPU when its `forward` method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with `enable_sequential_cpu_offload`, but performance is much better due to the iterative execution of the `unet`.
enable_sequential_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models have their state dicts saved to CPU and are then moved to `torch.device('meta')`, loaded onto the GPU only when their specific submodule's `forward` method is called.
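The trade-off between the two offload methods can be made concrete with a toy model. This is purely illustrative (not accelerate internals): each pipeline model is represented as a list of submodule sizes, and we track the peak amount resident on the GPU at once.

```python
def peak_gpu_resident(model_sizes, per_submodule: bool) -> float:
    """Peak GPU memory when moving either whole models at a time
    (enable_model_cpu_offload) or single submodules at a time
    (enable_sequential_cpu_offload)."""
    if per_submodule:
        # Sequential offload: only one submodule lives on the GPU at once.
        return max(size for model in model_sizes for size in model)
    # Model offload: an entire model is resident while it runs.
    return max(sum(model) for model in model_sizes)

# Hypothetical sizes in GB, e.g. text encoder, unet, upscaler:
pipeline = [[8.0, 3.0], [2.0], [1.5, 0.5]]
print(peak_gpu_resident(pipeline, per_submodule=False))  # 11.0
print(peak_gpu_resident(pipeline, per_submodule=True))   # 8.0
```

Sequential offload needs less peak memory but pays a transfer cost on every submodule call, which is why model-level offload is faster for the iteratively executed `unet`.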
encode_prompt
( prompt, do_classifier_free_guidance = True, num_images_per_prompt = 1, device = None, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, clean_caption: bool = False )
Parameters
prompt (`str` or `List[str]`, *optional*) — prompt to be encoded
device (`torch.device`, *optional*) — torch device to place the resulting embeddings on
num_images_per_prompt (`int`, *optional*, defaults to 1) — number of images that should be generated per prompt
do_classifier_free_guidance (`bool`, *optional*, defaults to `True`) — whether to use classifier-free guidance or not
negative_prompt (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide image generation. If not defined, `negative_prompt_embeds` must be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative text embeddings are generated from the `negative_prompt` input argument.
Encodes the prompt into text encoder hidden states.
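For intuition, `num_images_per_prompt` and classifier-free guidance both multiply the batch of embeddings that `encode_prompt` produces. A minimal sketch of that batching, using a hypothetical helper on plain lists rather than tensors:

```python
def batch_embeddings(prompt_embeds, negative_prompt_embeds,
                     num_images_per_prompt, do_classifier_free_guidance):
    """Repeat each prompt embedding once per requested image; with
    classifier-free guidance, the negative embeddings are prepended so a
    single forward pass covers both the unconditional and conditional
    branches."""
    pos = [e for e in prompt_embeds for _ in range(num_images_per_prompt)]
    if not do_classifier_free_guidance:
        return pos
    neg = [e for e in negative_prompt_embeds for _ in range(num_images_per_prompt)]
    return neg + pos

print(batch_embeddings(["cat"], [""], 2, True))  # ['', '', 'cat', 'cat']
```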
IFInpaintingPipeline
class diffusers.IFInpaintingPipeline
( tokenizer: T5Tokenizer, text_encoder: T5EncoderModel, unet: UNet2DConditionModel, scheduler: DDPMScheduler, safety_checker: typing.Optional[diffusers.pipelines.deepfloyd_if.safety_checker.IFSafetyChecker], feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor], watermarker: typing.Optional[diffusers.pipelines.deepfloyd_if.watermark.IFWatermarker], requires_safety_checker: bool = True )
__call__
( prompt: typing.Union[str, typing.List[str]] = None, image: typing.Union[PIL.Image.Image, torch.Tensor, numpy.ndarray, typing.List[PIL.Image.Image], typing.List[torch.Tensor], typing.List[numpy.ndarray]] = None, mask_image: typing.Union[PIL.Image.Image, torch.Tensor, numpy.ndarray, typing.List[PIL.Image.Image], typing.List[torch.Tensor], typing.List[numpy.ndarray]] = None, strength: float = 1.0, num_inference_steps: int = 50, timesteps: typing.List[int] = None, guidance_scale: float = 7.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, clean_caption: bool = True, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None ) → `~pipelines.stable_diffusion.IFPipelineOutput` or `tuple`
Parameters
prompt (`str` or `List[str]`, *optional*) — The prompt or prompts to guide image generation. If not defined, `prompt_embeds` must be passed instead.
image (`torch.FloatTensor` or `PIL.Image.Image`) — `Image`, or tensor representing an image batch, that will be used as the starting point for the process.
mask_image (`PIL.Image.Image`) — `Image`, or tensor representing an image batch, to mask `image`. White pixels in the mask will be repainted, while black pixels will be preserved. If `mask_image` is a PIL image, it will be converted to a single channel (luminance) before use. If it's a tensor, it should contain one color channel (L) instead of 3, so the expected shape would be `(B, H, W, 1)`.
strength (`float`, *optional*, defaults to 1.0) — Conceptually, indicates how much to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point, and more noise is added to it the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in `num_inference_steps`. A value of 1 therefore essentially ignores `image`.
num_inference_steps (`int`, *optional*, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
timesteps (`List[int]`, *optional*) — Custom timesteps to use for the denoising process. If not defined, `num_inference_steps` equally spaced timesteps are used. Must be in descending order.
guidance_scale (`float`, *optional*, defaults to 7.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` of equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate images closely linked to the text `prompt`, usually at the expense of lower image quality.
negative_prompt (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide image generation. If not defined, `negative_prompt_embeds` must be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
num_images_per_prompt (`int`, *optional*, defaults to 1) — The number of images to generate per prompt.
eta (`float`, *optional*, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the `DDIMScheduler`; ignored for other schedulers.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*) — One or a list of torch generator(s) to make generation deterministic.
prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative text embeddings are generated from the `negative_prompt` input argument.
output_type (`str`, *optional*, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) and `np.array`.
return_dict (`bool`, *optional*, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.IFPipelineOutput` instead of a plain tuple.
callback (`Callable`, *optional*) — A function that is called every `callback_steps` steps during inference, with the signature `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
callback_steps (`int`, *optional*, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
clean_caption (`bool`, *optional*, defaults to `True`) — Whether or not to clean the caption before creating embeddings. Requires `beautifulsoup4` and `ftfy` to be installed. If the dependencies are not installed, the embeddings are created from the raw prompt.
cross_attention_kwargs (`dict`, *optional*) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under `self.processor` in `diffusers.cross_attention`.
Returns
`~pipelines.stable_diffusion.IFPipelineOutput` or `tuple`
`~pipelines.stable_diffusion.IFPipelineOutput` if `return_dict` is `True`, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) or watermarked content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
Examples:
Copied
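The mask semantics documented above (white pixels repainted, black pixels preserved) can be illustrated per pixel. This is a minimal sketch with a hypothetical helper on plain lists, not the pipeline's tensor code:

```python
def composite(original, generated, mask):
    """Blend per pixel: a mask value of 1.0 (white) takes the newly
    generated pixel, 0.0 (black) keeps the original; intermediate
    values blend proportionally."""
    return [m * g + (1.0 - m) * o for o, g, m in zip(original, generated, mask)]

# Repaint only the middle pixel of a 3-pixel row:
print(composite([10.0, 20.0, 30.0], [99.0, 99.0, 99.0], [0.0, 1.0, 0.0]))
# [10.0, 99.0, 30.0]
```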
enable_model_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to `enable_sequential_cpu_offload`, this method moves one whole model at a time to the GPU when its `forward` method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with `enable_sequential_cpu_offload`, but performance is much better due to the iterative execution of the `unet`.
enable_sequential_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models have their state dicts saved to CPU and are then moved to `torch.device('meta')`, loaded onto the GPU only when their specific submodule's `forward` method is called.
encode_prompt
( prompt, do_classifier_free_guidance = True, num_images_per_prompt = 1, device = None, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, clean_caption: bool = False )
Parameters
prompt (`str` or `List[str]`, *optional*) — prompt to be encoded
device (`torch.device`, *optional*) — torch device to place the resulting embeddings on
num_images_per_prompt (`int`, *optional*, defaults to 1) — number of images that should be generated per prompt
do_classifier_free_guidance (`bool`, *optional*, defaults to `True`) — whether to use classifier-free guidance or not
negative_prompt (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide image generation. If not defined, `negative_prompt_embeds` must be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative text embeddings are generated from the `negative_prompt` input argument.
Encodes the prompt into text encoder hidden states.
IFInpaintingSuperResolutionPipeline
class diffusers.IFInpaintingSuperResolutionPipeline
( tokenizer: T5Tokenizer, text_encoder: T5EncoderModel, unet: UNet2DConditionModel, scheduler: DDPMScheduler, image_noising_scheduler: DDPMScheduler, safety_checker: typing.Optional[diffusers.pipelines.deepfloyd_if.safety_checker.IFSafetyChecker], feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor], watermarker: typing.Optional[diffusers.pipelines.deepfloyd_if.watermark.IFWatermarker], requires_safety_checker: bool = True )
__call__
( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.FloatTensor], original_image: typing.Union[PIL.Image.Image, torch.Tensor, numpy.ndarray, typing.List[PIL.Image.Image], typing.List[torch.Tensor], typing.List[numpy.ndarray]] = None, mask_image: typing.Union[PIL.Image.Image, torch.Tensor, numpy.ndarray, typing.List[PIL.Image.Image], typing.List[torch.Tensor], typing.List[numpy.ndarray]] = None, strength: float = 0.8, prompt: typing.Union[str, typing.List[str]] = None, num_inference_steps: int = 100, timesteps: typing.List[int] = None, guidance_scale: float = 4.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, noise_level: int = 0, clean_caption: bool = True ) → `~pipelines.stable_diffusion.IFPipelineOutput` or `tuple`
Parameters
image (`torch.FloatTensor` or `PIL.Image.Image`) — `Image`, or tensor representing an image batch, that will be used as the starting point for the process.
original_image (`torch.FloatTensor` or `PIL.Image.Image`) — The original image that `image` was varied from.
mask_image (`PIL.Image.Image`) — `Image`, or tensor representing an image batch, to mask `image`. White pixels in the mask will be repainted, while black pixels will be preserved. If `mask_image` is a PIL image, it will be converted to a single channel (luminance) before use. If it's a tensor, it should contain one color channel (L) instead of 3, so the expected shape would be `(B, H, W, 1)`.
strength (`float`, *optional*, defaults to 0.8) — Conceptually, indicates how much to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point, and more noise is added to it the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in `num_inference_steps`. A value of 1 therefore essentially ignores `image`.
prompt (`str` or `List[str]`, *optional*) — The prompt or prompts to guide image generation. If not defined, `prompt_embeds` must be passed instead.
num_inference_steps (`int`, *optional*, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
timesteps (`List[int]`, *optional*) — Custom timesteps to use for the denoising process. If not defined, `num_inference_steps` equally spaced timesteps are used. Must be in descending order.
guidance_scale (`float`, *optional*, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` of equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate images closely linked to the text `prompt`, usually at the expense of lower image quality.
negative_prompt (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide image generation. If not defined, `negative_prompt_embeds` must be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
num_images_per_prompt (`int`, *optional*, defaults to 1) — The number of images to generate per prompt.
eta (`float`, *optional*, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the `DDIMScheduler`; ignored for other schedulers.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*) — One or a list of torch generator(s) to make generation deterministic.
prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative text embeddings are generated from the `negative_prompt` input argument.
output_type (`str`, *optional*, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) and `np.array`.
return_dict (`bool`, *optional*, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.IFPipelineOutput` instead of a plain tuple.
callback (`Callable`, *optional*) — A function that is called every `callback_steps` steps during inference, with the signature `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
callback_steps (`int`, *optional*, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
cross_attention_kwargs (`dict`, *optional*) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under `self.processor` in `diffusers.cross_attention`.
noise_level (`int`, *optional*, defaults to 0) — The amount of noise to add to the upscaled image. Must be in the range `[0, 1000)`.
clean_caption (`bool`, *optional*, defaults to `True`) — Whether or not to clean the caption before creating embeddings. Requires `beautifulsoup4` and `ftfy` to be installed. If the dependencies are not installed, the embeddings are created from the raw prompt.
Returns
`~pipelines.stable_diffusion.IFPipelineOutput` or `tuple`
`~pipelines.stable_diffusion.IFPipelineOutput` if `return_dict` is `True`, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) or watermarked content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
Examples:
Copied
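The effect of `guidance_scale` (`w` in equation 2 of the Imagen paper) can be sketched on plain numbers. This is a hypothetical scalar version of the per-pixel update, not the pipeline's tensor code:

```python
def guided_prediction(uncond, text, guidance_scale):
    """Classifier-free guidance: move the unconditional prediction toward
    the text-conditioned one, scaled by w. w == 1 reproduces the
    conditional prediction; larger w pushes further toward the prompt."""
    return [u + guidance_scale * (t - u) for u, t in zip(uncond, text)]

print(guided_prediction([0.0, 2.0], [1.0, 2.0], 4.0))  # [4.0, 2.0]
print(guided_prediction([0.0], [1.0], 1.0))            # [1.0]
```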
enable_model_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to `enable_sequential_cpu_offload`, this method moves one whole model at a time to the GPU when its `forward` method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with `enable_sequential_cpu_offload`, but performance is much better due to the iterative execution of the `unet`.
enable_sequential_cpu_offload
( gpu_id = 0 )
Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, the pipeline's models have their state dicts saved to CPU and are then moved to `torch.device('meta')`, loaded onto the GPU only when their specific submodule's `forward` method is called.
encode_prompt
( prompt, do_classifier_free_guidance = True, num_images_per_prompt = 1, device = None, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, clean_caption: bool = False )
Parameters
prompt (`str` or `List[str]`, *optional*) — prompt to be encoded
device (`torch.device`, *optional*) — torch device to place the resulting embeddings on
num_images_per_prompt (`int`, *optional*, defaults to 1) — number of images that should be generated per prompt
do_classifier_free_guidance (`bool`, *optional*, defaults to `True`) — whether to use classifier-free guidance or not
negative_prompt (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide image generation. If not defined, `negative_prompt_embeds` must be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative text embeddings are generated from the `negative_prompt` input argument.
Encodes the prompt into text encoder hidden states.