Latent Diffusion
class diffusers.LDMTextToImagePipeline
( vqvae: typing.Union[diffusers.models.vq_model.VQModel, diffusers.models.autoencoder_kl.AutoencoderKL], bert: PreTrainedModel, tokenizer: PreTrainedTokenizer, unet: typing.Union[diffusers.models.unet_2d.UNet2DModel, diffusers.models.unet_2d_condition.UNet2DConditionModel], scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] )
Parameters
vqvae (VQModel or AutoencoderKL) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.
bert (LDMBertModel) — Text-encoder model based on BERT.
tokenizer (BertTokenizer) — A BertTokenizer to tokenize text.
unet (UNet2DModel or UNet2DConditionModel) — A UNet2DConditionModel to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, PNDMScheduler, or LMSDiscreteScheduler.
Pipeline for text-to-image generation using latent diffusion.
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
__call__
( prompt: typing.Union[str, typing.List[str]], height: typing.Optional[int] = None, width: typing.Optional[int] = None, num_inference_steps: typing.Optional[int] = 50, guidance_scale: typing.Optional[float] = 1.0, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, **kwargs ) → ImagePipelineOutput or tuple
Parameters
prompt (str or List[str]) — The prompt or prompts to guide the image generation.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 1.0) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
latents (torch.FloatTensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
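The guidance_scale entry above can be illustrated with a small sketch of classifier-free guidance, which is the usual way a guidance scale is applied in diffusion pipelines. The helper name apply_guidance and the plain Python lists standing in for noise-prediction tensors are assumptions made for illustration, not part of the pipeline's API.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Blend unconditional and text-conditioned noise predictions element-wise.

    Hypothetical helper: real pipelines do this on tensors, not lists.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy stand-ins for the two noise predictions (illustrative values only).
uncond = [0.0, 1.0, -1.0]
text = [1.0, 1.0, 0.0]

# With a scale of 1.0 the result is exactly the text-conditioned prediction,
# so guidance has no effect.
print(apply_guidance(uncond, text, 1.0))  # [1.0, 1.0, 0.0]

# A larger scale pushes the result further along the (text - uncond) direction.
print(apply_guidance(uncond, text, 6.0))  # [6.0, 1.0, 5.0]
```

Under this formulation a scale of 1.0 leaves the text-conditioned prediction unchanged, which is consistent with guidance only taking effect when guidance_scale > 1.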
Returns
The call function to the pipeline for generation.
Example:
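A minimal usage sketch, assuming diffusers and torch are installed and that the CompVis/ldm-text2im-large-256 checkpoint is available on the Hub. The checkpoint name, the helper name generate_images, and the chosen argument values are assumptions for illustration; only the keyword arguments come from the __call__ signature above.

```python
def generate_images(prompt, num_inference_steps=50, guidance_scale=6.0, eta=0.3, seed=0):
    """Run a latent diffusion text-to-image pipeline and return a list of PIL images.

    Hypothetical helper. Imports are done lazily so that defining the function
    does not require diffusers or torch to be installed.
    """
    import torch
    from diffusers import DiffusionPipeline

    # Assumed checkpoint name; the first call downloads the weights.
    pipe = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

    # A seeded generator makes the sampled latents reproducible.
    generator = torch.Generator().manual_seed(seed)

    output = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        eta=eta,
        generator=generator,
    )
    # With return_dict=True (the default) the images are on the .images attribute.
    return output.images
```

Calling generate_images("A painting of a squirrel eating a burger") would return a list of PIL images, each of which can be written to disk with its save method.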
class diffusers.LDMSuperResolutionPipeline
( vqvae: VQModel, unet: UNet2DModel, scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] )
Parameters
A pipeline for image super-resolution using latent diffusion.
__call__
Parameters
image (torch.Tensor or PIL.Image.Image) — Image or tensor representing an image batch to be used as the starting point for the process.
batch_size (int, optional, defaults to 1) — Number of images to generate.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
Returns
The call function to the pipeline for generation.
Example:
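A minimal usage sketch, assuming diffusers and Pillow are installed and that the CompVis/ldm-super-resolution-4x-openimages checkpoint is available on the Hub. The checkpoint name, the helper name upscale_image, and the 128×128 input size are assumptions for illustration; the image, num_inference_steps, and eta arguments are the ones documented for this pipeline.

```python
def upscale_image(low_res_path, output_path, num_inference_steps=100, eta=0.0):
    """Upscale one image file with a latent diffusion super-resolution pipeline.

    Hypothetical helper. Imports are done lazily so that defining the function
    does not require diffusers or Pillow to be installed.
    """
    from PIL import Image
    from diffusers import LDMSuperResolutionPipeline

    # Assumed checkpoint name; the first call downloads the weights.
    pipe = LDMSuperResolutionPipeline.from_pretrained(
        "CompVis/ldm-super-resolution-4x-openimages"
    )

    # Load the input as RGB and shrink it so the sketch runs quickly.
    low_res = Image.open(low_res_path).convert("RGB").resize((128, 128))

    # The pipeline returns an output object whose .images list holds PIL images.
    upscaled = pipe(image=low_res, num_inference_steps=num_inference_steps, eta=eta).images[0]
    upscaled.save(output_path)
    return upscaled
```

Moving the pipeline to a GPU with pipe.to("cuda") before the call is a common choice when one is available, since 100 denoising steps can be slow on CPU.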
class diffusers.ImagePipelineOutput
( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )
Parameters
images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).
Output class for image pipelines.
generator (torch.Generator, optional) — A torch.Generator to make generation deterministic.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
ImagePipelineOutput or tuple
If return_dict is True, an ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
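vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.
unet (UNet2DModel) — A UNet2DModel to denoise the encoded image.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler, or DPMSolverMultistepScheduler.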
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
( image: typing.Union[torch.Tensor, PIL.Image.Image] = None, batch_size: typing.Optional[int] = 1, num_inference_steps: typing.Optional[int] = 100, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True ) → ImagePipelineOutput or tuple
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
ImagePipelineOutput or tuple
If return_dict is True, an ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
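To make the eta parameter more concrete, the sketch below computes the per-step noise scale σ as defined in the DDIM paper, where η scales the amount of fresh noise injected at each step. The function name ddim_sigma and the example values of the cumulative alpha products are assumptions for illustration; this is not the scheduler's own code.

```python
import math


def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Noise standard deviation for one DDIM step (hypothetical helper).

    alpha_bar_t:    cumulative alpha product at the current timestep
    alpha_bar_prev: cumulative alpha product at the previous (less noisy) timestep,
                    expected to be >= alpha_bar_t
    eta:            0.0 gives deterministic DDIM sampling; 1.0 gives DDPM-like noise
    """
    variance = ((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) * (
        1.0 - alpha_bar_t / alpha_bar_prev
    )
    return eta * math.sqrt(variance)


# With eta = 0.0 no noise is added, so each step is deterministic.
print(ddim_sigma(0.5, 0.8, eta=0.0))  # 0.0

# With eta > 0 the step injects noise proportional to eta.
print(ddim_sigma(0.5, 0.8, eta=1.0))
```

With the default eta of 0.0 the sampling steps add no extra noise, so for a fixed generator seed the output is fully determined by the initial latents.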