Latent Diffusion
LDMTextToImagePipeline
class diffusers.LDMTextToImagePipeline
( vqvae: typing.Union[diffusers.models.vq_model.VQModel, diffusers.models.autoencoder_kl.AutoencoderKL], bert: PreTrainedModel, tokenizer: PreTrainedTokenizer, unet: typing.Union[diffusers.models.unet_2d.UNet2DModel, diffusers.models.unet_2d_condition.UNet2DConditionModel], scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] )
Parameters
vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.
bert (LDMBertModel) — Text-encoder model based on BERT.
tokenizer (BertTokenizer) — A BertTokenizer to tokenize text.
unet (UNet2DConditionModel) — A UNet2DConditionModel to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.
Pipeline for text-to-image generation using latent diffusion.
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
__call__
( prompt: typing.Union[str, typing.List[str]], height: typing.Optional[int] = None, width: typing.Optional[int] = None, num_inference_steps: typing.Optional[int] = 50, guidance_scale: typing.Optional[float] = 1.0, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, **kwargs ) → ImagePipelineOutput or tuple
Parameters
prompt (str or List[str]) — The prompt or prompts to guide the image generation.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 1.0) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.
generator (torch.Generator, optional) — A torch.Generator to make generation deterministic.
latents (torch.FloatTensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
Returns
ImagePipelineOutput or tuple
If return_dict is True, ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
The call function to the pipeline for generation.
Example:
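A minimal sketch of text-to-image generation with this pipeline, assuming the CompVis/ldm-text2im-large-256 checkpoint (any compatible latent diffusion checkpoint works):

from diffusers import DiffusionPipeline

# load model and scheduler from the Hub
ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

# run the pipeline in inference (sample random noise and denoise)
prompt = "painting of a squirrel eating a burger"
images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images

# save the generated images
for idx, image in enumerate(images):
    image.save(f"squirrel-{idx}.png")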
LDMSuperResolutionPipeline
class diffusers.LDMSuperResolutionPipeline
( vqvae: VQModel, unet: UNet2DModel, scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] )
Parameters
vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.
unet (UNet2DModel) — A UNet2DModel to denoise the encoded image.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler, or PNDMScheduler.
A pipeline for image super-resolution using latent diffusion.
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
__call__
( image: typing.Union[torch.Tensor, PIL.Image.Image] = None, batch_size: typing.Optional[int] = 1, num_inference_steps: typing.Optional[int] = 100, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True ) → ImagePipelineOutput or tuple
Parameters
image (torch.Tensor or PIL.Image.Image) — Image or tensor representing an image batch to be used as the starting point for the process.
batch_size (int, optional, defaults to 1) — Number of images to generate.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
Returns
ImagePipelineOutput or tuple
If return_dict is True, ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
The call function to the pipeline for generation.
Example:
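A minimal sketch of 4x super-resolution, assuming the CompVis/ldm-super-resolution-4x-openimages checkpoint; the low-resolution input path is a placeholder:

import torch
from PIL import Image
from diffusers import LDMSuperResolutionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# load model and scheduler
pipeline = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
pipeline = pipeline.to(device)

# load a low-resolution image (placeholder path) and resize to the model's input size
low_res_img = Image.open("low_res.png").convert("RGB")
low_res_img = low_res_img.resize((128, 128))

# run the pipeline in inference (sample random noise and denoise)
upscaled_image = pipeline(low_res_img, num_inference_steps=100, eta=1.0).images[0]

# save the upscaled image
upscaled_image.save("ldm_generated_image.png")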
ImagePipelineOutput
class diffusers.ImagePipelineOutput
( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )
Parameters
images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).
Output class for image pipelines.
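A short sketch of the two return formats, reusing the ldm pipeline loaded in the text-to-image example above:

# default: a structured ImagePipelineOutput with an .images attribute
output = ldm(["a cozy cabin in the woods"])
image = output.images[0]  # list of PIL images (or np.ndarray, depending on output_type)

# return_dict=False: a plain tuple whose first element is the list of images
(images,) = ldm(["a cozy cabin in the woods"], return_dict=False)
image = images[0]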