Latent Diffusion

LDMTextToImagePipeline

class diffusers.LDMTextToImagePipeline

( vqvae: typing.Union[diffusers.models.vq_model.VQModel, diffusers.models.autoencoder_kl.AutoencoderKL], bert: PreTrainedModel, tokenizer: PreTrainedTokenizer, unet: typing.Union[diffusers.models.unet_2d.UNet2DModel, diffusers.models.unet_2d_condition.UNet2DConditionModel], scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] )

Parameters

  • vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.

  • bert (LDMBertModel) — Text-encoder model based on BERT.

  • tokenizer (BertTokenizer) — A BertTokenizer to tokenize text.

  • unet (UNet2DConditionModel) — A UNet2DConditionModel to denoise the encoded image latents.

  • scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.

Pipeline for text-to-image generation using latent diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
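Because the pipeline inherits from DiffusionPipeline, the standard loading, saving, and device-placement helpers apply. A minimal sketch (the checkpoint is the one used in the example below; the local save path is illustrative):

>>> from diffusers import DiffusionPipeline

>>> # download the pipeline (weights are cached locally)
>>> ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

>>> # move every model component to the GPU
>>> ldm = ldm.to("cuda")

>>> # save a local copy that from_pretrained() can reload later
>>> ldm.save_pretrained("./ldm-text2im-large-256")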

__call__

( prompt: typing.Union[str, typing.List[str]], height: typing.Optional[int] = None, width: typing.Optional[int] = None, num_inference_steps: typing.Optional[int] = 50, guidance_scale: typing.Optional[float] = 1.0, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, **kwargs ) → ImagePipelineOutput or tuple

Parameters

  • prompt (str or List[str]) — The prompt or prompts to guide the image generation.

  • height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.

  • width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.

  • num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • guidance_scale (float, optional, defaults to 1.0) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.

  • eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.

  • generator (torch.Generator, optional) — A torch.Generator to make generation deterministic.

  • latents (torch.FloatTensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts (see the sketch after the example below). If not provided, a latents tensor is generated by sampling using the supplied random generator.

  • output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image and np.array.

  • return_dict (bool, optional, defaults to True) — Whether or not to return a ImagePipelineOutput instead of a plain tuple.

Returns

ImagePipelineOutput or tuple

If return_dict is True, ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
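For illustration, a minimal sketch of both return styles, assuming the pipeline has been loaded as ldm as in the example below:

>>> # return_dict=True (the default) returns an ImagePipelineOutput
>>> images = ldm("A painting of a squirrel eating a burger").images

>>> # return_dict=False returns a plain tuple whose first element is the image list
>>> (images,) = ldm("A painting of a squirrel eating a burger", return_dict=False)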

The call function to the pipeline for generation.

Example:

>>> from diffusers import DiffusionPipeline

>>> # load model and scheduler
>>> ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

>>> # run pipeline in inference (sample random noise and denoise)
>>> prompt = "A painting of a squirrel eating a burger"
>>> images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images

>>> # save images
>>> for idx, image in enumerate(images):
...     image.save(f"squirrel-{idx}.png")
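As noted for the latents parameter above, pre-generated latents can be passed in to tweak the same generation with different prompts. A minimal sketch, continuing from the example: the prompts are illustrative, and the latent spatial size is taken to be unet.config.sample_size (the default image height and width divided by the VAE scale factor):

>>> import torch

>>> # fix the noise so both prompts start from the same latents
>>> generator = torch.manual_seed(0)
>>> latents = torch.randn(
...     (1, ldm.unet.config.in_channels, ldm.unet.config.sample_size, ldm.unet.config.sample_size),
...     generator=generator,
... )

>>> image_a = ldm(["A painting of a squirrel eating a burger"], latents=latents).images[0]
>>> image_b = ldm(["An oil painting of a raccoon eating a pretzel"], latents=latents).images[0]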

LDMSuperResolutionPipeline

class diffusers.LDMSuperResolutionPipeline

( vqvae: VQModel, unet: UNet2DModel, scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] )

Parameters

  • vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.

  • unet (UNet2DModel) — A UNet2DModel to denoise the encoded image latents.

  • scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler, or DPMSolverMultistepScheduler.

A pipeline for image super-resolution using latent diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

__call__

( image: typing.Union[torch.Tensor, PIL.Image.Image] = None, batch_size: typing.Optional[int] = 1, num_inference_steps: typing.Optional[int] = 100, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True ) → ImagePipelineOutput or tuple

Parameters

  • image (torch.Tensor or PIL.Image.Image) — Image or tensor representing an image batch to be used as the starting point for the process.

  • batch_size (int, optional, defaults to 1) — Number of images to generate.

  • num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.

  • generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.

  • output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image and np.array.

  • return_dict (bool, optional, defaults to True) — Whether or not to return a ImagePipelineOutput instead of a plain tuple.

Returns

ImagePipelineOutput or tuple

If return_dict is True, ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.

The call function to the pipeline for generation.

Example:

>>> import requests
>>> from PIL import Image
>>> from io import BytesIO
>>> from diffusers import LDMSuperResolutionPipeline
>>> import torch

>>> # load model and scheduler
>>> pipeline = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
>>> pipeline = pipeline.to("cuda")

>>> # let's download an image
>>> url = (
...     "https://user-images.githubusercontent.com/38061659/199705896-b48e17b8-b231-47cd-a270-4ffa5a93fa3e.png"
... )
>>> response = requests.get(url)
>>> low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
>>> low_res_img = low_res_img.resize((128, 128))

>>> # run pipeline in inference (sample random noise and denoise)
>>> upscaled_image = pipeline(low_res_img, num_inference_steps=100, eta=1).images[0]
>>> # save image
>>> upscaled_image.save("ldm_generated_image.png")
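To make the upscaling reproducible, pass a seeded torch.Generator (see the generator parameter above). A minimal sketch, continuing from the example:

>>> import torch

>>> # a fixed seed makes the sampled noise, and therefore the output, deterministic
>>> generator = torch.Generator(device="cuda").manual_seed(0)
>>> upscaled_image = pipeline(low_res_img, num_inference_steps=100, generator=generator).images[0]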

ImagePipelineOutput

class diffusers.ImagePipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Parameters

  • images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).

Output class for image pipelines.
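The type and shape of images depend on the output_type requested from the pipeline. A minimal sketch, assuming a pipeline loaded as in the super-resolution example above:

>>> # NumPy output: an array of shape (batch_size, height, width, num_channels)
>>> np_images = pipeline(low_res_img, output_type="np").images

>>> # PIL output (the default): a list of PIL.Image of length batch_size
>>> pil_images = pipeline(low_res_img, output_type="pil").images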
