Latent Diffusion

LDMTextToImagePipeline

class diffusers.LDMTextToImagePipeline

( vqvae: typing.Union[diffusers.models.vq_model.VQModel, diffusers.models.autoencoder_kl.AutoencoderKL], bert: PreTrainedModel, tokenizer: PreTrainedTokenizer, unet: typing.Union[diffusers.models.unet_2d.UNet2DModel, diffusers.models.unet_2d_condition.UNet2DConditionModel], scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] )

Parameters

  • vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.

  • bert (LDMBertModel) — Text-encoder model based on BERT.

  • tokenizer (BertTokenizer) — A BertTokenizer to tokenize text.

  • unet (UNet2DConditionModel) — A UNet2DConditionModel to denoise the encoded image latents.

  • scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.

Pipeline for text-to-image generation using latent diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
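For example, every pipeline exposes the same generic methods for device placement and serialization. A minimal sketch, assuming a CUDA device is available (the local directory name is only illustrative):

>>> from diffusers import DiffusionPipeline

>>> ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
>>> ldm = ldm.to("cuda")  # move every model component in the pipeline to the GPU
>>> ldm.save_pretrained("./ldm-text2im-local")  # write weights and configs to disk
>>> ldm = DiffusionPipeline.from_pretrained("./ldm-text2im-local")  # reload from the local copy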

__call__

( prompt: typing.Union[str, typing.List[str]], height: typing.Optional[int] = None, width: typing.Optional[int] = None, num_inference_steps: typing.Optional[int] = 50, guidance_scale: typing.Optional[float] = 1.0, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, **kwargs ) → ImagePipelineOutput or tuple

Parameters

  • prompt (str or List[str]) — The prompt or prompts to guide the image generation.

  • height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.

  • width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.

  • num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • guidance_scale (float, optional, defaults to 1.0) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.

  • generator (torch.Generator, optional) — A torch.Generator to make generation deterministic.

  • latents (torch.FloatTensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts (see the seeded-generation sketch after the example below). If not provided, a latents tensor is generated by sampling using the supplied random generator.

  • output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.

  • return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

Returns

ImagePipelineOutput or tuple

If return_dict is True, ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.

The call function to the pipeline for generation.

Example:

>>> from diffusers import DiffusionPipeline

>>> # load model and scheduler
>>> ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

>>> # run pipeline in inference (sample random noise and denoise)
>>> prompt = "A painting of a squirrel eating a burger"
>>> images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images

>>> # save images
>>> for idx, image in enumerate(images):
...     image.save(f"squirrel-{idx}.png")
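
The generator and latents arguments make runs reproducible. A minimal sketch, assuming the default resolution so the latent shape can be read off the UNet config (the prompts and seed are only illustrative):

>>> import torch
>>> from diffusers import DiffusionPipeline

>>> ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

>>> # seeding the generator reproduces the same sampled noise on every run
>>> generator = torch.Generator().manual_seed(42)
>>> images = ldm([prompt], generator=generator, num_inference_steps=50, guidance_scale=6).images

>>> # alternatively, pre-generate the latents once and reuse them with different prompts
>>> shape = (1, ldm.unet.config.in_channels, ldm.unet.config.sample_size, ldm.unet.config.sample_size)
>>> latents = torch.randn(shape, generator=torch.Generator().manual_seed(42))
>>> images = ldm(["An oil painting of a fox reading a book"], latents=latents, num_inference_steps=50, guidance_scale=6).images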

LDMSuperResolutionPipeline

class diffusers.LDMSuperResolutionPipeline

( vqvae: VQModel, unet: UNet2DModel, scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] )

Parameters

  • vqvae (VQModel) — Vector-quantized (VQ) model to encode and decode images to and from latent representations.

  • unet (UNet2DModel) — A UNet2DModel to denoise the encoded image.

  • scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler, or PNDMScheduler.

A pipeline for image super-resolution using latent diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

__call__

( image: typing.Union[torch.Tensor, PIL.Image.Image] = None, batch_size: typing.Optional[int] = 1, num_inference_steps: typing.Optional[int] = 100, eta: typing.Optional[float] = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True ) → ImagePipelineOutput or tuple

Parameters

  • image (torch.Tensor or PIL.Image.Image) — Image or tensor representing an image batch to be used as the starting point for the process.

  • batch_size (int, optional, defaults to 1) — Number of images to generate.

  • num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.

  • generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.

  • output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.

  • return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

Returns

ImagePipelineOutput or tuple

If return_dict is True, ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.

The call function to the pipeline for generation.

Example:

>>> import requests
>>> from PIL import Image
>>> from io import BytesIO
>>> from diffusers import LDMSuperResolutionPipeline
>>> import torch

>>> # load model and scheduler
>>> pipeline = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
>>> pipeline = pipeline.to("cuda")

>>> # let's download an image
>>> url = (
...     "https://user-images.githubusercontent.com/38061659/199705896-b48e17b8-b231-47cd-a270-4ffa5a93fa3e.png"
... )
>>> response = requests.get(url)
>>> low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
>>> low_res_img = low_res_img.resize((128, 128))

>>> # run pipeline in inference (sample random noise and denoise)
>>> upscaled_image = pipeline(low_res_img, num_inference_steps=100, eta=1).images[0]
>>> # save image
>>> upscaled_image.save("ldm_generated_image.png")
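
The eta argument only has an effect with a DDIM scheduler. A minimal sketch of swapping one in, reusing low_res_img from the example above (if the checkpoint already ships a DDIM scheduler, the swap is a no-op):

>>> from diffusers import DDIMScheduler

>>> # replace the pipeline's current scheduler with DDIM so that eta takes effect
>>> pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
>>> upscaled_image = pipeline(low_res_img, num_inference_steps=100, eta=1.0).images[0]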

ImagePipelineOutput

class diffusers.ImagePipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Parameters

  • images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).

Output class for image pipelines.
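
Whether you receive this class or a plain tuple depends on return_dict. A minimal sketch, reusing the super-resolution pipeline and low_res_img from the example above:

>>> output = pipeline(low_res_img, num_inference_steps=100)  # return_dict=True by default
>>> image = output.images[0]  # attribute access on ImagePipelineOutput

>>> output = pipeline(low_res_img, num_inference_steps=100, return_dict=False)  # plain tuple
>>> image = output[0][0]  # first element is the list of generated images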
