# Cycle Diffusion

## Cycle Diffusion

Cycle Diffusion is a text guided image-to-image generation model proposed in [Unifying Diffusion Models’ Latent Space, with Applications to CycleDiffusion and Guidance](https://huggingface.co/papers/2210.05559) by Chen Henry Wu, Fernando De la Torre.

The abstract from the paper is:

*Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs.*

Make sure to check out the Schedulers [guide](https://huggingface.co/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](https://huggingface.co/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

### CycleDiffusionPipeline

#### class diffusers.CycleDiffusionPipeline

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/pipelines/stable_diffusion/pipeline_cycle_diffusion.py#L125)

( vae: AutoencoderKLtext\_encoder: CLIPTextModeltokenizer: CLIPTokenizerunet: UNet2DConditionModelscheduler: DDIMSchedulersafety\_checker: StableDiffusionSafetyCheckerfeature\_extractor: CLIPImageProcessorrequires\_safety\_checker: bool = True )

Parameters

* **vae** ([AutoencoderKL](https://huggingface.co/docs/diffusers/v0.21.0/en/api/models/autoencoderkl#diffusers.AutoencoderKL)) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
* **text\_encoder** (`CLIPTextModel`) — Frozen text-encoder ([clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)).
* **tokenizer** (`CLIPTokenizer`) — A `CLIPTokenizer` to tokenize text.
* **unet** ([UNet2DConditionModel](https://huggingface.co/docs/diffusers/v0.21.0/en/api/models/unet2d-cond#diffusers.UNet2DConditionModel)) — A `UNet2DConditionModel` to denoise the encoded image latents.
* **scheduler** ([SchedulerMixin](https://huggingface.co/docs/diffusers/v0.21.0/en/api/schedulers/overview#diffusers.SchedulerMixin)) — A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can only be an instance of [DDIMScheduler](https://huggingface.co/docs/diffusers/v0.21.0/en/api/schedulers/ddim#diffusers.DDIMScheduler).
* **safety\_checker** (`StableDiffusionSafetyChecker`) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for more details about a model’s potential harms.
* **feature\_extractor** (`CLIPImageProcessor`) — A `CLIPImageProcessor` to extract features from generated images; used as inputs to the `safety_checker`.

Pipeline for text-guided image to image generation using Stable Diffusion.

This model inherits from [DiffusionPipeline](https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/overview#diffusers.DiffusionPipeline). Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/pipelines/stable_diffusion/pipeline_cycle_diffusion.py#L571)

( prompt: typing.Union\[str, typing.List\[str]]source\_prompt: typing.Union\[str, typing.List\[str]]image: typing.Union\[PIL.Image.Image, numpy.ndarray, torch.FloatTensor, typing.List\[PIL.Image.Image], typing.List\[numpy.ndarray], typing.List\[torch.FloatTensor]] = Nonestrength: float = 0.8num\_inference\_steps: typing.Optional\[int] = 50guidance\_scale: typing.Optional\[float] = 7.5source\_guidance\_scale: typing.Optional\[float] = 1num\_images\_per\_prompt: typing.Optional\[int] = 1eta: typing.Optional\[float] = 0.1generator: typing.Union\[torch.\_C.Generator, typing.List\[torch.\_C.Generator], NoneType] = Noneprompt\_embeds: typing.Optional\[torch.FloatTensor] = Noneoutput\_type: typing.Optional\[str] = 'pil'return\_dict: bool = Truecallback: typing.Union\[typing.Callable\[\[int, int, torch.FloatTensor], NoneType], NoneType] = Nonecallback\_steps: int = 1cross\_attention\_kwargs: typing.Union\[typing.Dict\[str, typing.Any], NoneType] = None ) → [StableDiffusionPipelineOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/stable_diffusion/img2img#diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput) or `tuple`

Parameters

* **prompt** (`str` or `List[str]`) — The prompt or prompts to guide the image generation.
* **image** (`torch.FloatTensor` `np.ndarray`, `PIL.Image.Image`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) — `Image` or tensor representing an image batch to be used as the starting point. Can also accept image latents as `image`, but if passing latents directly it is not encoded again.
* **strength** (`float`, *optional*, defaults to 0.8) — Indicates extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising process runs for the full number of iterations specified in `num_inference_steps`. A value of 1 essentially ignores `image`.
* **num\_inference\_steps** (`int`, *optional*, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. This parameter is modulated by `strength`.
* **guidance\_scale** (`float`, *optional*, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
* **source\_guidance\_scale** (`float`, *optional*, defaults to 1) — Guidance scale for the source prompt. This is useful to control the amount of influence the source prompt has for encoding.
* **num\_images\_per\_prompt** (`int`, *optional*, defaults to 1) — The number of images to generate per prompt.
* **eta** (`float`, *optional*, defaults to 0.0) — Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies to the [DDIMScheduler](https://huggingface.co/docs/diffusers/v0.21.0/en/api/schedulers/ddim#diffusers.DDIMScheduler), and is ignored in other schedulers.
* **generator** (`torch.Generator` or `List[torch.Generator]`, *optional*) — A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument.
* **negative\_prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
* **output\_type** (`str`, *optional*, defaults to `"pil"`) — The output format of the generated image. Choose between `PIL.Image` or `np.array`.
* **return\_dict** (`bool`, *optional*, defaults to `True`) — Whether or not to return a [StableDiffusionPipelineOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/stable_diffusion/img2img#diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput) instead of a plain tuple.
* **callback** (`Callable`, *optional*) — A function that calls every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, *optional*, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
* **cross\_attention\_kwargs** (`dict`, *optional*) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).

Returns

[StableDiffusionPipelineOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/stable_diffusion/img2img#diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput) or `tuple`

If `return_dict` is `True`, [StableDiffusionPipelineOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/stable_diffusion/img2img#diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images and the second element is a list of `bool`s indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.

The call function to the pipeline for generation.

Example:

Copied

```
import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import CycleDiffusionPipeline, DDIMScheduler

# load the pipeline
# make sure you're logged in with `boincai-cli login`
model_id_or_path = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id_or_path, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id_or_path, scheduler=scheduler).to("cuda")

# let's download an initial image
url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/An%20astronaut%20riding%20a%20horse.png"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image.save("horse.png")

# let's specify a prompt
source_prompt = "An astronaut riding a horse"
prompt = "An astronaut riding an elephant"

# call the pipeline
image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,
    strength=0.8,
    guidance_scale=2,
    source_guidance_scale=1,
).images[0]

image.save("horse_to_elephant.png")

# let's try another example
# See more samples at the original repo: https://github.com/ChenWu98/cycle-diffusion
url = (
    "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/A%20black%20colored%20car.png"
)
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image.save("black.png")

source_prompt = "A black colored car"
prompt = "A blue colored car"

# call the pipeline
torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,
    strength=0.85,
    guidance_scale=3,
    source_guidance_scale=1,
).images[0]

image.save("black_to_blue.png")
```

**encode\_prompt**

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/pipelines/stable_diffusion/pipeline_cycle_diffusion.py#L262)

( promptdevicenum\_images\_per\_promptdo\_classifier\_free\_guidancenegative\_prompt = Noneprompt\_embeds: typing.Optional\[torch.FloatTensor] = Nonenegative\_prompt\_embeds: typing.Optional\[torch.FloatTensor] = Nonelora\_scale: typing.Optional\[float] = None )

Parameters

* **prompt** (`str` or `List[str]`, *optional*) — prompt to be encoded device — (`torch.device`): torch device
* **num\_images\_per\_prompt** (`int`) — number of images that should be generated per prompt
* **do\_classifier\_free\_guidance** (`bool`) — whether to use classifier free guidance or not
* **negative\_prompt** (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
* **prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, text embeddings will be generated from `prompt` input argument.
* **negative\_prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **lora\_scale** (`float`, *optional*) — A lora scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.

Encodes the prompt into text encoder hidden states.

### StableDiffusionPiplineOutput

#### class diffusers.pipelines.stable\_diffusion.StableDiffusionPipelineOutput

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/pipelines/stable_diffusion/pipeline_output.py#L15)

( images: typing.Union\[typing.List\[PIL.Image.Image], numpy.ndarray]nsfw\_content\_detected: typing.Optional\[typing.List\[bool]] )

Parameters

* **images** (`List[PIL.Image.Image]` or `np.ndarray`) — List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width, num_channels)`.
* **nsfw\_content\_detected** (`List[bool]`) — List indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content or `None` if safety checking could not be performed.

Output class for Stable Diffusion pipelines.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://boinc-ai.gitbook.io/diffusers/api/pipelines/cycle-diffusion.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
