# Gaudi Stable Diffusion Pipeline

## GaudiStableDiffusionPipeline

The `GaudiStableDiffusionPipeline` class enables text-to-image generation on HPUs. It inherits from the `GaudiDiffusionPipeline` class, which is the parent of all Gaudi diffusion pipelines.

To get the most out of it, pair it with a scheduler that is optimized for HPUs, such as `GaudiDDIMScheduler`.
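A minimal sketch of how the two can be combined (the model name and the `Habana/stable-diffusion` Gaudi configuration are illustrative choices; actually running this requires a Gaudi machine with `optimum-habana` installed):

```python
def load_gaudi_sd_pipeline(model_name="runwayml/stable-diffusion-v1-5"):
    """Build a Stable Diffusion pipeline with an HPU-optimized scheduler.

    Sketch only: requires an HPU machine with optimum-habana installed.
    The model name and Gaudi configuration are illustrative choices.
    """
    # Imported lazily so the sketch can be defined without an HPU setup.
    from optimum.habana.diffusers import (
        GaudiDDIMScheduler,
        GaudiStableDiffusionPipeline,
    )

    # HPU-optimized scheduler, loaded from the same checkpoint.
    scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
    return GaudiStableDiffusionPipeline.from_pretrained(
        model_name,
        scheduler=scheduler,
        use_habana=True,          # run on Gaudi rather than CPU
        use_hpu_graphs=True,      # capture HPU graphs for faster inference
        gaudi_config="Habana/stable-diffusion",  # Gaudi config from the Hub
    )

# pipeline = load_gaudi_sd_pipeline()
# images = pipeline(prompt="an astronaut riding a horse",
#                   num_images_per_prompt=4, batch_size=2).images
```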

### GaudiStableDiffusionPipeline

#### class optimum.habana.diffusers.GaudiStableDiffusionPipeline

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L67)

( vae: AutoencoderKL text\_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: KarrasDiffusionSchedulers safety\_checker: StableDiffusionSafetyChecker feature\_extractor: CLIPImageProcessor requires\_safety\_checker: bool = True use\_habana: bool = False use\_hpu\_graphs: bool = False gaudi\_config: typing.Union\[str, optimum.habana.transformers.gaudi\_configuration.GaudiConfig] = None bf16\_full\_eval: bool = False )

Parameters

* **vae** (`AutoencoderKL`) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
* **text\_encoder** ([CLIPTextModel](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModel)) — Frozen text-encoder ([clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)).
* **tokenizer** (`~transformers.CLIPTokenizer`) — A `CLIPTokenizer` to tokenize text.
* **unet** (`UNet2DConditionModel`) — A `UNet2DConditionModel` to denoise the encoded image latents.
* **scheduler** (`SchedulerMixin`) — A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of `DDIMScheduler`, `LMSDiscreteScheduler`, or `PNDMScheduler`.
* **safety\_checker** (`StableDiffusionSafetyChecker`) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for more details about a model’s potential harms.
* **feature\_extractor** ([CLIPImageProcessor](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPImageProcessor)) — A `CLIPImageProcessor` to extract features from generated images; used as inputs to the `safety_checker`.
* **use\_habana** (bool, defaults to `False`) — Whether to use Gaudi (`True`) or CPU (`False`).
* **use\_hpu\_graphs** (bool, defaults to `False`) — Whether to use HPU graphs or not.
* **gaudi\_config** (Union\[str, [GaudiConfig](https://huggingface.co/docs/optimum.habana/main/en/package_reference/gaudi_config#optimum.habana.GaudiConfig)], defaults to `None`) — Gaudi configuration to use. Can be a string to download a configuration from the Hub, or a previously initialized `GaudiConfig` object.
* **bf16\_full\_eval** (bool, defaults to `False`) — Whether to use full bfloat16 evaluation instead of 32-bit. This will be faster and save memory compared to fp32/mixed precision but can harm generated images.

Extends the [`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion#diffusers.StableDiffusionPipeline) class:

* Generation is performed by batches
* Two `mark_step()` calls were added to support lazy mode
* Support for HPU graphs was added
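Because generation runs in fixed-size batches (HPU graphs need constant shapes), the total number of images is split into batches and the last batch is padded. An illustrative calculation of that split, not the library's internal code:

```python
import math

def gaudi_batching(num_prompts, num_images_per_prompt, batch_size):
    """Illustrative sketch of the batch split: returns the number of
    fixed-size batches and how many padded samples the last batch needs."""
    total = num_prompts * num_images_per_prompt
    num_batches = math.ceil(total / batch_size)
    padding = num_batches * batch_size - total
    return num_batches, padding

# 3 prompts x 4 images each with batch_size=8 -> 2 batches, 4 padded samples
print(gaudi_batching(3, 4, 8))  # (2, 4)
```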

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L569)

( prompt: typing.Union\[str, typing.List\[str]] = None height: typing.Optional\[int] = None width: typing.Optional\[int] = None num\_inference\_steps: int = 50 guidance\_scale: float = 7.5 negative\_prompt: typing.Union\[typing.List\[str], str, NoneType] = None num\_images\_per\_prompt: typing.Optional\[int] = 1 batch\_size: int = 1 eta: float = 0.0 generator: typing.Union\[torch.\_C.Generator, typing.List\[torch.\_C.Generator], NoneType] = None latents: typing.Optional\[torch.FloatTensor] = None prompt\_embeds: typing.Optional\[torch.FloatTensor] = None negative\_prompt\_embeds: typing.Optional\[torch.FloatTensor] = None output\_type: typing.Optional\[str] = 'pil' return\_dict: bool = True callback: typing.Union\[typing.Callable\[\[int, int, torch.FloatTensor], NoneType], NoneType] = None callback\_steps: int = 1 cross\_attention\_kwargs: typing.Union\[typing.Dict\[str, typing.Any], NoneType] = None guidance\_rescale: float = 0.0 ) → `GaudiStableDiffusionPipelineOutput` or `tuple`

Parameters

* **prompt** (`str` or `List[str]`, *optional*) — The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
* **height** (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`) — The height in pixels of the generated images.
* **width** (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`) — The width in pixels of the generated images.
* **num\_inference\_steps** (`int`, *optional*, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
* **guidance\_scale** (`float`, *optional*, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
* **negative\_prompt** (`str` or `List[str]`, *optional*) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).
* **num\_images\_per\_prompt** (`int`, *optional*, defaults to 1) — The number of images to generate per prompt.
* **batch\_size** (`int`, *optional*, defaults to 1) — The number of images in a batch.
* **eta** (`float`, *optional*, defaults to 0.0) — Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies to the `~schedulers.DDIMScheduler`, and is ignored in other schedulers.
* **generator** (`torch.Generator` or `List[torch.Generator]`, *optional*) — A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`torch.FloatTensor`, *optional*) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random `generator`.
* **prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument.
* **negative\_prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
* **output\_type** (`str`, *optional*, defaults to `"pil"`) — The output format of the generated image. Choose between `PIL.Image` or `np.array`.
* **return\_dict** (`bool`, *optional*, defaults to `True`) — Whether or not to return a `GaudiStableDiffusionPipelineOutput` instead of a plain tuple.
* **callback** (`Callable`, *optional*) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, *optional*, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
* **cross\_attention\_kwargs** (`dict`, *optional*) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
* **guidance\_rescale** (`float`, *optional*, defaults to 0.0) — Guidance rescale factor from [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). Guidance rescale should fix overexposure when using zero terminal SNR.

Returns

`GaudiStableDiffusionPipelineOutput` or `tuple`

If `return_dict` is `True`, a `GaudiStableDiffusionPipelineOutput` is returned; otherwise, a `tuple` is returned where the first element is a list with the generated images and the second element is a list of `bool`s indicating whether the corresponding generated image contains "not-safe-for-work" (NSFW) content.

The call function to the pipeline for generation.
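As noted above, the default `height` and `width` come from the UNet sample size and the VAE scale factor. For a Stable Diffusion v1.x checkpoint (sample size 64, scale factor 8, values assumed here for illustration) this works out to 512×512:

```python
# Typical values for a Stable Diffusion v1.x checkpoint (illustrative):
sample_size = 64       # self.unet.config.sample_size
vae_scale_factor = 8   # 2 ** (len(vae.config.block_out_channels) - 1)

default_height = sample_size * vae_scale_factor  # 512
default_width = sample_size * vae_scale_factor   # 512
```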

### GaudiDiffusionPipeline

#### class optimum.habana.diffusers.GaudiDiffusionPipeline

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/pipeline_utils.py#L63)

( use\_habana: bool = False use\_hpu\_graphs: bool = False gaudi\_config: typing.Union\[str, optimum.habana.transformers.gaudi\_configuration.GaudiConfig] = None bf16\_full\_eval: bool = False )

Parameters

* **use\_habana** (bool, defaults to `False`) — Whether to use Gaudi (`True`) or CPU (`False`).
* **use\_hpu\_graphs** (bool, defaults to `False`) — Whether to use HPU graphs or not.
* **gaudi\_config** (Union\[str, [GaudiConfig](https://huggingface.co/docs/optimum.habana/main/en/package_reference/gaudi_config#optimum.habana.GaudiConfig)], defaults to `None`) — Gaudi configuration to use. Can be a string to download a configuration from the Hub, or a previously initialized `GaudiConfig` object.
* **bf16\_full\_eval** (bool, defaults to `False`) — Whether to use full bfloat16 evaluation instead of 32-bit. This will be faster and save memory compared to fp32/mixed precision but can harm generated images.

Extends the [`DiffusionPipeline`](https://huggingface.co/docs/diffusers/api/diffusion_pipeline) class:

* The pipeline is initialized on Gaudi if `use_habana=True`.
* The pipeline’s Gaudi configuration is saved and pushed to the hub.

**from\_pretrained**

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/pipeline_utils.py#L352)

( pretrained\_model\_name\_or\_path: typing.Union\[str, os.PathLike, NoneType] \*\*kwargs )

More information [here](https://huggingface.co/docs/diffusers/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained).

**save\_pretrained**

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/pipeline_utils.py#L237)

( save\_directory: typing.Union\[str, os.PathLike] safe\_serialization: bool = True variant: typing.Optional\[str] = None push\_to\_hub: bool = False \*\*kwargs )

Parameters

* **save\_directory** (`str` or `os.PathLike`) — Directory to which to save. Will be created if it doesn’t exist.
* **safe\_serialization** (`bool`, *optional*, defaults to `True`) — Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
* **variant** (`str`, *optional*) — If specified, weights are saved in the format `pytorch_model.<variant>.bin`.
* **push\_to\_hub** (`bool`, *optional*, defaults to `False`) — Whether or not to push your model to the Hugging Face Hub after saving it. You can specify the repository you want to push to with `repo_id` (will default to the name of `save_directory` in your namespace).
* **kwargs** (`Dict[str, Any]`, *optional*) — Additional keyword arguments passed along to the `~utils.PushToHubMixin.push_to_hub` method.

Save the pipeline and Gaudi configurations. More information [here](https://huggingface.co/docs/diffusers/api/diffusion_pipeline#diffusers.DiffusionPipeline.save_pretrained).

### GaudiDDIMScheduler

#### class optimum.habana.diffusers.GaudiDDIMScheduler

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/schedulers/scheduling_ddim.py#L28)

( num\_train\_timesteps: int = 1000 beta\_start: float = 0.0001 beta\_end: float = 0.02 beta\_schedule: str = 'linear' trained\_betas: typing.Union\[numpy.ndarray, typing.List\[float], NoneType] = None clip\_sample: bool = True set\_alpha\_to\_one: bool = True steps\_offset: int = 0 prediction\_type: str = 'epsilon' thresholding: bool = False dynamic\_thresholding\_ratio: float = 0.995 clip\_sample\_range: float = 1.0 sample\_max\_value: float = 1.0 timestep\_spacing: str = 'leading' rescale\_betas\_zero\_snr: bool = False )

Parameters

* **num\_train\_timesteps** (`int`, defaults to 1000) — The number of diffusion steps to train the model.
* **beta\_start** (`float`, defaults to 0.0001) — The starting `beta` value of inference.
* **beta\_end** (`float`, defaults to 0.02) — The final `beta` value.
* **beta\_schedule** (`str`, defaults to `"linear"`) — The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
* **trained\_betas** (`np.ndarray`, *optional*) — Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
* **clip\_sample** (`bool`, defaults to `True`) — Clip the predicted sample for numerical stability.
* **clip\_sample\_range** (`float`, defaults to 1.0) — The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
* **set\_alpha\_to\_one** (`bool`, defaults to `True`) — Each diffusion step uses the alphas product value at that step and at the previous one. For the final step there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`, otherwise it uses the alpha value at step 0.
* **steps\_offset** (`int`, defaults to 0) — An offset added to the inference steps. You can use a combination of `offset=1` and `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable Diffusion.
* **prediction\_type** (`str`, defaults to `"epsilon"`, *optional*) — Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process), `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen Video](https://imagen.research.google/video/paper.pdf) paper).
* **thresholding** (`bool`, defaults to `False`) — Whether to use the “dynamic thresholding” method. This is unsuitable for latent-space diffusion models such as Stable Diffusion.
* **dynamic\_thresholding\_ratio** (`float`, defaults to 0.995) — The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
* **sample\_max\_value** (`float`, defaults to 1.0) — The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
* **timestep\_spacing** (`str`, defaults to `"leading"`) — The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information.
* **rescale\_betas\_zero\_snr** (`bool`, defaults to `False`) — Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and dark samples instead of limiting it to samples with medium brightness. Loosely related to [`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).

Extends [Diffusers’ DDIMScheduler](https://huggingface.co/docs/diffusers/api/schedulers#diffusers.DDIMScheduler) to run optimally on Gaudi:

* All time-dependent parameters are generated at the beginning
* At each time step, tensors are rolled to update the values of the time-dependent parameters
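The rolling strategy can be pictured with plain Python lists (an illustrative sketch, not the scheduler's actual tensor code): all per-step values are computed up front, and each step reads index 0 and then rotates the sequence, avoiding per-step host-device synchronization.

```python
# Toy precomputed time-dependent parameters (illustrative values).
alphas = [0.9, 0.7, 0.5, 0.3]

def current_and_roll(params):
    """Read the current step's value, then rotate left by one
    (the list analogue of torch.roll(x, shifts=-1))."""
    current = params[0]
    return current, params[1:] + params[:1]

a0, alphas = current_and_roll(alphas)
# a0 == 0.9; alphas is now [0.7, 0.5, 0.3, 0.9]
```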

**step**

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/schedulers/scheduling_ddim.py#L170)

( model\_output: FloatTensor sample: FloatTensor eta: float = 0.0 use\_clipped\_model\_output: bool = False generator = None variance\_noise: typing.Optional\[torch.FloatTensor] = None return\_dict: bool = True ) → `diffusers.schedulers.scheduling_utils.DDIMSchedulerOutput` or `tuple`

Parameters

* **model\_output** (`torch.FloatTensor`) — The direct output from learned diffusion model.
* **sample** (`torch.FloatTensor`) — A current instance of a sample created by the diffusion process.
* **eta** (`float`) — The weight of the noise added in the diffusion step.
* **use\_clipped\_model\_output** (`bool`, defaults to `False`) — If `True`, computes “corrected” `model_output` from the clipped predicted original sample. Necessary because predicted original sample is clipped to \[-1, 1] when `self.config.clip_sample` is `True`. If no clipping has happened, “corrected” `model_output` would coincide with the one provided as input and `use_clipped_model_output` has no effect.
* **generator** (`torch.Generator`, *optional*) — A random number generator.
* **variance\_noise** (`torch.FloatTensor`) — Alternative to generating noise with `generator` by directly providing the noise for the variance itself. Useful for methods such as `CycleDiffusion`.
* **return\_dict** (`bool`, *optional*, defaults to `True`) — Whether or not to return a `DDIMSchedulerOutput` or `tuple`.

Returns

`diffusers.schedulers.scheduling_utils.DDIMSchedulerOutput` or `tuple`

If `return_dict` is `True`, `DDIMSchedulerOutput` is returned; otherwise, a tuple is returned where the first element is the sample tensor.

Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion process from the learned model outputs (most often the predicted noise).
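For context, in the default `epsilon` prediction mode this is the standard DDIM update (Song et al., 2020, Eq. 12), reproduced here as a reference; $\epsilon_\theta$ is the model output and $\eta$ is the `eta` argument above:

```latex
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}
          \underbrace{\frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}
                           {\sqrt{\bar{\alpha}_t}}}_{\text{predicted } x_0}
        + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\;\epsilon_\theta(x_t, t)
        + \sigma_t z,
\qquad
\sigma_t = \eta \sqrt{\frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}}
               \sqrt{1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}},
\quad z \sim \mathcal{N}(0, I)
```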

## GaudiStableDiffusionUpscalePipeline

The `GaudiStableDiffusionUpscalePipeline` enhances the resolution of input images by a factor of 4 on HPUs. It inherits from the `GaudiDiffusionPipeline` class, which is the parent of all Gaudi diffusion pipelines.

#### class optimum.habana.diffusers.GaudiStableDiffusionUpscalePipeline

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py#L92)

( vae: AutoencoderKL text\_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel low\_res\_scheduler: DDPMScheduler scheduler: KarrasDiffusionSchedulers safety\_checker: typing.Optional\[typing.Any] = None feature\_extractor: typing.Optional\[transformers.models.clip.image\_processing\_clip.CLIPImageProcessor] = None use\_habana: bool = False use\_hpu\_graphs: bool = False gaudi\_config: typing.Union\[str, optimum.habana.transformers.gaudi\_configuration.GaudiConfig] = None bf16\_full\_eval: bool = False watermarker: typing.Optional\[typing.Any] = None max\_noise\_level: int = 350 )

Parameters

* **vae** (`AutoencoderKL`) — Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
* **text\_encoder** (`CLIPTextModel`) — Frozen text-encoder. Stable Diffusion uses the text portion of [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel), specifically the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant.
* **tokenizer** (`CLIPTokenizer`) — Tokenizer of class [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
* **unet** (`UNet2DConditionModel`) — Conditional U-Net architecture to denoise the encoded image latents.
* **low\_res\_scheduler** (`SchedulerMixin`) — A scheduler used to add initial noise to the low resolution conditioning image. It must be an instance of `DDPMScheduler`.
* **scheduler** (`SchedulerMixin`) — A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of `DDIMScheduler`, `LMSDiscreteScheduler`, or `PNDMScheduler`.
* **safety\_checker** (`StableDiffusionSafetyChecker`) — Classification module that estimates whether generated images could be considered offensive or harmful. Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
* **feature\_extractor** (`CLIPImageProcessor`) — Model that extracts features from generated images to be used as inputs for the `safety_checker`.
* **use\_habana** (bool, defaults to `False`) — Whether to use Gaudi (`True`) or CPU (`False`).
* **use\_hpu\_graphs** (bool, defaults to `False`) — Whether to use HPU graphs or not.
* **gaudi\_config** (Union\[str, [GaudiConfig](https://huggingface.co/docs/optimum.habana/main/en/package_reference/gaudi_config#optimum.habana.GaudiConfig)], defaults to `None`) — Gaudi configuration to use. Can be a string to download a configuration from the Hub, or a previously initialized `GaudiConfig` object.
* **bf16\_full\_eval** (bool, defaults to `False`) — Whether to use full bfloat16 evaluation instead of 32-bit. This will be faster and save memory compared to fp32/mixed precision but can harm generated images.

Pipeline for text-guided image super-resolution using Stable Diffusion 2.

Extends the [`StableDiffusionUpscalePipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion#diffusers.StableDiffusionUpscalePipeline) class:

* Generation is performed by batches
* Two `mark_step()` calls were added to support lazy mode
* Support for HPU graphs was added

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py#L579)

( prompt: typing.Union\[str, typing.List\[str]] = None image: typing.Union\[PIL.Image.Image, numpy.ndarray, torch.FloatTensor, typing.List\[PIL.Image.Image], typing.List\[numpy.ndarray], typing.List\[torch.FloatTensor]] = None num\_inference\_steps: int = 75 guidance\_scale: float = 9.0 noise\_level: int = 20 negative\_prompt: typing.Union\[typing.List\[str], str, NoneType] = None num\_images\_per\_prompt: typing.Optional\[int] = 1 batch\_size: int = 1 eta: float = 0.0 generator: typing.Union\[torch.\_C.Generator, typing.List\[torch.\_C.Generator], NoneType] = None latents: typing.Optional\[torch.FloatTensor] = None prompt\_embeds: typing.Optional\[torch.FloatTensor] = None negative\_prompt\_embeds: typing.Optional\[torch.FloatTensor] = None output\_type: typing.Optional\[str] = 'pil' return\_dict: bool = True callback: typing.Union\[typing.Callable\[\[int, int, torch.FloatTensor], NoneType], NoneType] = None callback\_steps: int = 1 cross\_attention\_kwargs: typing.Union\[typing.Dict\[str, typing.Any], NoneType] = None ) → `GaudiStableDiffusionPipelineOutput` or `tuple`

Parameters

* **prompt** (`str` or `List[str]`, *optional*) — The prompt or prompts to guide the image generation. If not defined, you need to pass `prompt_embeds` instead.
* **image** (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) — `Image` or tensor representing an image batch to be upscaled.
* **num\_inference\_steps** (`int`, *optional*, defaults to 75) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
* **guidance\_scale** (`float`, *optional*, defaults to 9.0) — Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). `guidance_scale` is defined as `w` of equation 2. of [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages generating images that are closely linked to the text `prompt`, usually at the expense of lower image quality.
* **negative\_prompt** (`str` or `List[str]`, *optional*) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
* **num\_images\_per\_prompt** (`int`, *optional*, defaults to 1) — The number of images to generate per prompt.
* **batch\_size** (`int`, *optional*, defaults to 1) — The number of images in a batch.
* **eta** (`float`, *optional*, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: <https://arxiv.org/abs/2010.02502>. Only applies to `schedulers.DDIMScheduler`, will be ignored for others.
* **generator** (`torch.Generator` or `List[torch.Generator]`, *optional*) — One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`torch.FloatTensor`, *optional*) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated randomly.
* **prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, text embeddings will be generated from `prompt` input argument.
* **negative\_prompt\_embeds** (`torch.FloatTensor`, *optional*) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **output\_type** (`str`, *optional*, defaults to `"pil"`) — The output format of the generated image. Choose between [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
* **return\_dict** (`bool`, *optional*, defaults to `True`) — Whether or not to return a `GaudiStableDiffusionPipelineOutput` instead of a plain tuple.
* **callback** (`Callable`, *optional*) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, *optional*, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
* **cross\_attention\_kwargs** (`dict`, *optional*) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under `self.processor` in [diffusers.cross\_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).

Returns

`GaudiStableDiffusionPipelineOutput` or `tuple`

`GaudiStableDiffusionPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of `bool`s denoting whether the corresponding generated image likely represents “not-safe-for-work” (nsfw) content, according to the `safety_checker`.

Function invoked when calling the pipeline for generation.

Examples:

```
>>> import requests
>>> from PIL import Image
>>> from io import BytesIO
>>> from optimum.habana.diffusers import GaudiStableDiffusionUpscalePipeline
>>> import torch

>>> # load model and scheduler
>>> model_id = "stabilityai/stable-diffusion-x4-upscaler"
>>> pipeline = GaudiStableDiffusionUpscalePipeline.from_pretrained(
...     model_id,
...     revision="fp16",
...     torch_dtype=torch.bfloat16,
...     use_habana=True,
...     use_hpu_graphs=True,
...     gaudi_config="Habana/stable-diffusion",
... )

>>> # let's download an image
>>> url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
>>> response = requests.get(url)
>>> low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
>>> low_res_img = low_res_img.resize((128, 128))
>>> prompt = "a white cat"

>>> upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
>>> upscaled_image.save("upsampled_cat.png")
```
