Text-to-image
The Stable Diffusion model was created by researchers and engineers from CompVis, Stability AI, Runway, and LAION. The StableDiffusionPipeline is capable of generating photorealistic images given any text input. It’s trained on 512x512 images from a subset of the LAION-5B dataset. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and can run on consumer GPUs. Latent diffusion is the research on top of which Stable Diffusion was built. It was proposed in High-Resolution Image Synthesis with Latent Diffusion Models by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer.
The abstract from the paper is:
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs. Code is available at https://github.com/CompVis/latent-diffusion.
Make sure to check out the Stable Diffusion section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!
If you’re interested in using one of the official checkpoints for a task, explore the CompVis, Runway, and Stability AI Hub organizations!
StableDiffusionPipeline
( vae: AutoencoderKL, text_encoder: CLIPTextModel, tokenizer: CLIPTokenizer, unet: UNet2DConditionModel, scheduler: KarrasDiffusionSchedulers, safety_checker: StableDiffusionSafetyChecker, feature_extractor: CLIPImageProcessor, requires_safety_checker: bool = True )
Parameters
vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text-encoder (clip-vit-large-patch14).
tokenizer (CLIPTokenizer) — A CLIPTokenizer to tokenize text.
unet (UNet2DConditionModel) — A UNet2DConditionModel to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.
safety_checker (StableDiffusionSafetyChecker) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the model card for more details about a model’s potential harms.
feature_extractor (CLIPImageProcessor) — A CLIPImageProcessor to extract features from generated images; used as inputs to the safety_checker.
Pipeline for text-to-image generation using Stable Diffusion.
The pipeline also inherits the following loading methods:
__call__
Parameters
prompt (str or List[str], optional) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
negative_prompt (str or List[str], optional) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
latents (torch.FloatTensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
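The guidance_scale parameter above corresponds to classifier-free guidance. The following is a minimal sketch in plain Python, assuming the usual formulation in which the unconditional and text-conditioned noise predictions are blended. The helper name apply_guidance and the use of lists of floats instead of tensors are illustrative assumptions, not pipeline internals.

```python
from typing import List


def apply_guidance(
    noise_uncond: List[float],
    noise_text: List[float],
    guidance_scale: float,
) -> List[float]:
    """Blend unconditional and text-conditioned noise predictions.

    Hypothetical helper: with guidance_scale == 1 the result equals the
    text-conditioned prediction; larger values push further away from the
    unconditional prediction.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0]
text = [1.0, 3.0]
print(apply_guidance(uncond, text, 1.0))  # [1.0, 3.0]
print(apply_guidance(uncond, text, 7.5))  # [7.5, 16.0]
```

With a scale of 1 the text-conditioned prediction is returned unchanged, which is consistent with guidance only being enabled when guidance_scale > 1.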
Returns
The call function to the pipeline for generation.
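How callback and callback_steps interact can be sketched with a stand-in loop. This is an illustration only: run_steps is a hypothetical function, the "denoising" is a placeholder, and the latents are a plain list rather than a torch.FloatTensor.

```python
from typing import Callable, List, Optional


def run_steps(
    num_inference_steps: int,
    callback: Optional[Callable[[int, int, List[float]], None]] = None,
    callback_steps: int = 1,
) -> List[float]:
    """Stand-in for a denoising loop that invokes a callback periodically."""
    latents = [0.0]
    # Hypothetical timestep schedule counting down to zero.
    timesteps = list(range(num_inference_steps - 1, -1, -1))
    for step, timestep in enumerate(timesteps):
        latents = [x + 1.0 for x in latents]  # placeholder for a real denoising step
        if callback is not None and step % callback_steps == 0:
            callback(step, timestep, latents)
    return latents


seen_steps: List[int] = []


def log_progress(step: int, timestep: int, latents: List[float]) -> None:
    seen_steps.append(step)


run_steps(10, callback=log_progress, callback_steps=4)
print(seen_steps)  # [0, 4, 8]
```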
Examples:
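A minimal usage sketch, assuming the diffusers and torch packages are installed and a CUDA GPU is available. The helper name generate_image is hypothetical, and the checkpoint id is left as a caller-supplied argument rather than fixed here.

```python
def generate_image(model_id: str, prompt: str, seed: int = 0):
    """Generate one PIL image from a text prompt (illustrative helper)."""
    # Imported lazily so that defining this helper does not require the GPU stack.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    # A seeded generator makes the result reproducible.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=50,
        guidance_scale=7.5,
        generator=generator,
    )
    return result.images[0]
```

Calling generate_image with a Stable Diffusion checkpoint id and a prompt returns a PIL image that can be written to disk with its save method.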
enable_attention_slicing
( slice_size: typing.Union[str, int, NoneType] = 'auto' )
Parameters
slice_size (str or int, optional, defaults to "auto") — When "auto", halves the input to the attention heads, so attention will be computed in two steps. If "max", maximum amount of memory will be saved by running only one slice at a time. If a number is provided, uses as many slices as attention_head_dim // slice_size. In this case, attention_head_dim must be a multiple of slice_size.
Enable sliced attention computation. When this option is enabled, the attention module splits the input tensor in slices to compute attention in several steps. For more than one attention head, the computation is performed sequentially over each head. This is useful to save some memory in exchange for a small speed decrease.
⚠️ Don’t enable attention slicing if you’re already using scaled_dot_product_attention (SDPA) from PyTorch 2.0 or xFormers. These attention computations are already very memory efficient so you won’t need to enable this function. If you enable attention slicing with SDPA or xFormers, it can lead to serious slow downs!
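The slice_size rules described above can be read as the following arithmetic. This is a sketch of that description only: num_attention_slices is a hypothetical helper, not the library's internal logic.

```python
from typing import Union


def num_attention_slices(attention_head_dim: int, slice_size: Union[str, int] = "auto") -> int:
    """Return how many sequential steps attention would be computed in."""
    if slice_size == "auto":
        # The input is halved, so attention is computed in two steps.
        return 2
    if slice_size == "max":
        # One slice at a time: as many steps as there are slices of size 1.
        return attention_head_dim
    if attention_head_dim % slice_size != 0:
        raise ValueError("attention_head_dim must be a multiple of slice_size")
    return attention_head_dim // slice_size


print(num_attention_slices(8))         # 2
print(num_attention_slices(8, "max"))  # 8
print(num_attention_slices(8, 2))      # 4
```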
Examples:
disable_attention_slicing
( )
Disable sliced attention computation. If enable_attention_slicing was previously called, attention is computed in one step.
enable_vae_slicing
( )
Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.
disable_vae_slicing
( )
Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method will go back to computing decoding in one step.
enable_xformers_memory_efficient_attention
( attention_op: typing.Optional[typing.Callable] = None )
Parameters
⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
Examples:
disable_xformers_memory_efficient_attention
( )
enable_vae_tiling
( )
Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to compute decoding and encoding in several steps. This is useful for saving a large amount of memory and to allow processing larger images.
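The idea of splitting one image axis into tiles can be sketched as follows. The helper tile_slices, the tile size, and the overlap value are illustrative assumptions; the actual tile sizes and any blending between tiles are decided by the VAE implementation.

```python
from typing import List, Tuple


def tile_slices(size: int, tile_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs covering one axis with overlapping tiles."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")
    stride = tile_size - overlap
    return [(start, min(start + tile_size, size)) for start in range(0, size, stride)]


# A 128-wide axis covered by 64-wide tiles that overlap by 16.
print(tile_slices(128, 64, 16))  # [(0, 64), (48, 112), (96, 128)]
```

Each tile is processed on its own, so peak memory depends on the tile size rather than on the full image size.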
disable_vae_tiling
( )
Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method will go back to computing decoding in one step.
load_textual_inversion
( pretrained_model_name_or_path: typing.Union[str, typing.List[str], typing.Dict[str, torch.Tensor], typing.List[typing.Dict[str, torch.Tensor]]], token: typing.Union[str, typing.List[str], NoneType] = None, tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, text_encoder: typing.Optional[transformers.modeling_utils.PreTrainedModel] = None, **kwargs )
Parameters
pretrained_model_name_or_path (str or os.PathLike or List[str or os.PathLike] or Dict or List[Dict]) — Can be either one of the following or a list of them:
A string, the model id (for example sd-concepts-library/low-poly-hd-logos-icons) of a pretrained model hosted on the Hub.
A path to a directory (for example ./my_text_inversion_directory/) containing the textual inversion weights.
A path to a file (for example ./my_text_inversions.pt) containing textual inversion weights.
token (str or List[str], optional) — Override the token to use for the textual inversion weights. If pretrained_model_name_or_path is a list, then token must also be a list of equal length.
tokenizer (CLIPTokenizer, optional) — A CLIPTokenizer to tokenize text. If not specified, function will take self.tokenizer.
weight_name (str, optional) — Name of a custom weight file. This should be used when:
The saved textual inversion file is in 🌍 Diffusers format, but was saved under a specific weight name such as text_inv.bin.
The saved textual inversion file is in the Automatic1111 format.
cache_dir (Union[str, os.PathLike], optional) — Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used.
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download (bool, optional, defaults to False) — Whether or not to resume downloading the model weights and configuration files. If set to False, any incompletely downloaded files are deleted.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, for example, {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
local_files_only (bool, optional, defaults to False) — Whether to only load local model weights and configuration files or not. If set to True, the model won’t be downloaded from the Hub.
use_auth_token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, the token generated from diffusers-cli login (stored in ~/.boincai) is used.
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.
subfolder (str, optional, defaults to "") — The subfolder location of a model file within a larger model repository on the Hub or locally.
mirror (str, optional) — Mirror source to resolve accessibility issues if you’re downloading a model in China. We do not guarantee the timeliness or safety of the source, and you should refer to the mirror site for more information.
Example:
To load a textual inversion embedding vector in 🌍 Diffusers format:
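A minimal sketch of this case, assuming diffusers and torch are installed and a CUDA GPU is available. The helper name load_concept is hypothetical, the base checkpoint id is supplied by the caller, and the default concept repository is the example id given in the parameter description above.

```python
def load_concept(
    model_id: str,
    concept_repo: str = "sd-concepts-library/low-poly-hd-logos-icons",
):
    """Load a pipeline and add a textual inversion embedding to it (illustrative helper)."""
    # Imported lazily so that defining this helper does not require the GPU stack.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    # Adds the learned embedding to the pipeline's tokenizer and text encoder.
    pipe.load_textual_inversion(concept_repo)
    return pipe
```

After loading, the concept's placeholder token can be used inside a prompt passed to the returned pipeline.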
locally:
from_single_file
( pretrained_model_link_or_path, **kwargs )
Parameters
pretrained_model_link_or_path (str or os.PathLike, optional) — Can be either:
A link to the .ckpt file (for example "https://boincai.co/<repo_id>/blob/main/<path_to_file>.ckpt") on the Hub.
A path to a file containing all pipeline weights.
torch_dtype (str or torch.dtype, optional) — Override the default torch.dtype and load the model with another dtype. If "auto" is passed, the dtype is automatically derived from the model’s weights.
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
cache_dir (Union[str, os.PathLike], optional) — Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used.
resume_download (bool, optional, defaults to False) — Whether or not to resume downloading the model weights and configuration files. If set to False, any incompletely downloaded files are deleted.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, for example, {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
local_files_only (bool, optional, defaults to False) — Whether to only load local model weights and configuration files or not. If set to True, the model won’t be downloaded from the Hub.
use_auth_token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, the token generated from diffusers-cli login (stored in ~/.boincai) is used.
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.
use_safetensors (bool, optional, defaults to None) — If set to None, the safetensors weights are downloaded if they’re available and if the safetensors library is installed. If set to True, the model is forcibly loaded from safetensors weights. If set to False, safetensors weights are not loaded.
extract_ema (bool, optional, defaults to False) — Whether to extract the EMA weights or not. Pass True to extract the EMA weights which usually yield higher quality images for inference. Non-EMA weights are usually better for continuing finetuning.
upcast_attention (bool, optional, defaults to None) — Whether the attention computation should always be upcasted.
image_size (int, optional, defaults to 512) — The image size the model was trained on. Use 512 for all Stable Diffusion v1 models and the Stable Diffusion v2 base model. Use 768 for Stable Diffusion v2.
prediction_type (str, optional) — The prediction type the model was trained on. Use 'epsilon' for all Stable Diffusion v1 models and the Stable Diffusion v2 base model. Use 'v_prediction' for Stable Diffusion v2.
num_in_channels (int, optional, defaults to None) — The number of input channels. If None, it is automatically inferred.
scheduler_type (str, optional, defaults to "pndm") — Type of scheduler to use. Should be one of ["pndm", "lms", "heun", "euler", "euler-ancestral", "dpm", "ddim"].
load_safety_checker (bool, optional, defaults to True) — Whether to load the safety checker or not.
vae (AutoencoderKL, optional, defaults to None) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations. If this parameter is None, the function will load a new instance of AutoencoderKL by itself, if needed.
tokenizer (CLIPTokenizer, optional, defaults to None) — An instance of CLIPTokenizer to use. If this parameter is None, the function loads a new instance of CLIPTokenizer by itself if needed.
original_config_file (str) — Path to .yaml config file corresponding to the original architecture. If None, will be automatically inferred by looking for a key that only exists in SD2.0 models.
kwargs (remaining dictionary of keyword arguments, optional) — Can be used to overwrite load and saveable variables (for example the pipeline components of the specific pipeline class). The overwritten components are directly passed to the pipeline’s __init__ method. See example below for more information.
Examples:
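The image_size and prediction_type guidance in the parameter list can be captured as a small lookup. This is a sketch: the family labels ("v1", "v2-base", "v2") and the helper single_file_kwargs are naming assumptions made for illustration.

```python
from typing import Dict, Union

# Settings per checkpoint family, taken from the parameter descriptions above.
CHECKPOINT_SETTINGS: Dict[str, Dict[str, Union[int, str]]] = {
    "v1": {"image_size": 512, "prediction_type": "epsilon"},
    "v2-base": {"image_size": 512, "prediction_type": "epsilon"},
    "v2": {"image_size": 768, "prediction_type": "v_prediction"},
}


def single_file_kwargs(family: str) -> Dict[str, Union[int, str]]:
    """Return keyword arguments suited to a checkpoint family (hypothetical helper)."""
    if family not in CHECKPOINT_SETTINGS:
        raise ValueError(f"unknown checkpoint family: {family!r}")
    return dict(CHECKPOINT_SETTINGS[family])


print(single_file_kwargs("v2"))  # {'image_size': 768, 'prediction_type': 'v_prediction'}
```

The resulting dictionary could then be unpacked into a from_single_file call alongside the checkpoint link or path.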
load_lora_weights
( pretrained_model_name_or_path_or_dict: typing.Union[str, typing.Dict[str, torch.Tensor]], **kwargs )
Parameters
Load LoRA weights specified in pretrained_model_name_or_path_or_dict into self.unet and self.text_encoder.
All kwargs are forwarded to self.lora_state_dict.
save_lora_weights
( save_directory: typing.Union[str, os.PathLike], unet_lora_layers: typing.Dict[str, typing.Union[torch.nn.modules.module.Module, torch.Tensor]] = None, text_encoder_lora_layers: typing.Dict[str, torch.nn.modules.module.Module] = None, is_main_process: bool = True, weight_name: str = None, save_function: typing.Callable = None, safe_serialization: bool = True )
Parameters
save_directory (str or os.PathLike) — Directory to save LoRA parameters to. Will be created if it doesn’t exist.
unet_lora_layers (Dict[str, torch.nn.Module] or Dict[str, torch.Tensor]) — State dict of the LoRA layers corresponding to the unet.
text_encoder_lora_layers (Dict[str, torch.nn.Module] or Dict[str, torch.Tensor]) — State dict of the LoRA layers corresponding to the text_encoder. Must explicitly pass the text encoder LoRA state dict because it comes from 🌍Transformers.
is_main_process (bool, optional, defaults to True) — Whether the process calling this is the main process or not. Useful during distributed training when you need to call this function on all processes. In this case, set is_main_process=True only on the main process to avoid race conditions.
save_function (Callable) — The function to use to save the state dictionary. Useful during distributed training when you need to replace torch.save with another method. Can be configured with the environment variable DIFFUSERS_SAVE_MODE.
safe_serialization (bool, optional, defaults to True) — Whether to save the model using safetensors or the traditional PyTorch way with pickle.
Save the LoRA parameters corresponding to the UNet and text encoder.
encode_prompt
( prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, lora_scale: typing.Optional[float] = None )
Parameters
prompt (str or List[str], optional) — The prompt to be encoded.
device (torch.device) — The torch device.
num_images_per_prompt (int) — The number of images that should be generated per prompt.
do_classifier_free_guidance (bool) — Whether to use classifier-free guidance or not.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.FloatTensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
lora_scale (float, optional) — A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
Encodes the prompt into text encoder hidden states.
StableDiffusionPipelineOutput
( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray], nsfw_content_detected: typing.Optional[typing.List[bool]] )
Parameters
images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).
nsfw_content_detected (List[bool]) — List indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content or None if safety checking could not be performed.
Output class for Stable Diffusion pipelines.
FlaxStableDiffusionPipeline
( vae: FlaxAutoencoderKL, text_encoder: FlaxCLIPTextModel, tokenizer: CLIPTokenizer, unet: FlaxUNet2DConditionModel, scheduler: typing.Union[diffusers.schedulers.scheduling_ddim_flax.FlaxDDIMScheduler, diffusers.schedulers.scheduling_pndm_flax.FlaxPNDMScheduler, diffusers.schedulers.scheduling_lms_discrete_flax.FlaxLMSDiscreteScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep_flax.FlaxDPMSolverMultistepScheduler], safety_checker: FlaxStableDiffusionSafetyChecker, feature_extractor: CLIPImageProcessor, dtype: dtype = <class 'jax.numpy.float32'> )
Parameters
tokenizer (CLIPTokenizer) — A CLIPTokenizer to tokenize text.
feature_extractor (CLIPImageProcessor) — A CLIPImageProcessor to extract features from generated images; used as inputs to the safety_checker.
Flax-based pipeline for text-to-image generation using Stable Diffusion.
__call__
Parameters
prompt (str or List[str], optional) — The prompt or prompts to guide image generation.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
latents (jnp.array, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents array is generated by sampling using the supplied random generator.
jit (bool, defaults to False) — Whether to run pmap versions of the generation and safety scoring functions. This argument exists because __call__ is not yet end-to-end pmap-able. It will be removed in a future release.
Returns
The call function to the pipeline for generation.
Examples:
FlaxStableDiffusionPipelineOutput
( images: ndarray, nsfw_content_detected: typing.List[bool] )
Parameters
images (np.ndarray) — Denoised images as an array of shape (batch_size, height, width, num_channels).
nsfw_content_detected (List[bool]) — List indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content or None if safety checking could not be performed.
Output class for Flax-based Stable Diffusion pipelines.
replace
( **updates )
Returns a new object replacing the specified fields with new values.
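The behavior described for replace matches what the standard library's dataclasses.replace does for ordinary dataclasses. The sketch below uses a stand-in class named Output with string placeholders for images; it is an illustration of the semantics, not the Flax output class itself.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Output:
    """Stand-in for a pipeline output with the same two field names."""
    images: List[str]
    nsfw_content_detected: List[bool]


original = Output(images=["img0"], nsfw_content_detected=[False])
# A new object is created; the original is left untouched.
updated = replace(original, nsfw_content_detected=[True])

print(original.nsfw_content_detected)  # [False]
print(updated.nsfw_content_detected)   # [True]
print(updated.images)                  # ['img0']
```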
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
load_textual_inversion() for loading textual inversion embeddings
load_lora_weights() for loading LoRA weights
save_lora_weights() for saving LoRA weights
from_single_file() for loading .ckpt files
( prompt: typing.Union[str, typing.List[str]] = None, height: typing.Optional[int] = None, width: typing.Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: typing.Optional[int] = 1, eta: float = 0.0, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, prompt_embeds: typing.Optional[torch.FloatTensor] = None, negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None, output_type: typing.Optional[str] = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, torch.FloatTensor], NoneType], NoneType] = None, callback_steps: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, guidance_rescale: float = 0.0 ) → StableDiffusionPipelineOutput or tuple
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.
return_dict (bool, optional, defaults to True) — Whether or not to return a StableDiffusionPipelineOutput instead of a plain tuple.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that if specified is passed along to the AttentionProcessor as defined in self.processor.
guidance_rescale (float, optional, defaults to 0.0) — Guidance rescale factor from Common Diffusion Noise Schedules and Sample Steps are Flawed. Guidance rescale factor should fix overexposure when using zero terminal SNR.
StableDiffusionPipelineOutput or tuple
If return_dict is True, StableDiffusionPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.
attention_op (Callable, optional) — Override the default None operator for use as op argument to the memory_efficient_attention() function of xFormers.
Enable memory efficient attention from xFormers. When this option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed up during training is not guaranteed.
Disable memory efficient attention from xFormers.
A torch state dict.
text_encoder (CLIPTextModel, optional) — Frozen text-encoder (clip-vit-large-patch14). If not specified, function will take self.text_encoder.
Load textual inversion embeddings into the text encoder of StableDiffusionPipeline (both 🌍 Diffusers and Automatic1111 formats are supported).
To load a textual inversion embedding vector in Automatic1111 format, make sure to download the vector first (for example from ) and then load the vector
text_encoder (CLIPTextModel, optional, defaults to None) — An instance of CLIPTextModel to use, specifically the clip-vit-large-patch14 variant. If this parameter is None, the function loads a new instance of CLIPTextModel by itself if needed.
Instantiate a DiffusionPipeline from pretrained pipeline weights saved in the .ckpt or .safetensors format. The pipeline is set in evaluation mode (model.eval()) by default.
pretrained_model_name_or_path_or_dict (str or os.PathLike or dict) — See lora_state_dict().
kwargs (dict, optional) — See lora_state_dict().
See lora_state_dict() for more details on how the state dict is loaded.
See load_lora_into_unet() for more details on how the state dict is loaded into self.unet.
See load_lora_into_text_encoder() for more details on how the state dict is loaded into self.text_encoder.
vae (FlaxAutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (FlaxCLIPTextModel) — Frozen text-encoder (clip-vit-large-patch14).
unet (FlaxUNet2DConditionModel) — A FlaxUNet2DConditionModel to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of FlaxDDIMScheduler, FlaxLMSDiscreteScheduler, FlaxPNDMScheduler, or FlaxDPMSolverMultistepScheduler.
safety_checker (FlaxStableDiffusionSafetyChecker) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the model card for more details about a model’s potential harms.
This model inherits from FlaxDiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
( prompt_ids: array, params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict], prng_seed: PRNGKeyArray, num_inference_steps: int = 50, height: typing.Optional[int] = None, width: typing.Optional[int] = None, guidance_scale: typing.Union[float, array] = 7.5, latents: array = None, neg_prompt_ids: array = None, return_dict: bool = True, jit: bool = False ) → FlaxStableDiffusionPipelineOutput or tuple
return_dict (bool, optional, defaults to True) — Whether or not to return a FlaxStableDiffusionPipelineOutput instead of a plain tuple.
FlaxStableDiffusionPipelineOutput or tuple
If return_dict is True, FlaxStableDiffusionPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.