Prompt weighting
Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. A prompt can include several concepts, which get turned into contextualized text embeddings. The model uses these embeddings to condition its cross-attention layers when generating an image (read the Stable Diffusion blog post to learn more about how it works).
Prompt weighting works by increasing or decreasing the scale of the text embedding vector that corresponds to a concept in the prompt, because you may not want the model to focus on all concepts equally. The easiest way to prepare the prompt-weighted embeddings is to use Compel, a text prompt-weighting and blending library. Once you have the prompt-weighted embeddings, you can pass them to any pipeline that has a prompt_embeds (and optionally negative_prompt_embeds) parameter, such as StableDiffusionPipeline, StableDiffusionControlNetPipeline, and StableDiffusionXLPipeline.
If your favorite pipeline doesn’t have a prompt_embeds parameter, please open an issue so we can add it!
This guide will show you how to weight and blend your prompts with Compel in 🌍 Diffusers.
Before you begin, make sure you have the latest version of Compel installed, for example by running pip install compel --upgrade.
For this guide, let’s generate an image with the prompt "a red cat playing with a ball" using the StableDiffusionPipeline.
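The baseline run could look like the following minimal sketch. The checkpoint name CompVis/stable-diffusion-v1-4, the scheduler swap, the seed, and the step count are assumptions chosen for illustration, not requirements of this guide. The heavy imports sit inside the function so the snippet can be loaded without pulling in the model.

```python
PROMPT = "a red cat playing with a ball"


def generate_baseline(model_id="CompVis/stable-diffusion-v1-4", seed=33, device="cuda"):
    """Load a Stable Diffusion pipeline and render PROMPT without any weighting."""
    # Imported lazily: torch and diffusers are only needed when the function runs.
    import torch
    from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler

    pipe = StableDiffusionPipeline.from_pretrained(model_id, use_safetensors=True)
    # Assumption: a multistep scheduler so a low step count gives a usable preview.
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.to(device)

    # A fixed seed keeps later weighted and unweighted runs comparable.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    image = pipe(PROMPT, generator=generator, num_inference_steps=20).images[0]
    return pipe, image
```

Calling pipe, image = generate_baseline() returns both the pipeline and the image, so the same pipeline can be reused for the weighted runs.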
Weighting
You’ll notice there is no “ball” in the image! Let’s use Compel to upweight the concept of “ball” in the prompt. Create a Compel object, and pass it a tokenizer and text encoder.
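As a sketch of this step, assuming pipe is an already loaded StableDiffusionPipeline: the helper names compel_components and make_compel are hypothetical and only exist for this illustration, while the tokenizer and text_encoder keyword arguments are the ones the Compel class accepts.

```python
def compel_components(pipe):
    """Pick out the tokenizer and text encoder that Compel needs from a pipeline."""
    return {"tokenizer": pipe.tokenizer, "text_encoder": pipe.text_encoder}


def make_compel(pipe):
    """Build a Compel object bound to the pipeline's own tokenizer and text encoder."""
    from compel import Compel  # imported lazily; requires the compel package

    return Compel(**compel_components(pipe))
```

The rest of the guide refers to the resulting object as compel_proc, as in compel_proc = make_compel(pipe).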
compel uses + or - to increase or decrease the weight of a word in the prompt. + corresponds to the value 1.1, ++ corresponds to 1.1^2, and so on. Similarly, - corresponds to 0.9 and -- corresponds to 0.9^2. Feel free to experiment with adding more + or - in your prompt! To increase the weight of “ball”, append one or more + to it.
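The arithmetic behind the suffixes can be sketched in a few lines. The function suffix_weight is a hypothetical helper written for this guide, not part of Compel; it only reproduces the 1.1^n and 0.9^n rule stated above.

```python
def suffix_weight(suffix: str) -> float:
    """Return the multiplier implied by a run of '+' or '-' characters."""
    if not suffix:
        return 1.0  # no suffix leaves the word at its normal weight
    if set(suffix) == {"+"}:
        return 1.1 ** len(suffix)
    if set(suffix) == {"-"}:
        return 0.9 ** len(suffix)
    raise ValueError("suffix must contain only '+' or only '-'")


# The upweighted prompt used in the next step.
prompt = "a red cat playing with a ball++"

for s in ["+", "++", "-", "--"]:
    print(f"ball{s}: weight {suffix_weight(s):.3f}")
```

So "ball++" asks for roughly 1.21 times the normal emphasis on “ball”, and each additional + multiplies that by another 1.1.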
Pass the prompt to compel_proc to create the new prompt embeddings, which are then passed to the pipeline.
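A minimal sketch of that hand-off, assuming pipe is a loaded pipeline and compel_proc is a Compel object built from its tokenizer and text encoder. The wrapper name generate_with_weights and the default of 20 steps are illustrative choices.

```python
def generate_with_weights(pipe, compel_proc, prompt, generator=None, steps=20):
    """Encode a weighted prompt with Compel and render it with the pipeline."""
    # Compel parses the weighting syntax and returns the conditioning tensor.
    prompt_embeds = compel_proc(prompt)
    # The pipeline receives the embeddings instead of the raw prompt string.
    result = pipe(
        prompt_embeds=prompt_embeds,
        generator=generator,
        num_inference_steps=steps,
    )
    return result.images[0]
```

For example, image = generate_with_weights(pipe, compel_proc, "a red cat playing with a ball++") renders the upweighted prompt. Passing the same seeded generator as in the unweighted run makes the two images easier to compare.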
To downweight parts of the prompt, use the - suffix.
You can even upweight or downweight multiple concepts in the same prompt.
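Both cases come down to how the prompt string is written. The sketch below builds two such prompts; the helper weighted is hypothetical, and the specific number of + and - characters is only an example to experiment with.

```python
def weighted(word: str, steps: int) -> str:
    """Append '+' (steps > 0) or '-' (steps < 0) to a word; 0 leaves it unchanged."""
    marker = "+" if steps > 0 else "-"
    return word + marker * abs(steps)


# Strongly downweight "red" so the color matters less.
downweighted = f"a {weighted('red', -7)} cat playing with a ball"

# Upweight "cat" and downweight "ball" in the same prompt.
mixed = f"a red {weighted('cat', 2)} playing with a {weighted('ball', -4)}"

print(downweighted)
print(mixed)
```

Either string can be passed to compel_proc in the same way as the upweighted prompt.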
Blending
You can also create a weighted blend of prompts by adding .blend() to a list of prompts and passing it some weights. Your blend may not always produce the result you expect because it breaks some assumptions about how the text encoder functions, so just have fun and experiment with it!
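A blend is written as a single prompt string. The sketch below formats one; blend_prompt is a hypothetical helper, and the second prompt ("jungle") and the weights 0.7 and 0.8 are arbitrary values picked for illustration.

```python
def blend_prompt(prompts, weights):
    """Format prompts and weights as a Compel-style blend expression."""
    if len(prompts) != len(weights):
        raise ValueError("need exactly one weight per prompt")
    quoted = ", ".join(f'"{p}"' for p in prompts)
    numbers = ", ".join(str(w) for w in weights)
    return f"({quoted}).blend({numbers})"


blend = blend_prompt(["a red cat playing with a ball", "jungle"], [0.7, 0.8])
print(blend)
```

The resulting string is passed to Compel like any other prompt, for example prompt_embeds = compel_proc(blend).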
Conjunction
A conjunction diffuses each prompt independently and concatenates their results by their weighted sum. Add .and() to the end of a list of prompts to create a conjunction.
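A conjunction is also expressed inside the prompt string. In this sketch the helper conjunction_prompt is hypothetical and the split of the cat prompt into three parts is just one possible choice; the parenthesized form follows the syntax Compel documents for .and().

```python
def conjunction_prompt(prompts):
    """Format a list of prompts as a Compel-style conjunction expression."""
    quoted = ", ".join(f'"{p}"' for p in prompts)
    return f"({quoted}).and()"


conjunction = conjunction_prompt(["a red cat", "playing with a", "ball"])
print(conjunction)
```

As with blends, the string goes straight to Compel: prompt_embeds = compel_proc(conjunction).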
Textual inversion
Textual inversion is a technique for learning a specific concept from some images which you can use to generate new images conditioned on that concept.
Create a pipeline and use the load_textual_inversion() function to load the textual inversion embeddings (feel free to browse the Stable Diffusion Conceptualizer for 100+ trained concepts).
Compel provides a DiffusersTextualInversionManager class to simplify prompt weighting with textual inversion. Instantiate DiffusersTextualInversionManager and pass it to the Compel class.
Incorporate the concept into a prompt using the <concept> syntax.
DreamBooth
DreamBooth is a technique for generating contextualized images of a subject given just a few images of the subject to train on. It is similar to textual inversion, but DreamBooth trains the full model whereas textual inversion only fine-tunes the text embeddings. This means you should use from_pretrained() to load the DreamBooth model (feel free to browse the Stable Diffusion Dreambooth Concepts Library for 100+ trained models).
Create a Compel object with a tokenizer and text encoder, and pass your prompt to it. Depending on the model you use, you’ll need to incorporate the model’s unique identifier into your prompt. For example, the dndcoverart-v1 model uses the identifier dndcoverart.
Stable Diffusion XL
Stable Diffusion XL (SDXL) has two tokenizers and text encoders, so its usage is a bit different. To address this, you should pass both tokenizers and encoders to the Compel class.
This time, let’s upweight “ball” by a factor of 1.5 for the first prompt, and downweight “ball” by 0.6 for the second prompt. The StableDiffusionXLPipeline also requires pooled_prompt_embeds (and optionally negative_pooled_prompt_embeds), so you should pass those to the pipeline along with the conditioning tensors.
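Put together, the SDXL setup might look like the sketch below. The checkpoint stabilityai/stable-diffusion-xl-base-1.0, the fp16 settings, the seed, and the step count are assumptions. The returned_embeddings_type and requires_pooled arguments reflect how Compel is typically configured for SDXL, where only the second text encoder supplies pooled embeddings.

```python
PROMPTS = [
    "a red cat playing with a (ball)1.5",  # upweight "ball"
    "a red cat playing with a (ball)0.6",  # downweight "ball"
]


def generate_sdxl(device="cuda", seed=33):
    """Render both weighted prompts with SDXL, passing pooled embeddings as well."""
    import torch
    from compel import Compel, ReturnedEmbeddingsType
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        variant="fp16",
        use_safetensors=True,
        torch_dtype=torch.float16,
    ).to(device)

    # Both tokenizers and both text encoders are handed to Compel as lists.
    compel = Compel(
        tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
        text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
        returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
        requires_pooled=[False, True],
    )
    # With pooling enabled, Compel returns the conditioning and pooled tensors.
    conditioning, pooled = compel(PROMPTS)

    # One generator per prompt, all with the same seed, for a fair comparison.
    generators = [torch.Generator().manual_seed(seed) for _ in PROMPTS]
    return pipeline(
        prompt_embeds=conditioning,
        pooled_prompt_embeds=pooled,
        generator=generators,
        num_inference_steps=30,
    ).images
```

The function returns one image per prompt, so the effect of the 1.5 and 0.6 weights on “ball” can be compared side by side.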