Text-to-image
Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings which are used to condition the model to generate an image from noise.
The DiffusionPipeline is the easiest way to use a pre-trained diffusion system for inference.
Start by creating an instance of DiffusionPipeline and specifying which pipeline checkpoint you would like to download.
In this guide, you’ll use the DiffusionPipeline for text-to-image generation with a latent diffusion model:
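A minimal sketch, assuming the 1.4B-parameter CompVis/ldm-text2im-large-256 checkpoint (any text-to-image checkpoint from the Hub works here):

```python
from diffusers import DiffusionPipeline

# Download and cache the pipeline. The checkpoint name is an assumption;
# swap in any text-to-image checkpoint from the Hugging Face Hub.
generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
```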
The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. You can move the generator object to a GPU, just like you would in PyTorch:
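For example, assuming a CUDA-capable GPU is available:

```python
# Move all pipeline components to the GPU.
generator.to("cuda")
```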
Now you can use the generator on your text prompt:
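For example, with an arbitrary illustrative prompt:

```python
# Run the full diffusion loop; the pipeline output holds a list of images.
image = generator("An image of a squirrel in Picasso style").images[0]
```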
The output is by default wrapped into a PIL.Image object.
You can save the image by calling:
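For instance:

```python
# The output is a PIL.Image, so the standard save() method works.
# The filename is arbitrary.
image.save("squirrel_in_picasso_style.png")
```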
Try out the Spaces below, and feel free to play around with the guidance scale parameter to see how it affects the image quality!
Stable Diffusion 2.1 is the latest text-to-image model from Stability AI.
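You can also experiment with guidance scale in code. Most text-to-image pipelines accept a guidance_scale argument in their call; this sketch reuses the generator from above, and the values are illustrative, not recommendations:

```python
prompt = "An image of a squirrel in Picasso style"

# Higher guidance_scale makes the image follow the prompt more closely,
# usually at the cost of diversity. The values below are illustrative.
for scale in (1.0, 4.0, 7.5):
    image = generator(prompt, guidance_scale=scale).images[0]
    image.save(f"squirrel_scale_{scale}.png")
```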