Conditional image generation

Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings, which are used to condition the model to generate an image from noise.

The DiffusionPipeline is the easiest way to use a pre-trained diffusion system for inference.

Start by creating an instance of DiffusionPipeline and specifying which pipeline checkpoint you would like to download.

In this guide, you’ll use DiffusionPipeline for text-to-image generation with runwayml/stable-diffusion-v1-5:

>>> from diffusers import DiffusionPipeline

>>> generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)

The DiffusionPipeline downloads and caches all modeling, tokenization, and scheduling components. Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. You can move the generator object to a GPU, just like you would in PyTorch:

>>> generator.to("cuda")
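If you are short on GPU memory, one common option is to load the weights in half precision through the standard torch_dtype argument of from_pretrained:

>>> import torch

>>> generator = DiffusionPipeline.from_pretrained(
...     "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
... )
>>> generator.to("cuda")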

Now you can use the generator on your text prompt:

>>> image = generator("An image of a squirrel in Picasso style").images[0]
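Under the hood, the prompt is first converted into text embeddings that condition the denoising process, as described at the start of this guide. If you are curious, you can reproduce that step yourself; the following is a minimal sketch assuming a Stable Diffusion checkpoint, which exposes its CLIP tokenizer and text encoder as generator.tokenizer and generator.text_encoder:

>>> import torch

>>> # tokenize the prompt, padding to the encoder's fixed sequence length
>>> tokens = generator.tokenizer(
...     "An image of a squirrel in Picasso style",
...     padding="max_length",
...     max_length=generator.tokenizer.model_max_length,
...     return_tensors="pt",
... )
>>> # encode the tokens; one embedding vector per token conditions the UNet
>>> with torch.no_grad():
...     embeddings = generator.text_encoder(tokens.input_ids.to("cuda"))[0]
>>> embeddings.shape
torch.Size([1, 77, 768])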

By default, the pipeline returns the image wrapped in a PIL.Image object.
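Since it is a regular PIL image, the usual PIL.Image methods work on it. For example (512x512 is the default output resolution of Stable Diffusion v1 checkpoints):

>>> image.size
(512, 512)
>>> small = image.resize((256, 256))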

You can save the image by calling:

>>> image.save("image_of_squirrel_painting.png")
Try out the Spaces below, and feel free to play around with the guidance scale parameter to see how it affects the image quality!

Stable Diffusion 2.1 Demo

Stable Diffusion 2.1 is the latest text-to-image model from StabilityAI. Access the Stable Diffusion 1 Space here. For faster generation and API access, you can try DreamStudio Beta.

Model by StabilityAI - backend running JAX on TPUs thanks to the generous support of the Google TRC program - Gradio Demo by 🤗 Hugging Face
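If you want to experiment with the guidance scale locally as well, it is exposed as the guidance_scale argument of the pipeline call. For Stable Diffusion it defaults to 7.5; higher values push the image to follow the prompt more closely, usually at the cost of diversity:

>>> image = generator("An image of a squirrel in Picasso style", guidance_scale=9.0).images[0]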
