Stable Diffusion 2


Stable Diffusion 2 is a text-to-image latent diffusion model built upon the work of the original Stable Diffusion. The project was led by Robin Rombach and Katherine Crowson from Stability AI and LAION.

The Stable Diffusion 2.0 release includes robust text-to-image models trained using a new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. These models are trained on an aesthetic subset of the LAION-5B dataset created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using LAION's NSFW filter.

For more details about how Stable Diffusion 2 works and how it differs from the original Stable Diffusion, please refer to the official announcement post.

The architecture of Stable Diffusion 2 is largely identical to the original Stable Diffusion model, so check out its API documentation for how to use Stable Diffusion 2. We recommend using the DPMSolverMultistepScheduler as it's currently the fastest scheduler.

Stable Diffusion 2 is available for tasks like text-to-image, inpainting, super-resolution, and depth-to-image. Here are some examples of how to use Stable Diffusion 2 for each task.

Make sure to check out the Stable Diffusion Tips section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!

If you're interested in using one of the official checkpoints for a task, explore the CompVis, Runway, and Stability AI Hub organizations!

Text-to-image


Inpainting


Super-resolution


Depth-to-image

