Stable Diffusion is a latent text-to-image diffusion model trained on 512×512 images from a subset of the LAION-5B database.
The model utilizes a frozen CLIP ViT-L/14 text encoder to condition its output on text prompts.
It incorporates an 860M-parameter UNet and a 123M-parameter text encoder, making it a relatively lightweight model that can run on a GPU with at least 10GB VRAM.
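As a quick illustration of how the model is typically run, the snippet below is a minimal sketch using the Hugging Face diffusers library (an assumption here, not part of this document); it loads the weights in half precision so inference fits on a GPU of roughly this size. The checkpoint name and prompt are illustrative placeholders.

```python
# Minimal text-to-image sketch with the Hugging Face diffusers library (assumed setup).
# Requires: pip install torch diffusers transformers accelerate, plus a CUDA GPU
# with roughly 10GB VRAM. Checkpoint name and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained pipeline in half precision to reduce memory use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate a 512x512 image (the resolution the model was trained at) from a text prompt.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```

Half precision roughly halves the memory footprint of the UNet and text encoder, which is why it is the usual choice on consumer GPUs near the 10GB mark.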
The training was made possible by a generous compute donation from Stability AI and support from LAION. For more details, refer to the section below and the model card.
Stable Diffusion was created through a collaboration between Stability AI and Runway. Since its release, it has garnered significant popularity, with over 200,000 developers worldwide downloading and licensing the tool.
Additional details on the project are available here.