Stable Diffusion is a latent text-to-image diffusion model. It is trained on 512x512 images and conditions generation on text prompts through a CLIP ViT-L/14 text encoder.
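To make "latent" concrete, the sketch below works out the tensor shapes implied by that description. It is illustrative only and loads no model. The class name `LatentDiffusionShapes` and its methods are hypothetical. The autoencoder downsampling factor of 8, the 4 latent channels, the 77-token context length, and the 768-dimensional text embedding are assumptions taken from the commonly published Stable Diffusion v1 and CLIP ViT-L/14 configurations, not from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentDiffusionShapes:
    """Hypothetical helper: shape bookkeeping for a latent diffusion model."""

    image_size: int = 512        # training resolution stated in the description
    downsample_factor: int = 8   # assumption: v1 autoencoder downsampling factor
    latent_channels: int = 4     # assumption: v1 latent channel count
    max_tokens: int = 77         # assumption: CLIP text context length
    text_embed_dim: int = 768    # assumption: CLIP ViT-L/14 text hidden width

    def latent_shape(self):
        """Return (channels, height, width) of the latent the denoiser works on."""
        if self.image_size % self.downsample_factor != 0:
            raise ValueError("image_size must be divisible by downsample_factor")
        side = self.image_size // self.downsample_factor
        return (self.latent_channels, side, side)

    def text_context_shape(self):
        """Return (tokens, embedding_dim) of the text conditioning sequence."""
        return (self.max_tokens, self.text_embed_dim)

    def compression_ratio(self):
        """Ratio of RGB pixel values to latent values for one image."""
        pixel_values = 3 * self.image_size * self.image_size
        channels, height, width = self.latent_shape()
        return pixel_values / (channels * height * width)


if __name__ == "__main__":
    shapes = LatentDiffusionShapes()
    print("latent shape:", shapes.latent_shape())
    print("text context shape:", shapes.text_context_shape())
    print("pixel-to-latent ratio:", shapes.compression_ratio())
```

Under these assumptions, diffusion runs on a tensor much smaller than the 512x512 RGB image. That reduction is the main practical reason for running diffusion in latent space rather than pixel space.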