[Image set: Demonstration of inpainting and outpainting techniques using img2img within Stable Diffusion. Step 1: an image is generated from scratch using txt2img; the AI has inadvertently generated the subject with one arm missing. Step 2: via outpainting, the bottom of the image is extended by 512 pixels and filled with AI-generated content. Step 3: in preparation for inpainting, a makeshift arm is drawn using the paintbrush in GIMP. Step 4: an inpainting mask is applied over the makeshift arm, and img2img generates a new arm while leaving the remainder of the image untouched.]

Revision as of 17:16, 8 October 2022

Stable Diffusion
Developer(s): Stability AI
Initial release: August 22, 2022
Stable release: 1.5 (model)[1] / August 31, 2022
Repository: github.com/CompVis/stable-diffusion
Written in: Python
Operating system: Any that supports CUDA kernels
Type: Text-to-image model
License: Creative ML OpenRAIL-M
Website: stability.ai

Stable Diffusion is a deep learning text-to-image model released by the startup Stability AI in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.[2]

Stable Diffusion is a latent diffusion model, a kind of deep generative neural network developed by researchers at LMU Munich. The model was developed by Stability AI in collaboration with LMU Munich and Runway, with support from EleutherAI and LAION.[3][4][5] As of September 2022, Stability AI was in talks to raise capital at a valuation of up to one billion dollars.[6]

Stable Diffusion's code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney, which were accessible only via cloud services.[7][8]

Architecture

Diagram of the latent diffusion architecture used by Stable Diffusion.

Stable Diffusion is a form of diffusion model (DM). Introduced in 2015, diffusion models are trained with the objective of removing successive applications of Gaussian noise from training images, and can be thought of as a sequence of denoising autoencoders. Stable Diffusion uses a variant known as a "latent diffusion model" (LDM). Rather than learning to denoise image data in "pixel space", an autoencoder is first trained to transform images into a lower-dimensional latent space. The process of adding and removing noise is applied to this latent representation, and the final denoised output is then decoded into pixel space. Each denoising step is performed by a U-Net architecture. The researchers point to reduced computational requirements for training and generation as an advantage of LDMs.[3][9]
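The latent-diffusion pipeline described above can be sketched as follows. This is a toy illustration, not the real implementation: `predict_noise` and `decode` are stand-ins for the trained U-Net and VAE decoder, and the one-line update rule is a deliberate simplification of an actual sampler such as DDIM.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def predict_noise(latent, t, cond):
    """Stand-in for the denoising U-Net: predicts the noise present in
    `latent` at timestep `t`, given conditioning `cond` (e.g. a text
    encoding). A real model is a trained network; this toy returns zeros."""
    return np.zeros_like(latent)

def sample_latent(shape, steps, cond):
    """Simplified reverse-diffusion loop: start from pure Gaussian noise in
    latent space and repeatedly subtract the predicted noise."""
    z = rng.standard_normal(shape)       # pure noise in latent space
    for t in reversed(range(steps)):
        eps = predict_noise(z, t, cond)  # U-Net noise estimate
        z = z - eps / steps              # one (simplified) denoising step
    return z

def decode(z):
    """Stand-in for the VAE decoder mapping latents back to pixel space;
    a real decoder upsamples e.g. a 64x64x4 latent to a 512x512x3 image."""
    return np.tanh(z)

latent = sample_latent((4, 8, 8), steps=50, cond="a photo of a cat")
image = decode(latent)
print(image.shape)  # (4, 8, 8)
```

The key point is that the expensive denoising loop runs entirely in the small latent space; pixel space is only touched once, at decode time.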

The denoising step may be conditioned on a string of text, an image, or some other data. An encoding of the conditioning data is exposed to the denoising U-Nets via a cross-attention mechanism. For conditioning on text, a transformer language model was trained to encode text prompts.[9]
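A minimal sketch of that cross-attention mechanism follows. The learned projection matrices are omitted (queries, keys, and values are used unprojected) and the token and channel sizes are illustrative only, so this is a hedged caricature of the real layer, not Stable Diffusion's implementation.

```python
import numpy as np

def cross_attention(latent_tokens, text_tokens):
    """Minimal scaled dot-product cross-attention: queries come from the
    image latents, keys and values from the text encoding, so each spatial
    position can attend to every prompt token."""
    d = text_tokens.shape[-1]
    scores = latent_tokens @ text_tokens.T / np.sqrt(d)   # (n_latent, n_text)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over text tokens
    return weights @ text_tokens                          # weighted mix of text values

latent = np.ones((64, 16))   # 64 spatial positions, 16 channels (toy sizes)
text = np.ones((77, 16))     # 77 prompt tokens, as in CLIP-style encoders
out = cross_attention(latent, text)
print(out.shape)  # (64, 16)
```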

Usage

The Stable Diffusion model supports generating new images from scratch through a text prompt describing elements to be included or omitted from the output,[4] as well as redrawing existing images to incorporate new elements described in a text prompt (a process commonly known as guided image synthesis[10]) through the model's diffusion-denoising mechanism.[4] In addition, the model allows the use of prompts to partially alter existing images via inpainting and outpainting, when used with an appropriate user interface supporting such features, of which numerous open-source implementations exist.[11]

Stable Diffusion is recommended to be run with 10 GB or more of VRAM; however, users with less VRAM may opt to load the weights in float16 precision instead of the default float32 to lower VRAM usage.[12]
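The saving from float16 is easy to estimate. A rough back-of-the-envelope sketch, assuming a hypothetical model of about one billion parameters (the actual parameter count is not stated here) and counting weights only, ignoring activations and other buffers:

```python
# Rough VRAM estimate for storing model weights alone, assuming a
# hypothetical model of ~1 billion parameters.
n_params = 1_000_000_000
bytes_fp32 = n_params * 4   # float32: 4 bytes per parameter
bytes_fp16 = n_params * 2   # float16: 2 bytes per parameter

gib = 1024 ** 3
print(f"float32 weights: {bytes_fp32 / gib:.2f} GiB")  # 3.73 GiB
print(f"float16 weights: {bytes_fp16 / gib:.2f} GiB")  # 1.86 GiB
```

Halving the per-parameter storage halves the weight footprint, which is why float16 loading helps on GPUs below the recommended 10 GB.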

Text to image generation

Demonstration of the effect of negative prompts on image generation.
  • Top: No negative prompt
  • Centre: "green trees"
  • Bottom: "round stones, round rocks"

The text-to-image sampling script within Stable Diffusion, known as "txt2img", consumes a text prompt along with assorted parameters covering sampling types, output image dimensions, and seed values, and outputs an image file based on the model's interpretation of the prompt.[4] Generated images are tagged with an invisible digital watermark to allow users to identify an image as generated by Stable Diffusion,[4] although this watermark loses its effectiveness if the image is resized or rotated.[13] The Stable Diffusion model is trained on a dataset of 512×512-resolution images,[4][14] so txt2img output images are optimally generated at 512×512 resolution as well; deviating from this size can result in poor-quality outputs.[12]

Each txt2img generation involves a specific seed value which affects the output image; users may randomise the seed to explore different outputs, or reuse the same seed to reproduce a previously generated image.[12] Users can also adjust the number of inference steps for the sampler; higher values take longer to run, while lower values may result in visual defects.[12] Another configurable option, the classifier-free guidance scale, controls how closely the output image adheres to the prompt;[15] more experimental or creative use cases may opt for a lower value, while use cases aiming for more specific outputs may use a higher value.[12]
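The guidance scale can be illustrated with the standard classifier-free guidance extrapolation; the sketch below shows the general formula, not Stable Diffusion's exact code.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the prompt-conditioned one. A scale of 1 follows the
    conditioned prediction as-is; larger scales push the sample to adhere
    more strongly to the prompt, lower values leave more room for variety."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 0.0])   # toy noise predictions
eps_cond = np.array([1.0, -1.0])
print(guided_noise(eps_uncond, eps_cond, 1.0))   # [ 1. -1.]   (no extra guidance)
print(guided_noise(eps_uncond, eps_cond, 7.5))   # [ 7.5 -7.5] (strong adherence)
```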

Negative prompts are a feature included in some user-interface implementations of Stable Diffusion which allow the user to specify prompts that the model should avoid during image generation, for use cases where undesirable image features would otherwise appear due to the positive prompts provided by the user or due to how the model was originally trained.[11] Compared with emphasis markers, an alternative method used by some open-source implementations in which brackets are added to keywords to add or reduce emphasis, the use of negative prompts has a highly statistically significant effect on decreasing the frequency of unwanted outputs.[16]
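The bracket-emphasis convention can be illustrated with a toy parser. The weighting rule assumed here (each "(" multiplies a token's weight by 1.1, each "[" divides it by 1.1) follows one common open-source front-end convention and may differ between implementations; the function itself is hypothetical, not part of Stable Diffusion.

```python
def token_weights(prompt, step=1.1):
    """Toy parser for bracket emphasis: returns (word, weight) pairs,
    where nested '(' raise a word's weight and '[' lower it."""
    weights, depth_up, depth_down, word = [], 0, 0, ""
    def flush():
        nonlocal word
        if word:
            weights.append((word, round(step ** depth_up / step ** depth_down, 3)))
            word = ""
    for ch in prompt:
        if ch == "(":
            depth_up += 1
        elif ch == "[":
            depth_down += 1
        elif ch in ")]":
            flush()                      # the word takes the weight inside the bracket
            if ch == ")":
                depth_up -= 1
            else:
                depth_down -= 1
        elif ch == " ":
            flush()
        else:
            word += ch
    flush()
    return weights

print(token_weights("a ((red)) car [dim] light"))
# [('a', 1.0), ('red', 1.21), ('car', 1.0), ('dim', 0.909), ('light', 1.0)]
```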

Image modification

Stable Diffusion includes another sampling script, "img2img", which consumes a text prompt, a path to an existing image, and a strength value between 0.0 and 1.0, and outputs a new image based on the original that also features elements provided in the text prompt. The strength value denotes the amount of noise added to the output image; a higher value produces more variation, but the result may not be semantically consistent with the prompt.[4] Image upscaling is one potential use case of img2img, among others.[4]
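How a strength value can map onto the diffusion process is sketched below. The linear noise mix and the step count are simplifying assumptions for illustration, not the model's actual noise schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def img2img_start(init_latent, strength, num_steps=50):
    """Sketch of how a strength in [0.0, 1.0] is commonly applied: it sets
    how much noise is mixed into the initial image and how many denoising
    steps are actually run. strength=0 keeps the original latent untouched;
    strength=1 starts from (almost) pure noise."""
    t_start = int(num_steps * strength)             # denoising steps to run
    noise = rng.standard_normal(init_latent.shape)
    noised = (1 - strength) * init_latent + strength * noise
    return noised, t_start

latent = np.zeros((4, 8, 8))
_, steps_run = img2img_start(latent, strength=0.75)
print(steps_run)  # 37
```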

Inpainting and outpainting

Additional use-cases for image modification via img2img are offered by numerous different front-end implementations of the Stable Diffusion model. Inpainting involves selectively modifying a portion of an existing image delineated by a user-provided mask, which fills the masked space with newly generated content based on the provided prompt.[11] Conversely, outpainting extends an image beyond its original dimensions, filling the previously empty space with content generated based on the provided prompt.[11]
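One common way front ends implement this is to composite the generated content with the original image under the mask, so only the masked region changes; for outpainting, the mask simply covers the newly added border. The sketch below performs the blend once, as a simplification of what samplers typically repeat at each denoising step.

```python
import numpy as np

def inpaint_blend(generated, original, mask):
    """Masked compositing for inpainting: keep newly generated content
    where mask == 1 and restore the original image everywhere else."""
    return mask * generated + (1 - mask) * original

original = np.zeros((4, 4))                      # toy "existing image"
generated = np.ones((4, 4))                      # toy "generated content"
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                             # inpaint only the 2x2 centre

out = inpaint_blend(generated, original, mask)
print(out.sum())  # 4.0  (only the four masked pixels changed)
```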

License

Unlike models like DALL-E, Stable Diffusion makes its source code available,[17][4] along with pre-trained weights. Its license prohibits certain use cases, including crime, libel, harassment, doxxing, "exploiting ... minors", giving medical advice, automatically creating legal obligations, producing legal evidence, and "discriminating against or harming individuals or groups based on ... social behavior or ... personal or personality characteristics ... [or] legally protected characteristics or categories".[18][19] The user owns the rights to their generated output images, and is free to use them commercially.[20]

Training

Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web. The dataset was created by LAION, a German non-profit which receives funding from Stability AI.[14][21] The model was initially trained on a large subset of LAION-5B, with the final rounds of training done on "LAION-Aesthetics v2 5+", a subset of 600 million captioned images which an AI predicted that humans would give a score of at least 5 out of 10 when asked to rate how much they liked them.[14][22] This final subset also excluded low-resolution images and images which an AI identified as carrying a watermark.[14] A third-party analysis of a 12-million-image sample of the training data found that approximately 47% of the images came from just 100 domains, with Pinterest accounting for 8.5% of the sample, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons.[23][14]

The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000.[24][25][26]
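The reported figures can be cross-checked with simple arithmetic:

```python
# Back-of-the-envelope check of the reported training figures.
gpu_hours = 150_000
n_gpus = 256
total_cost = 600_000   # USD

wall_clock_days = gpu_hours / n_gpus / 24      # if all GPUs ran in parallel
cost_per_gpu_hour = total_cost / gpu_hours

print(f"{wall_clock_days:.1f} days")           # 24.4 days
print(f"${cost_per_gpu_hour:.2f}/GPU-hour")    # $4.00/GPU-hour
```

The implied rate of about $4 per A100-hour and roughly 24 days of wall-clock time are consistent with the three cited figures taken together.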

Societal impacts

An example of a photorealistic image generated by Stable Diffusion, featuring Vladimir Putin kneeling to kiss the national flag of Ukraine. A common concern[26][18] is that images generated by models such as Stable Diffusion, particularly photorealistic ones, may be used to mislead or cause other forms of harm.

As visual styles and compositions are not subject to copyright, it is often interpreted that users of Stable Diffusion who generate images of artworks should not be considered to be infringing upon the copyright of visually similar works.[27] However, individuals depicted in generated images may still be protected by personality rights if their likeness is used,[27] and intellectual property such as recognisable brand logos remains protected by copyright. Nonetheless, visual artists have expressed concern that widespread use of image synthesis software such as Stable Diffusion may eventually cause human artists, along with photographers, models, cinematographers and actors, to gradually lose commercial viability against AI-based competitors.[21]

Stable Diffusion is notably more permissive than similar machine learning image synthesis products from other companies in the types of content users may generate, such as violent or sexually explicit imagery.[28] Addressing concerns that the model may be used for abusive purposes, Stability AI CEO Emad Mostaque explains that "(it is) peoples' responsibility as to whether they are ethical, moral, and legal in how they operate this technology",[8] and that putting the capabilities of Stable Diffusion into the hands of the public would result in a net benefit overall, despite the potential negative consequences.[8] Mostaque further argues that the intention behind the open availability of Stable Diffusion is to end the control and dominance of corporations that have previously developed only closed AI systems for image synthesis.[8][28]

References

  1. ^ Mostaque, Emad (2022-06-06). "Stable Diffusion 1.5 beta now available to try via API and #DreamStudio, let me know what you think. Much more tomorrow…". Twitter. Archived from the original on 2022-09-27.
  2. ^ "Diffuse The Rest - a Hugging Face Space by huggingface". huggingface.co. Archived from the original on 2022-09-05. Retrieved 2022-09-05.
  3. ^ a b "Stable Diffusion Launch Announcement". Stability.Ai. Archived from the original on 2022-09-05. Retrieved 2022-09-06.
  4. ^ a b c d e f g h i "Stable Diffusion Repository on GitHub". CompVis - Machine Vision and Learning Research Group, LMU Munich. 17 September 2022. Retrieved 17 September 2022.
  5. ^ "Revolutionizing image generation by AI: Turning text into images". LMU Munich. Retrieved 17 September 2022.
  6. ^ Cai, Kenrick. "Startup Behind AI Image Generator Stable Diffusion Is In Talks To Raise At A Valuation Up To $1 Billion". Forbes. Retrieved 2022-09-10.
  7. ^ "The new killer app: Creating AI art will absolutely crush your PC". PCWorld. Archived from the original on 2022-08-31. Retrieved 2022-08-31.
  8. ^ a b c d Vincent, James (15 September 2022). "Anyone can use this AI art generator — that's the risk". The Verge.
  9. ^ a b Rombach; Blattmann; Lorenz; Esser; Ommer (June 2022). High-Resolution Image Synthesis with Latent Diffusion Models (PDF). International Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA. pp. 10684–10695. arXiv:2112.10752.
  10. ^ Meng, Chenlin; He, Yutong; Song, Yang; Song, Jiaming; Wu, Jiajun; Zhu, Jun-Yan; Ermon, Stefano (August 2, 2021). "SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations". arXiv preprint. arXiv:2108.01073.
  11. ^ a b c d "Stable Diffusion web UI". GitHub.
  12. ^ a b c d e "Stable Diffusion with 🧨 Diffusers". Hugging Face official blog. August 22, 2022.
  13. ^ "invisible-watermark README.md". GitHub.
  14. ^ a b c d e Baio, Andy (30 August 2022). "Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion's Image Generator". Waxy.org.
  15. ^ Ho, Jonathan; Salimans, Tim (July 26, 2022). "Classifier-Free Diffusion Guidance". arXiv preprint. arXiv:2207.12598.
  16. ^ Johannes Gaessler (September 11, 2022). "Emphasis". GitHub.
  17. ^ "Stable Diffusion Public Release". Stability.Ai. Archived from the original on 2022-08-30. Retrieved 2022-08-31.
  18. ^ a b "Ready or not, mass video deepfakes are coming". The Washington Post. 2022-08-30. Archived from the original on 2022-08-31. Retrieved 2022-08-31.
  19. ^ "License - a Hugging Face Space by CompVis". huggingface.co. Archived from the original on 2022-09-04. Retrieved 2022-09-05.
  20. ^ Katsuo Ishida (August 26, 2022). "言葉で指示した画像を凄いAIが描き出す「Stable Diffusion」 ~画像は商用利用も可能" [Amazing AI "Stable Diffusion" draws images from verbal instructions; images may also be used commercially]. Impress Corporation (in Japanese).
  21. ^ a b Heikkilä, Melissa (16 September 2022). "This artist is dominating AI-generated art. And he's not happy about it". MIT Technology Review.
  22. ^ "LAION-Aesthetics | LAION". laion.ai. Archived from the original on 2022-08-26. Retrieved 2022-09-02.
  23. ^ Alex Ivanovs (September 8, 2022). "Stable Diffusion: Tutorials, Resources, and Tools". Stackdiary.
  24. ^ Mostaque, Emad (August 28, 2022). "Cost of construction". Twitter. Archived from the original on 2022-09-06. Retrieved 2022-09-06.
  25. ^ "Stable Diffusion v1-4 Model Card". huggingface.co. Retrieved 2022-09-20.
  26. ^ a b "This startup is setting a DALL-E 2-like AI free, consequences be damned". TechCrunch. Retrieved 2022-09-20.
  27. ^ a b "高性能画像生成AI「Stable Diffusion」無料リリース。「kawaii」までも理解し創造する画像生成AI" [High-performance image-generation AI "Stable Diffusion" released for free; an image-generation AI that understands and creates even "kawaii"]. Automaton Media (in Japanese). August 24, 2022.
  28. ^ a b Ryo Shimizu (August 26, 2022). "Midjourneyを超えた? 無料の作画AI「 #StableDiffusion 」が「AIを民主化した」と断言できる理由" [Has it surpassed Midjourney? Why the free image-generation AI "#StableDiffusion" can be said to have "democratized AI"]. Business Insider Japan (in Japanese).
