Fun with Stable Diffusion, Forge and Flux: generating images with AI
What is Stable Diffusion?
From Wikipedia here
Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing artificial intelligence boom.
It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.[3] Its development involved researchers from the CompVis Group at Ludwig Maximilian University of Munich and Runway with a computational donation from Stability and training data from non-profit organizations.[4][5][6][7]
Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly,[8] and it can run on most consumer hardware equipped with a modest GPU with at least 4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services.[9][10]
(All links and footnotes refer to the original article)
What is Forge?
from Installation and use of Forge, a simple and efficient drawing tool that is better than WebUI here
The stable-diffusion-webui-forge tool is an AI drawing tool built on Stable Diffusion WebUI (based on Gradio). The platform aims to simplify plugin development, optimize resource management, and accelerate inference. The name "Forge" was inspired by "Minecraft Forge". The goal of this project is to become the Forge of SD WebUI. Forge promises never to add unnecessary changes to the Stable Diffusion WebUI user interface, so those who are familiar with Automatic1111 WebUI can use that experience to get started with Forge quickly.
Off topic: The Forge author has long been active in the AIGC drawing community. He has successively open-sourced ControlNet and Fooocus, and has recently invested in the development of Forge, aiming to lower the entry threshold of AIGC drawing for novices.
At an image resolution of 1024px, Forge achieves a significant speed-up over the original WebUI in SDXL model inference.
(All links and footnotes refer to the original article)
What is Flux?
from Wikipedia here
Flux (also known as FLUX.1) is a text-to-image model developed by Black Forest Labs, based in Freiburg im Breisgau, Germany. Black Forest Labs were founded by former employees of Stability AI. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts.
Flux is a series of text-to-image models. The models are based on a hybrid architecture that combines multimodal and parallel diffusion transformer blocks scaled to 12 billion parameters.
According to a test performed by Ars Technica, the outputs generated by Flux.1 Dev and Flux.1 Pro are comparable with DALL-E 3 in terms of prompt fidelity, with photorealism closely matching Midjourney 6 and human hands generated with more consistency than by previous models such as Stable Diffusion XL.[32]
Flux has been criticised for its very realistic generated images. According to media reports, depictions ranged from an image of Donald Trump posing with guns to disturbing scenes, which triggered discussions about ethical implications of technologies developed by Black Forest Labs.[4][13]
After the release of the model, social media X was flooded with Flux-generated images.[33][34] Black Forest Labs has not provided exact details of the data used to train the model.[29] Ars Technica suspected that Flux is based on a large, unauthorised collection of images scraped from the internet, a controversial practice with potential legal consequences.[32][35]
(All links and footnotes refer to the original article)
Where to find and download these components?
Beware: The Forge 7z file is around 2 GB, the complete installation around 96 GB.
Then you need the Flux model:
Hugging Face https://huggingface.co/black-forest-labs/FLUX.1-dev
Civitai https://civitai.com/models/618692/flux (you need to log in there with your email address)
Beware: Flux is about 20 GB (dev), and any LoRAs you add take further space, depending on your needs. (The SDXL model at https://civitai.com/models/101055?modelVersionId=128078 needs only 200 MB.)
Remark: All VAEs and text encoders should download automatically. If they do not, the installation will complain, and you should try to download the missing files from GitHub or Hugging Face.
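If you want to check beforehand whether your drive has room, the sizes above can be added up in a few lines of Python. This is only a sketch: the constants are the rough figures from this article, and the function names are made up for this example, not part of Forge.

```python
import shutil

# Approximate sizes quoted in this article, in GB.
FORGE_ARCHIVE_GB = 2    # the 7z download, needed until it is unpacked
FORGE_INSTALL_GB = 96   # the complete installation
FLUX_DEV_GB = 20        # the Flux dev model


def required_gb(extra_gb: float = 0.0) -> float:
    """Space needed for archive + installation + Flux dev + extras (e.g. LoRAs)."""
    return FORGE_ARCHIVE_GB + FORGE_INSTALL_GB + FLUX_DEV_GB + extra_gb


def has_enough_space(free_bytes: int, extra_gb: float = 0.0) -> bool:
    """Compare free bytes (1 GB = 10**9 bytes) with the required total."""
    return free_bytes / 1e9 >= required_gb(extra_gb)


if __name__ == "__main__":
    # Check the drive that holds the current directory.
    free = shutil.disk_usage(".").free
    print(f"Needed: about {required_gb():.0f} GB, free: {free / 1e9:.0f} GB")
    print("OK" if has_enough_space(free) else "Not enough space")
```

Run it from the drive where you plan to install Forge. Pass a value for extra_gb if you already know how many LoRAs you want to add.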
What are the requirements?
I am using an HP Z440 workstation with 64 GB of memory, a Xeon E5-1650 v3 @ 3.5 GHz, and Windows 11 23H2.
I use an NVIDIA RTX A4000 with 16 GB VRAM.
In my experience, the GPU determines the speed; the CPU is not much involved.
Generation takes around 5-10 min with 6 GB VRAM, compared with 25 s to 1:30 min with 16 GB VRAM.
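To see how much VRAM your own card has, you can ask nvidia-smi, which ships with the NVIDIA driver. A minimal sketch, assuming nvidia-smi is on the PATH. The helper names are made up for this example, and the 4 GB default is the Stable Diffusion minimum quoted from Wikipedia above, not a Flux requirement.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 4 * 1024  # 4 GB, the minimum quoted for Stable Diffusion


def parse_gpus(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' lines (MiB, no header, no units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Ask nvidia-smi for GPU names and total VRAM; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpus(result.stdout)


def meets_minimum(vram_mib: int, minimum_mib: int = MIN_VRAM_MIB) -> bool:
    return vram_mib >= minimum_mib


if __name__ == "__main__":
    for name, vram in query_gpus():
        verdict = "enough" if meets_minimum(vram) else "below the minimum"
        print(f"{name}: {vram} MiB VRAM ({verdict})")
```

Meeting the minimum only tells you that generation should run at all. As the timings above show, the amount of VRAM makes a big difference in how long an image takes.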
A glimpse of Image Generation
Here I can show you only a short glimpse. For a deeper dive, refer to my (future) articles.
Start it by running [installdir]\webui_forge_cu121_torch21\webui\webui-user.bat (or webui-user.sh if you are on Linux).
Forge in the browser looks like this:
Here, Flux is preselected.
Let's try one prompt:
Enter the prompt "tall skinny supermodel looking softly at the viewer, fashionable clothing with a hint of innocence and grace" and press Generate.
The computation is logged in the cmd window:
Distilled CFG Scale: 3.5
[Unload] Trying to free 30800.42 MB for cuda:0 with 0 models keep loaded ... Current free memory is 5530.43 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 15179.04 MB, Model Require: 22700.13 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: -8545.10 MB, CPU Swap Loaded (blocked method): 9882.00 MB, GPU Loaded: 12818.13 MB
Moving model(s) has taken 45.94 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:28<00:00, 4.44s/it]
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ... Current free memory is 2082.06 MB ... Unload model KModel Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 15154.58 MB, Model Require: 159.87 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 13970.71 MB, All loaded to GPU.
Moving model(s) has taken 8.54 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:07<00:00, 3.37s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:07<00:00, 2.97s/it]
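The [Memory Management] lines are worth a second look. The numbers in the log above are consistent with Remaining = Free GPU - Model Require - Inference Require, and with the model being split between CPU swap and GPU. This relation is inferred from the log alone, not taken from the Forge source, so treat the following as a sketch. The function names are made up for this example.

```python
# The KModel line from the log above, copied verbatim.
LOG_LINE = (
    "[Memory Management] Target: KModel, Free GPU: 15179.04 MB, "
    "Model Require: 22700.13 MB, Previously Loaded: 0.00 MB, "
    "Inference Require: 1024.00 MB, Remaining: -8545.10 MB, "
    "CPU Swap Loaded (blocked method): 9882.00 MB, GPU Loaded: 12818.13 MB"
)


def parse_memory_line(line: str) -> dict[str, float]:
    """Collect every 'Name: 123.45 MB' field of a [Memory Management] line."""
    body = line.split("] ", 1)[-1]
    fields = {}
    for part in body.split(", "):
        key, sep, value = part.partition(": ")
        if sep and value.endswith(" MB"):
            fields[key] = float(value[: -len(" MB")])
    return fields


def expected_remaining(fields: dict[str, float]) -> float:
    """Assumed relation: free VRAM minus model size minus inference reserve."""
    return (fields["Free GPU"] - fields["Model Require"]
            - fields["Inference Require"])


fields = parse_memory_line(LOG_LINE)
swap_plus_gpu = (fields["CPU Swap Loaded (blocked method)"]
                 + fields["GPU Loaded"])
print(f"Remaining (recomputed): {expected_remaining(fields):.2f} MB")
print(f"Remaining (logged):     {fields['Remaining']:.2f} MB")
print(f"CPU swap + GPU loaded:  {swap_plus_gpu:.2f} MB")
```

The recomputed value agrees with the logged one to within 0.01 MB (rounding). The negative Remaining suggests that the Flux model does not fit into the 16 GB of the A4000, which would explain why about 9.9 GB is loaded as CPU swap and only the rest goes to the GPU.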
The generated image is: