Comparing Stable Diffusion fine-tuned models for photographic image generation
Alessandro Perilli
CEO & Chief of Research | AI & Emerging Technologies | Business & Product Strategy
To test the upcoming AP Workflow 6.0 for ComfyUI, today I want to compare the performance of four open diffusion models in generating photographic content: SDXL 1.0, Realistic Stock Photo 1.0, RealVisXL 2.0, and CineVisionXL 1.5.
Let's start with the SDXL 1.0 Base model, enhanced by the SDXL Refiner, and compare what a simple prompt can do with four different negative prompts:
As you can see, how you write your prompt matters immensely.
The winner, IMO, is the one with the negative prompt optimized for photographic image generation.
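For context, here is a hypothetical example of what a photography-oriented negative prompt can look like. These are commonly used terms in the community, not my exact test settings:

```python
# Hypothetical prompt settings for photographic generation.
# The terms below are illustrative, not the exact prompts used in these tests.
prompt_settings = {
    "positive": "candid photo of a woman in a cafe, natural light, 35mm, film grain",
    "negative": (
        "cartoon, illustration, painting, drawing, anime, 3d render, cgi, "
        "airbrushed, plastic skin, oversaturated, watermark, text, logo"
    ),
}

print(prompt_settings["negative"])
```

The general idea is to push the sampler away from the non-photographic regions of the model's latent space (illustration, CGI, over-retouched skin) rather than to list every possible defect.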
Let's use that as a baseline to see if the optimization technique known as FreeU (the "free lunch" for diffusion U-Nets) makes a big difference in the generation of the image:
While Free Lunch makes a difference, it doesn't necessarily make the picture better. So we'll stick with the original image, plus the negative prompt optimized for photographic image generation. That will be our baseline.
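The "free lunch" here is FreeU, which rescales the UNet's backbone features and Fourier-filters its skip connections at inference time, with no retraining. A simplified numpy sketch of the core idea follows; the b and s defaults are illustrative, and this is not ComfyUI's actual implementation:

```python
import numpy as np

def freeu(backbone, skip, b=1.5, s=0.9, thresh=1):
    """Simplified FreeU sketch: amplify the UNet backbone features and
    damp the low-frequency content of the skip features.
    b and s roughly correspond to the paper's b1/b2 and s1/s2 factors.
    Shapes: (C, H, W). Illustrative only, not ComfyUI's implementation."""
    # 1) Scale the first half of the backbone channels by b.
    backbone = backbone.copy()
    half = backbone.shape[0] // 2
    backbone[:half] *= b
    # 2) Fourier-filter the skip features: attenuate the low-frequency
    #    band around the spectrum center by factor s.
    freq = np.fft.fftshift(np.fft.fft2(skip), axes=(-2, -1))
    _, h, w = skip.shape
    mask = np.ones((h, w))
    mask[h // 2 - thresh : h // 2 + thresh, w // 2 - thresh : w // 2 + thresh] = s
    skip = np.fft.ifft2(np.fft.ifftshift(freq * mask, axes=(-2, -1))).real
    return backbone, skip
```

If you experiment outside ComfyUI, recent versions of Hugging Face diffusers expose the same technique through `pipe.enable_freeu(s1=..., s2=..., b1=..., b2=...)` on the pipeline object.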
Now, with all parameters equal, and prompt settings equal, let's compare SDXL 1.0 with its fine-tuned alternatives:
As you can see, my parameters are fine for the SDXL 1.0 Base+Refiner, but the CFG Scale value is too high for the fine-tuned variants, so the images get "overcooked". Let's lower that value.
Each fine-tuned SDXL model has different Steps and CFG Scale values recommended by the creators, but a good compromise to maintain control over the generation is a CFG Scale = 7.
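For readers new to the knob: classifier-free guidance (CFG) extrapolates from the unconditional prediction toward the conditional one at each denoising step, so a higher scale means stronger prompt adherence but, past a point, the "overcooked" look. A minimal numpy sketch of the combination rule:

```python
import numpy as np

def cfg_combine(uncond, cond, scale=7.0):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one. scale ~7 is the compromise
    discussed above; values that are too high push the result far
    outside the model's natural range ("overcooked" images)."""
    return uncond + scale * (cond - uncond)

uncond = np.array([0.0, 0.0])
cond = np.array([0.1, -0.1])
print(cfg_combine(uncond, cond, scale=7.0))  # [ 0.7 -0.7]
```

At scale = 1 the result is exactly the conditional prediction; every unit above that amplifies the difference between the two, which is why fine-tuned models with stronger priors saturate at values where the base model is still fine.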
What happens in that case?
Much better and, IMO, Realistic Stock Photo 1.0 is the fine-tuned SDXL variant generating the most pleasant images.
The question now is:
Would these fine-tuned models perform better with their own recommended settings?
The answer is no. At least not on my Apple Silicon system.
In every test I've done, the DPM++ 2S Ancestral sampler (via the KSampler node), paired with the Karras scheduler, produces better results than DPM++ SDE, DPM++ 2M, and DPM++ 3M SDE, the samplers recommended for those models.
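For reference, the "Karras" scheduler option spaces the denoising noise levels according to the schedule from Karras et al. (2022): uniform steps in sigma^(1/rho) rather than in sigma itself. A minimal sketch, with illustrative sigma_min/sigma_max defaults (not ComfyUI's exact values):

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Karras et al. (2022) noise schedule, as used by the "Karras"
    scheduler option: n sigmas spaced uniformly in sigma^(1/rho),
    which concentrates steps at low noise where detail is resolved.
    The sigma_min/sigma_max defaults here are illustrative."""
    ramp = np.linspace(0, 1, n)
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

sigmas = karras_sigmas(20)
```

Compared to linear spacing, this schedule spends more of the step budget near the end of denoising, which is often what makes the Karras variants look cleaner at modest step counts.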
NVIDIA cards will surely behave differently from my Apple MPS backend.
Plus, these generations are done with ComfyUI, which interprets prompt weights differently than A1111 & co.
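As an illustration of why results diverge across UIs: both A1111 and ComfyUI accept the (text:weight) emphasis syntax, but they apply the parsed weights to the text embeddings differently downstream. A minimal, simplified parser for that syntax (no nesting or escaped parentheses handled):

```python
import re

def parse_weighted_prompt(prompt):
    """Parse the (text:weight) emphasis syntax shared by A1111 and
    ComfyUI into (text, weight) chunks. Even with identical weights,
    the two UIs apply them to the embeddings differently, which is one
    reason results don't match across tools. Simplified sketch: no
    nesting, no escaped parentheses."""
    chunks = []
    pos = 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        # Text before the weighted span gets the default weight of 1.0.
        plain = prompt[pos:m.start()].strip(", ")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(", ")
    if tail:
        chunks.append((tail, 1.0))
    return chunks

print(parse_weighted_prompt("photo of a cat, (sharp focus:1.3), bokeh"))
# → [('photo of a cat', 1.0), ('sharp focus', 1.3), ('bokeh', 1.0)]
```

The parsing step is the easy, shared part; the divergence comes afterward, in how each UI scales and renormalizes the token embeddings, so identical prompts and seeds still produce different images.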
On my system, SDXL 1.0 Base + Refiner still produces the best results (at least in this test).
So let's close by passing that image through the Face Detailer function of my AP Workflow:
Much more could be done to this image, but Apple MPS is excruciatingly slow and this little comparison took hours. Imagine what I could do with an NVIDIA system...
If you are interested in the upcoming AP Workflow 6.0, keep an eye on https://perilli.com/ai/comfyui/
The takeaways from all of this are: