Combining the Power of Generative AI with the Creative Expression of Digital Art and Design
Gary Stafford
Principal Solutions Architect @AWS | Data Analytics and Generative AI Specialist | Experienced Technology Leader, Consultant, CTO, COO, President | 10x AWS Certified
Integrating fine-tuned Stable Diffusion models, advanced image generation workflows, and editing apps designed for creative professionals to produce digital art
While there is widespread optimism about the positive transformations Generative AI can bring to our world, it is not without its share of controversies. Concerns about potential job displacement, misinformation, plagiarism, bias, and climate impact have been raised. Furthermore, there is a growing concern among authors, actors, visual artists, and musicians who feel that some model builders have improperly appropriated their creative content to train models, disregarding established copyright protections.
Although I am deeply involved in technology, including Generative AI, I hold degrees in fine arts and worked for years in the graphic design, commercial photography, and digital printing industries. Given my background, I am open to embracing the ever-evolving landscape of creative processes and excited about new and innovative methods of artistic expression that Generative AI can bring to content creators.
This post showcases the seamless integration of fine-tuned Stable Diffusion models, advanced image generation pipelines, image editing techniques, and professional-grade photographic and fine art printing services. We will explore how content creators can leverage the innovative capabilities of these non-traditional Generative AI tools to craft visually engaging content.
Choosing a Stable Diffusion UI
Common generative imaging workflows include fine-tuning models, text-to-image, image-to-image, upscaling, and inpainting. These tasks can all be accomplished programmatically, most commonly in a Jupyter notebook. Popular hosted notebook environments include Amazon SageMaker Studio and Google Colaboratory (Colab). My recent article features Amazon SageMaker Studio for Stable Diffusion image generation:
However, notebooks have a much higher learning curve for non-programmers than low-code or no-code user interfaces. Several popular, browser-based, open-source graphical user interfaces (GUIs) are available for Stable Diffusion-based generative imaging tasks, including AUTOMATIC1111 (A1111), ComfyUI, and Kohya_SS.
We will use A1111 in this post because it has the gentlest learning curve for anyone just starting out. You can learn more about configuring and using A1111 in my most recent post:
Choosing a Stable Diffusion Model
Once you have chosen a GUI, you need to select a Stable Diffusion model to generate your images. According to AWS, “Stable Diffusion is a Generative AI model that produces unique photorealistic images from text and image prompts. Besides images, you can also use the model to create videos and animations.” The latest Stable Diffusion model from the model builder Stability AI is SDXL 1.0, announced in July 2023. The SDXL 1.0 base model can be found on Hugging Face and Civitai. I have downloaded and installed the base model checkpoint into A1111.
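For anyone following the programmatic route mentioned earlier, here is a minimal sketch of text-to-image generation with the same SDXL 1.0 base checkpoint, using the Hugging Face diffusers library in a notebook. The prompt, file name, and generation parameters are illustrative, and a CUDA-capable GPU is assumed.

```python
# Minimal text-to-image sketch with the SDXL 1.0 base checkpoint and diffusers.
# Prompt, steps, and guidance scale are illustrative values, not a recipe.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

image = pipe(
    prompt="a matte gray sports car on a race track at golden hour, motion blur",
    num_inference_steps=40,
    guidance_scale=7.0,
).images[0]
image.save("txt2img-example.png")
```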
Refiner Model
Although you could use the base model independently to generate images, I often include the SDXL 1.0 refiner model as part of my image generation pipeline. According to Hugging Face, “SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement [refiner] model specialized for the final denoising steps. Note that the base model can be used as a standalone module.”
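To make that handoff concrete, here is a hedged sketch of the ensemble-of-experts pipeline with diffusers, reusing the `pipe` (base) object from the previous sketch and adding the refiner. The 0.8 split value and prompt are illustrative; diffusers exposes the split through the `denoising_end` and `denoising_start` parameters.

```python
# Sketch of the base + refiner handoff: the base model denoises the first 80%
# of the steps and passes noisy latents to the refiner, which finishes the rest.
# Assumes `pipe` (the SDXL base pipeline) from the previous sketch is loaded.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=pipe.text_encoder_2,  # share components with the base to save memory
    vae=pipe.vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "a blue sports car parked in an industrial warehouse, cinematic lighting"
high_noise_frac = 0.8  # base handles 80% of the denoising, refiner the remaining 20%

latents = pipe(
    prompt=prompt,
    num_inference_steps=50,
    denoising_end=high_noise_frac,
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=50,
    denoising_start=high_noise_frac,
    image=latents,
).images[0]
image.save("base-plus-refiner.png")
```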
Alternative Models
Alternative Stable Diffusion models are available on Hugging Face and other popular sites, such as Civitai (warning: contains explicit content). These model checkpoints can be downloaded and used similarly to the SDXL base model.
Fine-tuning Models
According to the paper, DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, “Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts.” Consequently, it is common to fine-tune Stable Diffusion models to achieve specific visual styles, such as steampunk, psychedelic, manga, dystopian, and vaporwave. Models may also be fine-tuned to reflect the style of well-known artists, such as Picasso, Rembrandt, Monet, Van Gogh, and Dalí. Lastly, models may be fine-tuned to capture the look and feel of famous characters, actors, distinct cultural styles, holidays, products, name brands, or even yourself.
There are several techniques to fine-tune Stable Diffusion models, including Textual Inversion, LoRA, and DreamBooth. Being a sports car fan, in this post, I have fine-tuned the SDXL 1.0 base model using DreamBooth to reflect the distinct style of a late-model Mercedes-Benz AMG GT Coupe. According to the DreamBooth paper, “Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes.”
Alternatively, you could select an existing fine-tuned LoRA from model zoos such as Hugging Face or Civitai.
Below is a classic example of fine-tuned SDXL LoRA weights found on Hugging Face. This particular model is used to generate images in the style of ornate and detailed cut paper.
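If you work outside A1111, a downloaded LoRA can also be applied programmatically. The sketch below assumes the diffusers library; the repository ID is a hypothetical placeholder, so substitute the LoRA you actually downloaded from Hugging Face or Civitai.

```python
# Sketch: applying an existing fine-tuned LoRA to the SDXL base pipeline.
# The repository ID below is a placeholder, not a real model.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# Load LoRA weights from the Hub (hypothetical repo ID) or point this at a
# local .safetensors file downloaded from Civitai.
pipe.load_lora_weights("your-username/your-sdxl-style-lora")

image = pipe(
    prompt="a sports car rendered as ornate, detailed cut paper",
    num_inference_steps=40,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength / weight
).images[0]
image.save("lora-style.png")
```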
Training Image Dataset
To build the image dataset for fine-tuning, I first collected 20–25 images of similar automobiles, either publicly available online or from my own photos. Using Adobe Photoshop, I retouched each image, removing any distracting objects, details from the license plate (personal preference), and some of the background. I also adjusted each image’s color and brightness. Lastly, I scaled each image to a maximum of 1024 pixels in the longest dimension, the optimal resolution for SDXL model training.
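Resizing the whole set by hand is tedious, so here is a small sketch that batch-resizes the training images with Pillow so the longest side is at most 1024 pixels. The folder names are illustrative.

```python
# Sketch: resize training images so the longest side is at most 1024 pixels,
# preserving the aspect ratio. Paths are illustrative.
from pathlib import Path
from PIL import Image

src_dir = Path("training-images/raw")
dst_dir = Path("training-images/resized")
dst_dir.mkdir(parents=True, exist_ok=True)

for path in sorted(src_dir.glob("*.jpg")):
    img = Image.open(path).convert("RGB")
    img.thumbnail((1024, 1024), Image.LANCZOS)  # in-place, keeps aspect ratio
    img.save(dst_dir / path.name, quality=95)
```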
Next, I named each image sequentially and created a corresponding text file containing a description of the image. For example: “a matte gray mercedes benz amg gts coupe with fluorescent yellow graphics, driving on race track, racing, blurry background, motion, car, automobile, sports car.” Several tools can automate the analysis and labeling of images, including Interrogate CLIP and Interrogate DeepBooru, both available in A1111.
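Captioning can also be scripted outside A1111. The sketch below uses the BLIP captioning model from Hugging Face Transformers to write a matching .txt file next to each image; the model choice, paths, and appended keywords are assumptions for illustration, not the exact tooling used in this post.

```python
# Sketch: auto-caption each training image into a matching .txt file with BLIP,
# an alternative to A1111's Interrogate CLIP / DeepBooru buttons.
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image_dir = Path("training-images/resized")
for path in sorted(image_dir.glob("*.jpg")):
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(out[0], skip_special_tokens=True)
    caption += ", car, automobile, sports car"  # fixed keywords for every caption
    path.with_suffix(".txt").write_text(caption)
```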
With the image dataset prepared, I used the Dreambooth Extension for Stable-Diffusion-WebUI, which I installed in A1111, to fine-tune the SDXL base model checkpoint.
Once complete, the fine-tuned checkpoint should be found in the txt2img > Lora tab of A1111. Installing the Civitai Extension for Automatic 1111 Stable Diffusion Web UI will ensure you get image previews and metadata for all model checkpoints downloaded from Civitai, including LoRAs. You can even add previews to your own fine-tuned model weights in A1111.
You can also share your fine-tuned weights for others to try on sites like Civitai.
Alternative Fine-tuning Services
As an alternative to using a Stable Diffusion GUI, several online services allow you to fine-tune Stable Diffusion models for a fee, including Civitai.
Generating Images
To start the generative process, I will usually explore several variations of positive and negative prompts and generation parameters: LoRA weight, sampling method, and CFG (Classifier Free Guidance) scale in A1111’s txt2img tab. Once I find an interesting combination, I will generate a batch of 6–12 images using different seed values. I typically use fewer sampling steps for batches to speed up the process. Later, I will increase the number of steps (and potential quality) for the final image generation.
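The same explore-then-refine loop can be reproduced programmatically. A rough sketch, assuming the SDXL base pipeline (`pipe`) from the earlier sketch is already loaded, generates a dozen low-step candidates from different seeds:

```python
# Sketch: generate a batch of candidates with identical prompts and parameters
# but different seeds, using fewer steps to keep exploration fast.
# Assumes `pipe` (the SDXL base pipeline) from the earlier sketch is loaded.
import torch

prompt = ("a matte gray mercedes benz amg gts coupe with fluorescent yellow "
          "graphics, driving on race track, racing, blurry background, motion")
negative_prompt = "lowres, blurry, deformed, watermark, text"

for seed in range(100, 112):  # 12 candidate seeds
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=25,   # fewer steps while exploring
        guidance_scale=7.0,       # CFG scale
        generator=generator,
    ).images[0]
    image.save(f"candidate-seed-{seed}.png")
```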
Below is an example of images created using the SDXL base and refiner model coupled with the new DreamBooth weights. All were created with the same prompts and parameters but with different seeds. I preferred the blue car in the upper right corner for this post.
Sampling Steps
Sampling steps are the number of iterations Stable Diffusion executes to go from random noise to a recognizable image (denoising). The number of steps affects the look of the final image, as shown in the grid below (e.g., note the drastic changes between 40 and 50 steps). There is no magic number of sampling steps. Undersampling, with too few steps, can result in distorted and unrecognizable images. Conversely, oversampling, with too many steps, also has drawbacks, including increased processing time with no measurable quality improvement, loss of detail, and odd image anomalies. I usually use 50–100 sampling steps. Note that the examples below use a base-to-refiner sampling-step ratio of 80:20; the steps are split across the two model checkpoints.
Although I liked several of the images below, I chose the result at 80 sampling steps (center right).
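If you prefer to build such a step-comparison grid in code rather than in A1111, a sketch along these lines would work, assuming the `pipe` (base) and `refiner` objects from the earlier sketches and keeping the 80:20 split with a fixed seed:

```python
# Sketch: render the same seed at several sampling-step counts for comparison,
# keeping the 80:20 base-to-refiner split. Assumes `pipe` and `refiner` from
# the earlier sketches are already loaded.
import torch

prompt = "a blue sports car parked in an industrial warehouse, cinematic lighting"
split = 0.8  # 80% of steps on the base model, 20% on the refiner

for steps in (20, 40, 50, 60, 80, 100):
    generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for a fair comparison
    latents = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_end=split,
        output_type="latent",
        generator=generator,
    ).images
    image = refiner(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_start=split,
        image=latents,
        generator=generator,
    ).images[0]
    image.save(f"steps-{steps}.png")
```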
Once I chose the image and the number of sampling steps, I generated the final base image at 1536 x 1024 pixels (3:2 aspect ratio). By generating a slightly larger image, less upscaling is required later.
If any part of the subject or background is objectionable, I often use the inpainting capabilities of A1111 in the img2img tab to remove or change them. A1111 even allows you to use masks created in Adobe Photoshop to more precisely mask areas of the generated image (lower left of screen), as shown below.
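The same idea works programmatically with an SDXL inpainting pipeline, passing the Photoshop mask (white where the image should be regenerated) alongside the source image. The model ID below is the SDXL inpainting checkpoint published for diffusers; the prompt and file names are illustrative.

```python
# Sketch: programmatic inpainting with an external mask, e.g., one painted in
# Photoshop. White areas of the mask are regenerated; black areas are kept.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = load_image("final-base-image.png")
mask = load_image("photoshop-mask.png")  # white = area to repaint

result = pipe(
    prompt="clean concrete warehouse floor",
    image=image,
    mask_image=mask,
    num_inference_steps=40,
    strength=0.9,
).images[0]
result.save("inpainted.png")
```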
Retouching
Next, I opened the generated image in Adobe Photoshop for retouching and color adjustments. For the blue car, I removed the hood emblem and several anomalies on the front grill and side of the vehicle. I also dodged and burned different areas of the car, darkened the orange light in the warehouse window, and enhanced the red light on the post behind the vehicle. Lastly, I adjusted the image’s overall color, contrast, tone, saturation, and temperature.
Upscaling
Once the digital image editing was complete, I upscaled the image to the final size required for printing. A1111 offers several upscalers, including LDSR (Latent Diffusion Super Resolution), ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks, or Enhanced SRGAN), and R-ESRGAN (Real-ESRGAN). Additionally, you can install other upscalers from Hugging Face, like the Stable Diffusion x4 upscaler. In my tests, I preferred the results of the R-ESRGAN 4x+ upscaler when scaling the image up to 4x. It is also the fastest of the upscalers I tested.
Using A1111, I upscaled the retouched image from 1536 x 1024 pixels to 4608 x 3072 pixels (3x) and subsequently 6144 x 4096 pixels (4x) using the R-ESRGAN 4x+ upscaler. The results were excellent and far superior to using an image editing program for scaling.
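For a notebook-based workflow, the Stable Diffusion x4 upscaler mentioned above can be run through diffusers. This is only a sketch; the prompt and file names are illustrative, and this latent upscaler is slower and far more memory-hungry at large input sizes than GAN-based upscalers such as R-ESRGAN 4x+.

```python
# Sketch: the Stable Diffusion x4 upscaler via diffusers. Large inputs like
# 1536 x 1024 may exceed GPU memory; smaller crops or tiling may be needed.
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = load_image("retouched-1536x1024.png")
upscaled = pipe(
    prompt="a blue sports car in a warehouse, sharp, detailed",
    image=low_res,
).images[0]
upscaled.save("upscaled-4x.png")
```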
Digital Printing
Many reputable online service providers can digitally print images on various substrates and products. For this image, I chose Mpix, a US-based photo lab. They offer several print options, including traditional photographic paper, Giclée fine-art prints, canvas, metal, wood, and acrylic. Mpix does an excellent job of suggesting the recommended image resolution based on the output size. I scaled the image to the recommended size range, uploaded it to Mpix, and ordered my print products.
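The arithmetic behind a "recommended resolution" is straightforward: pixels equal print dimension in inches times dots per inch. The 300 DPI figure below is a common print target for illustration, not necessarily Mpix's exact recommendation.

```python
# Sketch: pixel dimensions required for a print at a given DPI.
# At 300 DPI, a 20 x 13.3 inch print wants roughly 6000 x 4000 pixels,
# so the 6144 x 4096 upscale comfortably covers it.
def required_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Return the pixel dimensions needed for a print at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

print(required_pixels(20, 13.3))  # (6000, 3990)
print(required_pixels(36, 24))    # (10800, 7200); larger prints often tolerate lower DPI
```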
The results from Mpix were perfect! The color was accurate, there was no loss of image detail, and the print was expertly mounted, finished, and packaged for delivery.
Previewing the Output
Another advantage of digital imaging is the ability to quickly simulate what the final print product will look like in an environment. For example, below, I superimposed the final car image, cropped to the correct print size, on a contemporary-style living room wall image licensed from Shutterstock.com. You can quickly visualize different sizes, layouts, and image arrangements.
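A quick-and-dirty version of this preview can even be scripted with Pillow by scaling the artwork and pasting it onto the room photo. The coordinates, one-third scale factor, and file names below are illustrative; a proper mockup would add perspective and shadows in Photoshop.

```python
# Sketch: paste the print, scaled to a plausible relative size, onto a room
# photo for a rough in-situ preview. Placement values are illustrative.
from PIL import Image

room = Image.open("living-room.jpg").convert("RGB")
art = Image.open("upscaled-4x.png").convert("RGB")

# Scale the artwork to roughly one third of the room photo's width.
target_w = room.width // 3
target_h = round(art.height * target_w / art.width)
art_small = art.resize((target_w, target_h), Image.LANCZOS)

x = (room.width - target_w) // 2  # horizontally centered
y = room.height // 5              # upper wall area
room.paste(art_small, (x, y))
room.save("room-preview.jpg")
```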
Conclusion
This post has provided insights into incorporating fine-tuned Stable Diffusion models into advanced image generation pipelines, editing techniques, and digital printing processes. By exploring these non-traditional Generative AI tools, content creators can produce visually compelling content.
This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, images, logos, and brands are the property of their respective owners.