Combining the Power of Generative AI with the Creative Expression of Digital Art and Design
Gary Stafford
Principal Solutions Architect @AWS | Data Analytics and Generative AI Specialist | Experienced Technology Leader, Consultant, CTO, COO, President | 10x AWS Certified
Integrating fine-tuned Stable Diffusion models, advanced image generation workflows, and editing apps designed for creative professionals to produce digital art
While there is widespread optimism about the positive transformations Generative AI can bring to our world, it is not without its share of controversies. Concerns about potential job displacement, misinformation, plagiarism, bias, and climate impact have been raised. Furthermore, there is a growing concern among authors, actors, visual artists, and musicians who feel that some model builders have improperly appropriated their creative content to train models, disregarding established copyright protections.
Although I am deeply involved in technology, including Generative AI, I hold degrees in fine arts and worked for years in the graphic design, commercial photography, and digital printing industries. Given my background, I am open to embracing the ever-evolving landscape of creative processes and excited about new and innovative methods of artistic expression that Generative AI can bring to content creators.
This post showcases the seamless integration of fine-tuned Stable Diffusion models, advanced image generation pipelines, image editing techniques, and professional-grade photographic and fine art printing services. We will explore how content creators can leverage the innovative capabilities of these non-traditional Generative AI tools to craft visually engaging content.
Choosing a Stable Diffusion UI
Common generative imaging workflows include fine-tuning models, text-to-image, image-to-image, upscaling, and inpainting. These tasks can all be accomplished programmatically, most commonly in a Jupyter notebook. Popular hosted notebook environments include Amazon SageMaker Studio and Google Colaboratory (Colab). My recent article features Amazon SageMaker Studio for Stable Diffusion image generation:
However, notebooks have a much higher learning curve for non-programmers than low-code or no-code user interfaces. Several popular, browser-based, open-source graphical user interfaces (GUIs) are available for Stable Diffusion-based generative imaging tasks, including AUTOMATIC1111 (A1111), ComfyUI, and Kohya_SS.
We will use A1111 in this post because it has the gentlest learning curve for anyone just starting out. You can learn more about configuring and using A1111 in my most recent post:
Choosing a Stable Diffusion Model
Once you have chosen a GUI, you need to select a Stable Diffusion model to generate your images. According to AWS, “Stable Diffusion is a Generative AI model that produces unique photorealistic images from text and image prompts. Besides images, you can also use the model to create videos and animations.” The latest Stable Diffusion model from the model builder Stability AI is SDXL 1.0, announced in July 2023. The SDXL 1.0 base model can be found on Hugging Face and Civitai. I have downloaded and installed the base model checkpoint into A1111.
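For anyone following the programmatic route mentioned earlier, here is a minimal sketch of text-to-image generation with the same SDXL 1.0 base checkpoint, using the Hugging Face diffusers library in a notebook. The prompt, file name, and generation parameters are illustrative, and a CUDA-capable GPU is assumed.

```python
# Minimal text-to-image sketch with the SDXL 1.0 base checkpoint and diffusers.
# Prompt, steps, and guidance scale are illustrative values, not a recipe.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

image = pipe(
    prompt="a matte gray sports car on a race track at golden hour, motion blur",
    num_inference_steps=40,
    guidance_scale=7.0,
).images[0]
image.save("txt2img-example.png")
```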
Refiner Model
Although you could use the base model independently to generate images, I often include the SDXL 1.0 refiner model as part of my image generation pipeline. According to Hugging Face, “SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement [refiner] model specialized for the final denoising steps. Note that the base model can be used as a standalone module.”
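To make that handoff concrete, here is a hedged sketch of the ensemble-of-experts pipeline with diffusers, reusing the `pipe` (base) object from the previous sketch and adding the refiner. The 0.8 split value and prompt are illustrative; diffusers exposes the split through the `denoising_end` and `denoising_start` parameters.

```python
# Sketch of the base + refiner handoff: the base model denoises the first 80%
# of the steps and passes noisy latents to the refiner, which finishes the rest.
# Assumes `pipe` (the SDXL base pipeline) from the previous sketch is loaded.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=pipe.text_encoder_2,  # share components with the base to save memory
    vae=pipe.vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "a blue sports car parked in an industrial warehouse, cinematic lighting"
high_noise_frac = 0.8  # base handles 80% of the denoising, refiner the remaining 20%

latents = pipe(
    prompt=prompt,
    num_inference_steps=50,
    denoising_end=high_noise_frac,
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=50,
    denoising_start=high_noise_frac,
    image=latents,
).images[0]
image.save("base-plus-refiner.png")
```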
Alternative Models
Alternative Stable Diffusion models are available on Hugging Face and other popular sites, such as Civitai (warning: contains explicit content). These model checkpoints can be downloaded and used similarly to the SDXL base model.
Fine-tuning Models
According to the paper, DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, “Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts.” Consequently, it is common to fine-tune Stable Diffusion models to achieve specific visual styles, such as steampunk, psychedelic, manga, dystopian, and vaporwave. Models may also be fine-tuned to reflect the style of well-known artists, such as Picasso, Rembrandt, Monet, Van Gogh, and Dalí. Lastly, models may be fine-tuned to capture the look and feel of famous characters, actors, distinct cultural styles, holidays, products, name brands, or even yourself.
There are several techniques to fine-tune Stable Diffusion models, including Textual Inversion, LoRA, and DreamBooth. Being a sports car fan, in this post, I have fine-tuned the SDXL 1.0 base model using DreamBooth to reflect the distinct style of a late-model Mercedes-Benz AMG GT Coupe. According to the DreamBooth paper, “Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes.”
Alternatively, you could select an existing fine-tuned LoRA from model zoos such as Hugging Face or Civitai.
Below is a classic example of fine-tuned SDXL LoRA weights found on Hugging Face. This particular model is used to generate images in the style of ornate and detailed cut paper.
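If you work outside A1111, a downloaded LoRA can also be applied programmatically. The sketch below assumes the diffusers library; the repository ID is a hypothetical placeholder, so substitute the LoRA you actually downloaded from Hugging Face or Civitai.

```python
# Sketch: applying an existing fine-tuned LoRA to the SDXL base pipeline.
# The repository ID below is a placeholder, not a real model.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# Load LoRA weights from the Hub (hypothetical repo ID) or point this at a
# local .safetensors file downloaded from Civitai.
pipe.load_lora_weights("your-username/your-sdxl-style-lora")

image = pipe(
    prompt="a sports car rendered as ornate, detailed cut paper",
    num_inference_steps=40,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength / weight
).images[0]
image.save("lora-style.png")
```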
Training Image Dataset
To build the image dataset for fine-tuning, I first collected 20–25 images of similar automobiles, either publicly available online or from my own photos. Using Adobe Photoshop, I retouched each image, removing any distracting objects, details from the license plate (personal preference), and some of the background. I also adjusted each image’s color and brightness. Lastly, I scaled each image to a maximum of 1024 pixels in the longest dimension, the optimal resolution for SDXL model training.
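Resizing the whole set by hand is tedious, so here is a small sketch that batch-resizes the training images with Pillow so the longest side is at most 1024 pixels. The folder names are illustrative.

```python
# Sketch: resize training images so the longest side is at most 1024 pixels,
# preserving the aspect ratio. Paths are illustrative.
from pathlib import Path
from PIL import Image

src_dir = Path("training-images/raw")
dst_dir = Path("training-images/resized")
dst_dir.mkdir(parents=True, exist_ok=True)

for path in sorted(src_dir.glob("*.jpg")):
    img = Image.open(path).convert("RGB")
    img.thumbnail((1024, 1024), Image.LANCZOS)  # in-place, keeps aspect ratio
    img.save(dst_dir / path.name, quality=95)
```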
Next, I named each image sequentially and created a corresponding text file containing a description of the image. For example: “a matte gray mercedes benz amg gts coupe with fluorescent yellow graphics, driving on race track, racing, blurry background, motion, car, automobile, sports car.” Several tools can automate the analysis and labeling of images, including Interrogate CLIP and Interrogate DeepBooru, both available in A1111.
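Captioning can also be scripted outside A1111. The sketch below uses the BLIP captioning model from Hugging Face Transformers to write a matching .txt file next to each image; the model choice, paths, and appended keywords are assumptions for illustration, not the exact tooling used in this post.

```python
# Sketch: auto-caption each training image into a matching .txt file with BLIP,
# an alternative to A1111's Interrogate CLIP / DeepBooru buttons.
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image_dir = Path("training-images/resized")
for path in sorted(image_dir.glob("*.jpg")):
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(out[0], skip_special_tokens=True)
    caption += ", car, automobile, sports car"  # fixed keywords for every caption
    path.with_suffix(".txt").write_text(caption)
```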
With the image dataset prepared, I used the Dreambooth Extension for Stable-Diffusion-WebUI, which I installed in A1111, to fine-tune the SDXL base model checkpoint.
Once complete, the fine-tuned checkpoint should be found in the txt2img > Lora tab of A1111. Installing the Civitai Extension for Automatic 1111 Stable Diffusion Web UI will ensure you get image previews and metadata for all model checkpoints downloaded from Civitai, including LoRAs. You can even add previews to your own fine-tuned model weights in A1111.
You can also share your fine-tuned weights for others to try on sites like Civitai.
Alternative Fine-tuning Services
As an alternative to using a Stable Diffusion GUI, several online services allow you to fine-tune Stable Diffusion models for a fee, including Civitai.
Generating Images
To start the generative process, I will usually explore several variations of positive and negative prompts and generation parameters: LoRA weight, sampling method, and CFG (Classifier Free Guidance) scale in A1111’s txt2img tab. Once I find an interesting combination, I will generate a batch of 6–12 images using different seed values. I typically use fewer sampling steps for batches to speed up the process. Later, I will increase the number of steps (and potential quality) for the final image generation.
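The same explore-then-refine loop can be reproduced programmatically. A rough sketch, assuming the SDXL base pipeline (`pipe`) from the earlier sketch is already loaded, generates a dozen low-step candidates from different seeds:

```python
# Sketch: generate a batch of candidates with identical prompts and parameters
# but different seeds, using fewer steps to keep exploration fast.
# Assumes `pipe` (the SDXL base pipeline) from the earlier sketch is loaded.
import torch

prompt = ("a matte gray mercedes benz amg gts coupe with fluorescent yellow "
          "graphics, driving on race track, racing, blurry background, motion")
negative_prompt = "lowres, blurry, deformed, watermark, text"

for seed in range(100, 112):  # 12 candidate seeds
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=25,   # fewer steps while exploring
        guidance_scale=7.0,       # CFG scale
        generator=generator,
    ).images[0]
    image.save(f"candidate-seed-{seed}.png")
```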
Below is an example of images created using the SDXL base and refiner model coupled with the new DreamBooth weights. All were created with the same prompts and parameters but with different seeds. I preferred the blue car in the upper right corner for this post.
Sampling Steps
Sampling steps are the number of iterations Stable Diffusion executes to go from random noise to a recognizable image (denoising). The number of steps affects the look of the final image, as shown in the grid below (e.g., note the drastic changes between 40 and 50 steps). There is no magic number of sampling steps. Undersampling, with too few steps, can result in distorted and unrecognizable images. Conversely, oversampling, with too many steps, also has drawbacks, including increased processing time with no measurable quality improvement, loss of detail, and odd image anomalies. I usually use 50–100 sampling steps. Note that the examples below use a base-to-refiner sampling-step ratio of 80:20; the steps are split across the two model checkpoints.
Although I liked several of the images below, I chose the result at 80 sampling steps (center right).
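If you prefer to build such a step-comparison grid in code rather than in A1111, a sketch along these lines would work, assuming the `pipe` (base) and `refiner` objects from the earlier sketches and keeping the 80:20 split with a fixed seed:

```python
# Sketch: render the same seed at several sampling-step counts for comparison,
# keeping the 80:20 base-to-refiner split. Assumes `pipe` and `refiner` from
# the earlier sketches are already loaded.
import torch

prompt = "a blue sports car parked in an industrial warehouse, cinematic lighting"
split = 0.8  # 80% of steps on the base model, 20% on the refiner

for steps in (20, 40, 50, 60, 80, 100):
    generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for a fair comparison
    latents = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_end=split,
        output_type="latent",
        generator=generator,
    ).images
    image = refiner(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_start=split,
        image=latents,
        generator=generator,
    ).images[0]
    image.save(f"steps-{steps}.png")
```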
Once I chose the image and the number of sampling steps, I generated the final base image at 1536 x 1024 pixels (3:2 aspect ratio). By generating a slightly larger image, less upscaling is required later.
If any part of the subject or background is objectionable, I often use the inpainting capabilities of A1111 in the img2img tab to remove or change them. A1111 even allows you to use masks created in Adobe Photoshop to more precisely mask areas of the generated image (lower left of screen), as shown below.
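The same idea works programmatically with an SDXL inpainting pipeline, passing the Photoshop mask (white where the image should be regenerated) alongside the source image. The model ID below is the SDXL inpainting checkpoint published for diffusers; the prompt and file names are illustrative.

```python
# Sketch: programmatic inpainting with an external mask, e.g., one painted in
# Photoshop. White areas of the mask are regenerated; black areas are kept.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = load_image("final-base-image.png")
mask = load_image("photoshop-mask.png")  # white = area to repaint

result = pipe(
    prompt="clean concrete warehouse floor",
    image=image,
    mask_image=mask,
    num_inference_steps=40,
    strength=0.9,
).images[0]
result.save("inpainted.png")
```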
Retouching
Next, I opened the generated image in Adobe Photoshop for retouching and color adjustments. For the blue car, I removed the hood emblem and several anomalies on the front grill and side of the vehicle. I also dodged and burned different areas of the car, darkened the orange light in the warehouse window, and enhanced the red light on the post behind the vehicle. Lastly, I adjusted the image’s overall color, contrast, tone, saturation, and temperature.
Upscaling
Once the digital image editing was complete, I upscaled the image to the final size required for printing. A1111 offers several upscalers, including LDSR (Latent Diffusion Super Resolution), ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks, or Enhanced SRGAN), and R-ESRGAN (Real-ESRGAN). Additionally, you can install other upscalers from Hugging Face, like the Stable Diffusion x4 upscaler. In my tests, I preferred the results of the R-ESRGAN 4x+ upscaler when scaling the image up to 4x. It is also the fastest of the upscalers I tested.
Using A1111, I upscaled the retouched image from 1536 x 1024 pixels to 4608 x 3072 pixels (3x) and subsequently 6144 x 4096 pixels (4x) using the R-ESRGAN 4x+ upscaler. The results were excellent and far superior to using an image editing program for scaling.
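For a notebook-based workflow, the Stable Diffusion x4 upscaler mentioned above can be run through diffusers. This is only a sketch; the prompt and file names are illustrative, and this latent upscaler is slower and far more memory-hungry at large input sizes than GAN-based upscalers such as R-ESRGAN 4x+.

```python
# Sketch: the Stable Diffusion x4 upscaler via diffusers. Large inputs like
# 1536 x 1024 may exceed GPU memory; smaller crops or tiling may be needed.
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = load_image("retouched-1536x1024.png")
upscaled = pipe(
    prompt="a blue sports car in a warehouse, sharp, detailed",
    image=low_res,
).images[0]
upscaled.save("upscaled-4x.png")
```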
Digital Printing
Many reputable online service providers can digitally print images on various substrates and products. For this image, I chose Mpix, a US-based photo lab. They offer several print options, including traditional photographic paper, Giclée fine-art prints, canvas, metal, wood, and acrylic. Mpix does an excellent job of suggesting the recommended image resolution based on the output size. I scaled the image to the recommended size range, uploaded it to Mpix, and ordered my print products.
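The arithmetic behind a "recommended resolution" is straightforward: pixels equal print dimension in inches times dots per inch. The 300 DPI figure below is a common print target for illustration, not necessarily Mpix's exact recommendation.

```python
# Sketch: pixel dimensions required for a print at a given DPI.
# At 300 DPI, a 20 x 13.3 inch print wants roughly 6000 x 4000 pixels,
# so the 6144 x 4096 upscale comfortably covers it.
def required_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Return the pixel dimensions needed for a print at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

print(required_pixels(20, 13.3))  # (6000, 3990)
print(required_pixels(36, 24))    # (10800, 7200); larger prints often tolerate lower DPI
```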
The results from Mpix were perfect! The color was accurate, there was no loss of image detail, and the print was expertly mounted, finished, and packaged for delivery.
Previewing the Output
Another advantage of digital imaging is the ability to quickly simulate what the final print product will look like in an environment. For example, below, I superimposed the final car image, cropped to the correct print size, on a contemporary-style living room wall image licensed from Shutterstock.com. You can quickly visualize different sizes, layouts, and image arrangements.
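A quick-and-dirty version of this preview can even be scripted with Pillow by scaling the artwork and pasting it onto the room photo. The coordinates, one-third scale factor, and file names below are illustrative; a proper mockup would add perspective and shadows in Photoshop.

```python
# Sketch: paste the print, scaled to a plausible relative size, onto a room
# photo for a rough in-situ preview. Placement values are illustrative.
from PIL import Image

room = Image.open("living-room.jpg").convert("RGB")
art = Image.open("upscaled-4x.png").convert("RGB")

# Scale the artwork to roughly one third of the room photo's width.
target_w = room.width // 3
target_h = round(art.height * target_w / art.width)
art_small = art.resize((target_w, target_h), Image.LANCZOS)

x = (room.width - target_w) // 2  # horizontally centered
y = room.height // 5              # upper wall area
room.paste(art_small, (x, y))
room.save("room-preview.jpg")
```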
Conclusion
This post has provided insights into incorporating fine-tuned Stable Diffusion models into advanced image generation pipelines, editing techniques, and digital printing processes. By exploring these non-traditional Generative AI tools, content creators can produce visually compelling content.
This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, images, logos, and brands are the property of their respective owners.