Fine-Tuning Stable Diffusion with Dreambooth
Dreambooth is a technique that lets you train your own model with just a few images of a subject or style. In the paper, the authors state:
“We present a new approach for ‘personalization’ of text-to-image diffusion models (specializing them to users’ needs).”
In this blog, we will explore how to train Dreambooth, discuss its hyperparameters, and look into how to train on images using captions. Let’s dive in!
How to train Dreambooth?
First of all, you need to prepare your training data. If you prefer collecting your own images, you can take 4–10 pictures of the specific subject. However, if you are training on a person’s face, it is better to gather a few more images. Alternatively, you can try training Dreambooth using the datasets available at this link.
Gathering more images is important, but obtaining high-quality images is even more crucial: the quality of the input images directly impacts the quality of the output images.
Secondly, you need to resize your images to 512x512 before providing them to the model. You can use this website to resize the images.
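If you prefer the command line, ImageMagick’s mogrify can center-crop and resize a whole folder in one step. This is a minimal sketch, assuming JPEG inputs and that ImageMagick is installed; mogrify edits files in place, so keep a backup of your originals:

```bash
# Center-crop and resize every JPEG in the current folder to 512x512.
# WARNING: mogrify overwrites files in place, so back up originals first.
mogrify -resize 512x512^ -gravity center -extent 512x512 *.jpg
```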
We will use 18 images of Elon Musk to train Dreambooth.
Before proceeding to train Dreambooth, let’s look into some hyperparameters.
We also added two hyperparameters to the Dreambooth script for validation: save_guidance_scale and save_infer_steps. Now, we can look at the run code for training and the results below.
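The main knobs in the stock train_dreambooth.py script are instance_prompt (a prompt containing a unique identifier for your subject), class_prompt together with with_prior_preservation (which mix generated class images back in to fight overfitting), learning_rate, and max_train_steps. Below is a minimal sketch of how such a run is launched; the base model, paths, prompts, identifier, and values shown are placeholder assumptions rather than our exact settings, and save_guidance_scale / save_infer_steps exist only in our modified copy of the script.

```bash
# Sketch of a Dreambooth training run. "sks" is a placeholder identifier,
# paths and values are placeholders, and save_guidance_scale /
# save_infer_steps are the custom flags from our modified script.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./data/elon_musk" \
  --instance_prompt="a photo of sks man" \
  --class_data_dir="./data/man" \
  --class_prompt="a photo of man" \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --num_class_images=200 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800 \
  --save_guidance_scale=7.5 \
  --save_infer_steps=50 \
  --output_dir="./dreambooth-elon"
```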
As you can see, these images are not good enough.
After the initial training, we adjusted max_train_steps to 1600 and re-ran the same command. Let’s look at the results below.
Now we have better images compared to the previous ones.
How to train style images using captions?
This is a method that lets you train your style images with captions. First of all, we prepared the dataset that we will be using for training. We decided to train on a movie style, Spider-Verse. You can take a look at the images below to get a sense of this style.
We collected 34 images in the Spider-Verse style and crafted a caption for each one. Gathering more high-quality images could further improve the results.
For each training image, we have created a txt file with the same name as the image. The structure is as follows:
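As a hypothetical example (the file names here are placeholders), the folder looks something like this, with one .txt caption per image:

```
spiderverse_dataset/
├── 0001.png
├── 0001.txt
├── 0002.png
├── 0002.txt
└── ...
```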
We typically describe each image like this: “a middle-aged man, upper body, short brown hair, brown mustache, wearing blue and purple shirt, glasses, lightings in the background”
Now let’s look at the run code below. We adjusted the train_dreambooth.py script to support captions. Class images were not used. Additionally, we chose the unique identifier “spdrvrs” and set max_train_steps to 6000.
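This is roughly how the caption-based run is launched. Our modified train_dreambooth.py reads each image’s .txt file as its caption, so the stock script will not accept this data layout; the paths and the exact wording of instance_prompt are placeholder assumptions, while the “spdrvrs” identifier and max_train_steps=6000 are the settings described above. Note that there are no class or prior-preservation flags, since class images were not used.

```bash
# Sketch of the caption-based style run (paths are placeholders; caption
# support comes from our modified train_dreambooth.py, not the stock script).
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./spiderverse_dataset" \
  --instance_prompt="spdrvrs style" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=6000 \
  --output_dir="./dreambooth-spiderverse"
```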
Let’s look at the results.
For comparison, we also trained a model using the same images but without captions, keeping the other hyperparameters the same. Let’s compare the results with and without captions.
As you can see in the examples, the main problem with Dreambooth is overfitting. In both cases, the model overfits the instance images; however, we obtain better results when using images with captions. Gathering more high-quality images can help reduce overfitting.
On the other hand, we wanted to try Dreambooth LoRA SDXL using the train_dreambooth_lora_sdxl.py script to see if there was any noticeable difference. Using SDXL enhances our results, while using LoRA reduces the total file size. We revised this script as well to support captions. Additionally, there are some hyperparameters in this script that we haven’t explained yet. Let’s explore them.
pretrained_vae_model_name_or_path: path to a pretrained VAE model (we’ll use the sdxl-vae-fp16-fix model)
rank: the inner dimension of the LoRA matrices
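Here is a sketch of how the Dreambooth LoRA SDXL run is launched. The VAE (madebyollin/sdxl-vae-fp16-fix) is the one mentioned above; the paths, rank, learning rate, and step count shown here are placeholder assumptions, and caption support again comes from our modified copy of the script.

```bash
# Sketch of a Dreambooth LoRA SDXL run (values are placeholders).
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --instance_data_dir="./spiderverse_dataset" \
  --instance_prompt="spdrvrs style" \
  --rank=4 \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --max_train_steps=6000 \
  --output_dir="./dreambooth-lora-sdxl-spiderverse"
```

Since LoRA only trains small low-rank adapter matrices, the output is a compact weights file that is loaded on top of the base SDXL model at inference time, rather than a full model checkpoint.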
If you look at the results below, we can say that we have significantly improved the Spider-Verse style images. Moreover, it is worth noting that this model shows no signs of overfitting. Training your own model with Dreambooth LoRA SDXL would be a good choice.
Wiro AI / Machine Learning Team