- As you know, I have finalized and perfected my FLUX Fine Tuning and LoRA training workflows, at least until something new arrives
- Both are exactly the same; the only difference is that we load the LoRA config into the LoRA tab of Kohya GUI and the Fine Tuning config into the Dreambooth tab
- When we use Classification / Regularization images, Fine Tuning effectively becomes DreamBooth training, as you know
- However, with FLUX, Classification / Regularization images do not help, as I have shown previously with my grid experiments
- FLUX LoRA training configs and details : https://www.patreon.com/posts/110879657
- FLUX Fine Tuning configs and details : https://www.patreon.com/posts/112099700
- So what is up with Single Block FLUX LoRA training?
- The FLUX model is composed of 19 double blocks and 38 single blocks
- One double block takes around 640 MB of VRAM and one single block around 320 MB in 16-bit precision when doing Fine Tuning training
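- As a quick sanity check on those numbers, here is a rough back-of-the-envelope sketch of what the transformer blocks alone add up to in 16-bit (the per-block figures are the approximations above; real Fine Tuning VRAM is much higher once optimizer states, gradients, activations, text encoders and the VAE are included):

```python
# Rough VRAM estimate for the FLUX transformer blocks in 16-bit precision,
# using the approximate per-block figures quoted above.
DOUBLE_BLOCKS = 19
SINGLE_BLOCKS = 38
MB_PER_DOUBLE = 640  # approximate, 16-bit
MB_PER_SINGLE = 320  # approximate, 16-bit

blocks_mb = DOUBLE_BLOCKS * MB_PER_DOUBLE + SINGLE_BLOCKS * MB_PER_SINGLE
print(f"Transformer blocks alone: ~{blocks_mb / 1024:.1f} GB")  # ~23.8 GB
```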
- Normally we train a LoRA on all of the blocks
- However it was claimed that you can train a single block and still get good results
- So I have researched this thoroughly and I am sharing all the info in this article
- Moreover, I decided to reduce the LoRA Network Rank (Dimension) of my workflow and test the impact of keeping the same Network Alpha versus scaling it proportionally
- We are going to use Kohya GUI
- Full tutorial on how to install it, use it, and train with it here : https://youtu.be/nySGu12Y05k
- Full tutorial for Cloud services here : https://youtu.be/-uhL2nW7Ddw
- I have used my classic 15-image experimentation dataset
- I have trained for 150 epochs, thus 2250 steps
- All experiments are done on a single RTX A6000 48 GB GPU (almost the same speed as an RTX 3090)
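- For reference, the step count works out like this (a minimal sketch; it assumes batch size 1 and 1 repeat per image, which is what makes 15 images x 150 epochs equal 2250 steps):

```python
# How the 2250 steps come about (assuming batch size 1 and 1 repeat per image).
images = 15
epochs = 150
repeats = 1     # assumption
batch_size = 1  # assumption

steps = images * repeats * epochs // batch_size
print(steps)  # 2250
```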
- In all experiments I have also trained CLIP-L, except in Fine Tuning (you can't train it there yet)
- I know it doesn't have expressions, but that is not the point; you can see my 256-image training results with the exact same workflow here : https://www.reddit.com/r/StableDiffusion/comments/1ffwvpo/tried_expressions_with_flux_lora_training_with_my/
- So I research a workflow, and when you use a better dataset you get even better results
- I will give full links to the figures, so click them to download and see them in full resolution
- Figure 0 is the first uploaded image, and so on in numerical order
- At first I used my exact same settings and trained double blocks 0-7 and single blocks 0-15 to determine whether the choice of block matters a lot or not, with the same learning rate as my full-layers LoRA training
- The double blocks 0-7 results can be seen in Figure_0.jfif and the single blocks 0-15 results can be seen in Figure_1.jfif
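- If you want to reproduce this kind of block-restricted LoRA training outside the GUI, the core idea is just to attach LoRA adapters only to modules belonging to the chosen blocks. Below is a minimal sketch of that filtering logic; the `double_blocks.N.` / `single_blocks.N.` name patterns follow the official FLUX model naming, but the exact Kohya GUI / sd-scripts options for block selection may differ between versions, so treat this as an illustration rather than the exact config:

```python
import re

# Decide whether a FLUX module should get a LoRA adapter,
# based on which double/single block indices we want to train.
def is_trainable(module_name: str,
                 double_indices: set[int],
                 single_indices: set[int]) -> bool:
    m = re.match(r"double_blocks\.(\d+)\.", module_name)
    if m:
        return int(m.group(1)) in double_indices
    m = re.match(r"single_blocks\.(\d+)\.", module_name)
    if m:
        return int(m.group(1)) in single_indices
    return False  # embedders, final layer, etc. are skipped

# Example: train only single block 8, no double blocks.
print(is_trainable("single_blocks.8.linear1", set(), {8}))       # True
print(is_trainable("double_blocks.3.img_attn.qkv", set(), {8}))  # False
```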
- I didn't notice a very meaningful difference, and the learning rate was also too low, as can be seen from the figures
- Still, I picked single block 8 as the best one to expand the research
- Then I trained 8 different learning rates on single block 8 and determined the best learning rate, as shown in Figure_2.jfif
- It required more than 10 times the learning rate of regular all-blocks FLUX LoRA training
- Then I decided to test combinations of different single blocks / layers and see their impact
- As can be seen in Figure_3.jfif, I have tried combinations of 2 to 11 different layers
- As the number of trained layers increased, it obviously required a newly tuned learning rate
- Thus I decided not to go any further at the moment, because single-layer training will obviously yield sub-par results and I don't see much benefit in it
- In all cases: Full FLUX Fine Tuning > LoRA extraction from the full FLUX Fine Tuned model > all-layers LoRA training > reduced-layers FLUX LoRA training
- In my very best FLUX LoRA training workflow I use a LoRA Network Rank (Dimension) of 128
- The impact of this is that the generated LoRA file sizes are bigger
- It keeps more information but also causes more overfitting
- So with some tradeoffs, this LoRA Network Rank (Dimension) can be reduced
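- To see why the file size scales with the Network Rank, here is a rough parameter-count sketch for a single 3072 x 3072 linear layer (3072 is the FLUX transformer hidden size; the per-layer numbers are only illustrative, the total depends on how many layers get LoRA adapters):

```python
# LoRA adds two low-rank matrices, A (rank x in) and B (out x rank), per target layer,
# so the extra parameters (and the file size) scale linearly with the rank.
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    return rank * (in_features + out_features)

hidden = 3072  # FLUX transformer hidden size
for rank in (128, 16):
    p = lora_params(hidden, hidden, rank)
    print(f"rank {rank:>3}: {p:,} params (~{p * 2 / 1024:.0f} KB in 16-bit) per layer")
# rank 128: 786,432 params (~1536 KB) per layer
# rank  16:  98,304 params (~192 KB) per layer -> 8x smaller, matching the 128 -> 16 reduction
```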
- Normally, I arrived at my workflow with 128 Network Rank (Dimension) / 128 Network Alpha
- The Network Alpha effectively scales the strength of the LoRA updates (by Network Alpha / Network Rank), so changing it changes the effective Learning Rate
- We also know by now, from the experiments above and from the FLUX Full Fine Tuning experiments, that training more parameters requires a lower Learning Rate
- So when we reduce the LoRA Network Rank (Dimension), what should we do to avoid changing the effective Learning Rate?
- This is where Network Alpha comes into play
- Should we scale it or keep it as it is?
- Thus I have experimented with LoRA Network Rank (Dimension) / Network Alpha of 16 / 16 and 16 / 128
- So in one experiment I scaled the Network Alpha down proportionally (16 / 16), and in the other I kept it as it was (16 / 128)
- The results are shared in Figure_4.jpg
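- To make the trade-off concrete: in Kohya-style LoRA the learned update is multiplied by Network Alpha / Network Rank, so that ratio acts as a built-in multiplier on the effective Learning Rate. A minimal sketch of the two configurations above:

```python
# Effective multiplier applied to the LoRA update in Kohya-style LoRA:
#   delta_W ~ (network_alpha / network_rank) * (B @ A)
def effective_scale(rank: int, alpha: int) -> float:
    return alpha / rank

print(effective_scale(128, 128))  # 1.0 -> the original 128 / 128 workflow
print(effective_scale(16, 16))    # 1.0 -> alpha scaled down together with the rank
print(effective_scale(16, 128))   # 8.0 -> alpha kept, updates effectively 8x stronger
```

- Keeping the original alpha therefore boosts the effective Learning Rate at rank 16, which lines up with the earlier observation that training fewer parameters needs a higher Learning Rate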
- As expected, when you train fewer parameters, e.g. LoRA vs Full Fine Tuning or single-blocks LoRA vs all-blocks LoRA, your quality gets reduced
- Of course, you gain some extra VRAM savings and also a smaller file size on disk
- Moreover, fewer parameters reduce the overfitting and the realism of the FLUX model, so if you are into stylized outputs like comics, it may work better
- Furthermore, when you reduce the LoRA Network Rank, keep the original Network Alpha unless you are going to do new Learning Rate research
- Finally, the very best quality and the least overfitting are achieved with full Fine Tuning
- The second best is extracting a LoRA from the Fine Tuned model, if you need a LoRA
- Third is doing a regular all-layers LoRA training
- And the worst quality comes from training fewer blocks / layers with LoRA
- So how much VRAM saving and speed-up does single-block LoRA training bring?