Paper Review: FreeU: Free Lunch in Diffusion U-Net
Andrey Lukyanenko
Senior Data Scientist @ Careem. Kaggle Competition Master, Notebooks Top-1.
In this paper, the authors explore the untapped potential of the diffusion U-Net for improving generation quality. The U-Net's main backbone primarily contributes to denoising, while its skip connections introduce high-frequency features into the decoder, which can cause the network to overlook the backbone's semantics. Based on this observation, the authors introduce "FreeU", a method that improves generation quality without any additional training or fine-tuning. It works by re-weighting the contributions of the U-Net's skip connections and backbone feature maps. When integrated into existing diffusion models such as Stable Diffusion, DreamBooth, ModelScope, Rerender, and ReVersion, FreeU improves generation quality by adjusting only two scaling factors at inference time.
Methodology
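Diffusion models such as Denoising Diffusion Probabilistic Models (DDPM) involve two key processes. The diffusion (forward) process gradually adds Gaussian noise to the data over a sequence of steps. The denoising (reverse) process learns to invert this, recovering clean data from noise step by step.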
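The forward (diffusion) process has a well-known closed form in DDPM, which the minimal NumPy sketch below illustrates. The linear schedule, its endpoint values, the 1000-step horizon, and the function names are assumptions chosen for illustration; they are not taken from the FreeU paper.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM-style values)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)          # cumulative signal retention per step

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # toy "image" with 3 channels
x_t, noise = forward_diffuse(x0, t=999, alpha_bar=alpha_bar, rng=rng)
```

At the final step almost no signal remains, so `x_t` is close to pure Gaussian noise. The denoising process is the learned reverse of this, and in the models discussed here it is parameterized by a U-Net.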
How does diffusion U-Net perform denoising?
The researchers examined how low-frequency and high-frequency components behave during the denoising process, focusing on what each part of the U-Net architecture contributes. The U-Net consists of a main backbone network (an encoder and a decoder) and skip connections that pass information between corresponding encoder and decoder layers.
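To make "low-frequency" and "high-frequency" components concrete, here is a small sketch that splits a 2D array into the two parts with a centered square mask in the Fourier domain. The mask shape, the `radius` parameter, and the function name are illustrative assumptions, not the paper's exact analysis procedure.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2D array into low- and high-frequency parts using a centered FFT mask."""
    h, w = image.shape
    # Shift the spectrum so the zero-frequency (DC) term sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = np.zeros((h, w))
    cy, cx = h // 2, w // 2
    # Symmetric window around the center keeps the low-pass result real-valued.
    mask[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1] = 1.0
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = image - low  # everything the low-pass filter removed
    return low, high

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
low, high = split_frequencies(image, radius=2)
```

The low-frequency part carries the coarse structure (including the mean), while the high-frequency part carries edges, texture, and noise.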
The backbone of U-Net: it primarily contributes to denoising.
Skip connections of U-Net: they pass high-frequency features to the decoder, which can overshadow the backbone's semantics.
Free lunch in diffusion U-Net
FreeU amplifies the backbone feature map with a backbone scaling factor. However, the amplification is applied to only half of the channels, because scaling all of them makes the resulting images too smooth. This keeps a balance between suppressing noise and preserving texture details.
At the same time, the skip feature map is adjusted, mainly by attenuating its low-frequency components. This adjustment is done in the Fourier domain and counteracts the excessive smoothness caused by the stronger denoising. A Fourier mask applies the frequency-dependent scaling factor, and the adjusted skip feature map is then concatenated with the modified backbone feature map and passed to the next layers of the U-Net.
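The two operations described above can be sketched in NumPy roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the constant backbone factor `b`, the square low-frequency window with `threshold=1`, and the values `b=1.2` and `s=0.9` are all chosen for the example.

```python
import numpy as np

def scale_backbone(backbone, b):
    """Amplify the first half of the channels of a (C, H, W) backbone feature map by b."""
    out = backbone.copy()
    half = backbone.shape[0] // 2
    out[:half] *= b
    return out

def fourier_filter(skip, s, threshold=1):
    """Scale the low-frequency band of a (C, H, W) skip feature map by s."""
    h, w = skip.shape[-2:]
    spectrum = np.fft.fftshift(np.fft.fft2(skip, axes=(-2, -1)), axes=(-2, -1))
    # Fourier mask: s inside a small centered window, 1 everywhere else.
    mask = np.ones((h, w))
    cy, cx = h // 2, w // 2
    mask[cy - threshold:cy + threshold + 1, cx - threshold:cx + threshold + 1] = s
    filtered = np.fft.ifft2(
        np.fft.ifftshift(spectrum * mask, axes=(-2, -1)), axes=(-2, -1)
    )
    return filtered.real

def freeu_merge(backbone, skip, b=1.2, s=0.9):
    """Concatenate the re-weighted backbone and skip features along the channel axis."""
    return np.concatenate([scale_backbone(backbone, b), fourier_filter(skip, s)], axis=0)

rng = np.random.default_rng(0)
backbone = rng.standard_normal((8, 16, 16))  # toy decoder (backbone) features
skip = rng.standard_normal((4, 16, 16))      # toy skip-connection features
merged = freeu_merge(backbone, skip)
```

With `b > 1` the backbone's contribution is strengthened on half of its channels, and with `s < 1` the low frequencies of the skip features are damped while their high-frequency detail passes through unchanged. Setting `b = 1` and `s = 1` recovers the ordinary U-Net concatenation.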
What's noteworthy about FreeU is its practicality and flexibility. It can be added with a few lines of code and requires no additional training or fine-tuning. The scaling factors can be adjusted on the fly during inference, which gives control over the trade-off between denoising and detail preservation without adding noticeable computational load.
FreeU also integrates readily with existing diffusion models. It does this by exploiting the complementary strengths of the backbone and skip connections in the U-Net architecture, with the aim of better denoising and higher-quality generation while staying practical and adaptable.
Experiments
Stable Diffusion (for text-to-image) and ModelScope (for text-to-video) were considerably improved by integrating FreeU, which was confirmed by a quantitative study with 35 participants.
Downstream tasks: FreeU was also integrated into DreamBooth, Rerender, and ReVersion.