ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Credit: https://arxiv.org/pdf/2404.07987.pdf


Today’s paper proposes ControlNet++, a new approach to improve the controllability of text-to-image diffusion models when using image-based conditional controls (for example segmentation masks or depth maps). Existing methods struggle to generate images that accurately align with the input conditional controls.

Method Overview

The core idea is to explicitly optimize the pixel-level cycle consistency between the input conditional control and the corresponding condition extracted from the generated image using pre-trained discriminative models. For example, if the input is a segmentation mask, a pre-trained segmentation model extracts the segmentation from the generated image. The cycle consistency loss minimizes the difference between the input mask and extracted mask.
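As a rough illustration of this idea, here is a minimal numpy sketch of what a pixel-level consistency loss could look like for two condition types: a per-pixel cross-entropy for segmentation masks and a per-pixel L1 distance for dense targets like depth maps. The function names and shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def seg_consistency_loss(pred_logits, target_labels):
    # pred_logits: (C, H, W) class logits extracted from the generated image
    # target_labels: (H, W) integer mask that was given as the input condition
    # Numerically stable log-softmax over the class axis.
    logits = pred_logits - pred_logits.max(axis=0, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    h, w = target_labels.shape
    # Per-pixel cross-entropy: pick the log-prob of the true class at each pixel.
    picked = log_probs[target_labels, np.arange(h)[:, None], np.arange(w)]
    return -picked.mean()

def depth_consistency_loss(pred_depth, target_depth):
    # Dense regression conditions (depth, edges): per-pixel L1 distance.
    return np.abs(pred_depth - target_depth).mean()
```

In practice `pred_logits` / `pred_depth` would come from a frozen pre-trained discriminative model applied to the generated image, and the loss gradient would flow back into the generator.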

Directly optimizing this consistency loss by sampling images from random noise is computationally expensive, since it requires storing gradients across all sampling steps. ControlNet++ instead introduces an efficient reward strategy: it deliberately disturbs the input images by adding noise, then uses the single-step denoised images for reward fine-tuning. This avoids the costly multi-step sampling process.
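The single-step trick can be sketched with the standard DDPM forward/reverse relations: noise a clean image at timestep t, then estimate the clean image directly from the predicted noise instead of running the full sampling chain. This is a hedged sketch; `eps_pred_fn` stands in for the (hypothetical) noise-prediction network and `alpha_bar_t` for the cumulative noise schedule value at t.

```python
import numpy as np

def single_step_reconstruction(x0, eps_pred_fn, alpha_bar_t, rng):
    # Forward process: disturb the clean image x0 with Gaussian noise at timestep t.
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    # Reverse shortcut: one-step estimate of the clean image from the
    # predicted noise, rather than iterating over all denoising steps.
    eps_hat = eps_pred_fn(x_t)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    return x0_hat
```

The consistency loss is then computed on `x0_hat`, so gradients only pass through this single denoising step.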

The total loss is a combination of the standard diffusion training loss and the cycle consistency reward loss. During reward fine-tuning, only the ControlNet module is updated while keeping the pre-trained diffusion model and discriminators frozen.
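Schematically, the combined objective might be written as follows; the weight `lam` is a hypothetical hyperparameter balancing the two terms, not a value taken from the paper.

```python
def controlnetpp_objective(diffusion_loss, consistency_loss, lam=1.0):
    # Standard denoising (diffusion) loss plus a weighted cycle-consistency
    # reward term; lam is an assumed balancing weight.
    return diffusion_loss + lam * consistency_loss
```

Only the ControlNet parameters would receive gradients from this objective; the base diffusion model and the discriminative reward models stay frozen.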

Results

Extensive experiments across various conditional controls like segmentation masks, edges, and depth maps show ControlNet++ significantly improves controllability compared to previous state-of-the-art methods, while maintaining good image quality.

Conclusion

ControlNet++ introduces a novel cycle consistency approach using discriminative reward models to explicitly optimize controllability. It demonstrates promising results in improving controllable text-to-image generation. For more information please consult the full paper or the project page.

Code: https://github.com/liming-ai/ControlNet_Plus_Plus

Congrats to the authors for their work!

Li, Ming, et al. "ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback." arXiv, 11 Apr. 2024, arxiv.org/abs/2404.07987
