InstantDrag: Improving Interactivity in Drag-based Image Editing
Today's paper introduces InstantDrag, a new approach for interactive drag-based image editing. The method enables users to make realistic edits to images by simply dragging points, without needing text prompts or masks. InstantDrag achieves much faster editing speeds compared to previous methods while maintaining high-quality results.
Method Overview
InstantDrag works by decomposing the drag-editing task into two main components: motion generation and motion-conditioned image generation.
For motion generation, the authors use a GAN-based network called FlowGen. This network takes the input image and sparse drag instructions from the user, and generates a dense optical flow field representing the desired motion. FlowGen is trained on video datasets to learn plausible motion patterns.
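Below is a minimal sketch of what a FlowGen-style interface could look like. The architecture details (channel widths, layer counts, how the sparse drags are encoded) are assumptions for illustration, not the authors' exact design; the point is the input/output contract: an RGB image plus a sparse drag map goes in, a dense two-channel flow field comes out.

```python
# Hypothetical FlowGen-style generator: image + sparse drag map -> dense optical flow.
# This is an illustrative sketch, not the paper's actual architecture.
import torch
import torch.nn as nn

class FlowGenSketch(nn.Module):
    def __init__(self, base_ch: int = 64):
        super().__init__()
        # Encoder: downsample the concatenated image (3 ch) + sparse drag map (2 ch).
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + 2, base_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch * 2, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to full resolution and predict dense (dx, dy).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_ch, 2, 4, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor, sparse_drag: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); sparse_drag: (B, 2, H, W), nonzero only at dragged pixels.
        x = torch.cat([image, sparse_drag], dim=1)
        return self.decoder(self.encoder(x))

# Example: one 256x256 image with a single drag at pixel (128, 128).
img = torch.randn(1, 3, 256, 256)
drag = torch.zeros(1, 2, 256, 256)
drag[0, :, 128, 128] = torch.tensor([20.0, -10.0])  # desired displacement at that point
dense_flow = FlowGenSketch()(img, drag)              # (1, 2, 256, 256)
```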
For motion-conditioned image generation, they use a diffusion model called FlowDiffusion. This model takes the original image and the generated optical flow as inputs, and produces the edited image. FlowDiffusion is trained to generate realistic images that follow the motion specified by the optical flow.
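As a rough illustration of flow-conditioned generation, the sketch below runs a standard DDPM-style denoising loop in which the noise predictor also sees the source image and the dense flow. The conditioning scheme (plain channel concatenation), the tiny network, and the toy beta schedule are all assumptions for readability, not the paper's actual FlowDiffusion setup.

```python
# Hypothetical flow-conditioned denoiser plus a simplified DDPM ancestral sampler.
# Illustrative only; timestep conditioning and the real UNet are omitted.
import torch
import torch.nn as nn

class FlowDiffusionSketch(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        # noisy image (3) + source image (3) + optical flow (2) = 8 input channels.
        self.net = nn.Sequential(
            nn.Conv2d(8, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x_t, src_image, flow, t):
        # Predict the noise in x_t given the source image and the dense flow.
        # (Timestep embedding is omitted here for brevity.)
        return self.net(torch.cat([x_t, src_image, flow], dim=1))

@torch.no_grad()
def sample(model, src_image, flow, steps=50):
    # Standard DDPM reverse process on a toy linear beta schedule.
    x = torch.randn_like(src_image)
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for i in reversed(range(steps)):
        eps = model(x, src_image, flow, i)
        mean = (x - betas[i] / (1 - alpha_bars[i]).sqrt() * eps) / alphas[i].sqrt()
        noise = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = mean + betas[i].sqrt() * noise
    return x
```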
A key contribution is training these models on video datasets, which provide natural motion patterns. They carefully process the video data to create training pairs that enforce background consistency and realistic object motion.
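One way to picture this data pipeline: take two frames from the same clip, estimate the dense flow between them with an off-the-shelf estimator, and sparsify that flow at a few strongly-moving pixels to mimic user drags. The sketch below uses torchvision's RAFT model as one possible estimator; the frame gap and keypoint-sampling heuristic are assumptions, not the authors' exact procedure.

```python
# Sketch: turn a pair of video frames into an (image, sparse drag, dense flow) triplet.
# Uses torchvision's RAFT as an example flow estimator; frame sizes should be divisible by 8.
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
raft = raft_large(weights=weights).eval()
preprocess = weights.transforms()

@torch.no_grad()
def make_training_triplet(frame_a, frame_b, num_drags=5):
    # frame_a, frame_b: (1, 3, H, W) frames from the same clip, a few frames apart.
    a, b = preprocess(frame_a, frame_b)
    dense_flow = raft(a, b)[-1]  # RAFT returns iterative refinements; keep the last.

    # Sparsify: keep the flow only at the most strongly-moving pixels,
    # mimicking the sparse drags a user would provide at test time.
    magnitude = dense_flow.norm(dim=1, keepdim=True)                 # (1, 1, H, W)
    flat_idx = magnitude.flatten(2).topk(num_drags, dim=2).indices   # (1, 1, K)
    h, w = dense_flow.shape[-2:]
    ys, xs = flat_idx[0, 0] // w, flat_idx[0, 0] % w
    sparse_drag = torch.zeros_like(dense_flow)
    sparse_drag[0, :, ys, xs] = dense_flow[0, :, ys, xs]
    return frame_a, sparse_drag, dense_flow  # input frame, FlowGen input, FlowGen target
```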
The system requires no optimization or inversion at test time. This allows it to preserve high-frequency details in the input image and complete an edit in roughly one second.
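Putting the pieces together, inference reduces to a single FlowGen forward pass followed by flow-conditioned diffusion sampling. The sketch below wires up the hypothetical FlowGenSketch, FlowDiffusionSketch, and sample functions from the earlier sketches; the drag encoding (a displacement vector stored at each drag's start pixel) is again an assumption.

```python
# Sketch of the one-pass editing loop: no inversion, no per-image optimization.
# Reuses the hypothetical FlowGenSketch / FlowDiffusionSketch / sample defined above.
import torch

def drag_edit(image, drags, flow_gen, flow_diffusion, sampler):
    # image: (1, 3, H, W); drags: list of ((x0, y0), (x1, y1)) point pairs.
    _, _, h, w = image.shape
    sparse = torch.zeros(1, 2, h, w)
    for (x0, y0), (x1, y1) in drags:
        # Encode each drag as a displacement vector at its starting pixel.
        sparse[0, 0, y0, x0] = float(x1 - x0)
        sparse[0, 1, y0, x0] = float(y1 - y0)
    with torch.no_grad():
        dense_flow = flow_gen(image, sparse)                 # motion generation
        edited = sampler(flow_diffusion, image, dense_flow)  # motion-conditioned generation
    return edited

# Example: drag the point at (128, 128) to (150, 120).
# edited = drag_edit(img, [((128, 128), (150, 120))],
#                    FlowGenSketch(), FlowDiffusionSketch(), sample)
```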
Results
InstantDrag achieves editing speeds up to 75 times faster than previous methods while using up to 5 times less GPU memory. In human evaluations, it outperformed other methods in terms of instruction-following, identity preservation, and overall preference. The method works well on faces as well as general scenes, and shows good generalization to domains like cartoons and drawings despite being trained only on real videos.
Conclusion
InstantDrag introduces a fast, interactive drag-based image editing method that produces high-quality results without requiring text prompts or masks. By leveraging video data and carefully designed networks, it achieves state-of-the-art performance in terms of speed, quality, and usability. For more information, please consult the full paper.
Congrats to the authors for their work!
Shin, Joonghyuk, et al. "InstantDrag: Improving Interactivity in Drag-based Image Editing." arXiv preprint arXiv:2409.08857 (2024).