OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Credit: https://arxiv.org/pdf/2503.08677

Today's paper introduces OmniPaint, a unified framework for object-oriented image editing that reconceptualizes object removal and insertion as interdependent processes rather than isolated tasks. The method leverages a pre-trained diffusion model and a progressive training pipeline to achieve high-fidelity object removal and insertion while preserving scene geometry and intrinsic properties like shadows and reflections.

Method Overview

OmniPaint builds upon a pre-trained diffusion model (FLUX) and introduces a novel training pipeline that treats object removal and insertion as complementary inverse problems. The framework takes an image with a binary mask indicating the region to be edited and operates on the masked input to either remove an object or insert a new one.
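The masked-input setup can be illustrated with a minimal sketch. This is not OmniPaint's actual preprocessing code (the paper builds on FLUX's latent-space inpainting); it simply shows the standard way a binary mask designates the editable region, with `make_masked_input` being a hypothetical helper name.

```python
import numpy as np

def make_masked_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the region to be edited.

    image: (H, W, C) float array in [0, 1]
    mask:  (H, W) binary array, where 1 marks the region to edit
    """
    return image * (1.0 - mask[..., None])

# Toy 2x2 RGB image with the top-left pixel masked out.
image = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 0]], dtype=np.float64)
masked = make_masked_input(image, mask)
```

The model then fills the zeroed region conditioned on the surrounding pixels, either erasing the object there or synthesizing a new one.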

For object removal, the model suppresses semantic traces within the masked region while ensuring smooth boundary transitions and preventing unintended artifacts or hallucinations. For object insertion, it integrates a new object while maintaining global coherence and context-aware realism, including physical effects like shadows and reflections.

The training pipeline consists of three phases. First, an inpainting pretext training phase initializes the model with basic inpainting abilities. Second, a paired warmup phase uses 3,000 real-world paired samples to train the model for effect-aware object removal and insertion. Finally, a CycleFlow unpaired post-training phase leverages large-scale unpaired data to enhance object insertion capabilities.
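The three phases above can be summarized as a simple sequential schedule. The phase names and the `run_pipeline` helper below are illustrative stand-ins, not the authors' code; only the 3,000-sample paired warmup figure comes from the paper.

```python
# Hypothetical schedule mirroring the three phases described above.
TRAINING_PHASES = [
    {"name": "inpainting_pretext", "data": "masked images", "paired": False},
    {"name": "paired_warmup", "data": "3,000 real-world pairs", "paired": True},
    {"name": "cycleflow_post_training", "data": "large-scale unpaired", "paired": False},
]

def run_pipeline(phases, train_fn, state=None):
    """Run each training phase in order, threading model state through."""
    for phase in phases:
        state = train_fn(state, phase)
    return state

# Trivial train_fn stand-in that just records the phase order.
history = run_pipeline(TRAINING_PHASES, lambda s, p: (s or []) + [p["name"]])
```

The point of the progressive design is that each phase inherits the state of the previous one, so the cheap unpaired data in the final phase refines, rather than replaces, what the paired warmup established.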

The training process enforces cycle consistency between removal and insertion. This allows the model to learn from unpaired data by ensuring that reinserting a removed object approximately restores the original image. The model uses two separate sets of parameters for object removal and insertion, selected at inference time via task-specific embeddings.
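The cycle-consistency idea can be sketched as a reconstruction objective: run removal, then insertion, and penalize deviation from the original image. The `remove_fn`, `insert_fn`, and the plain L2 penalty below are illustrative assumptions, not the paper's CycleFlow formulation (which operates within a flow-matching framework).

```python
import numpy as np

def cycle_loss(image, mask, remove_fn, insert_fn, object_ref):
    """Toy cycle-consistency objective: removing an object and then
    reinserting it should approximately restore the original image."""
    background = remove_fn(image, mask)                  # object removed
    restored = insert_fn(background, mask, object_ref)   # object reinserted
    return float(np.mean((image - restored) ** 2))

# Identity stand-ins: a perfect remove->insert round trip gives zero loss.
img = np.arange(48, dtype=np.float64).reshape(4, 4, 3) / 48.0
msk = np.zeros((4, 4))
loss = cycle_loss(img, msk, lambda x, m: x, lambda x, m, o: x, object_ref=None)
```

Because the loss only needs the original image back at the end of the round trip, no ground-truth "object removed" version of the scene is required, which is what lets the final phase train on large-scale unpaired data.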

Another significant contribution is the Context-Aware Feature Deviation (CFD) score, a novel metric for evaluating object removal quality. CFD consists of two components: a hallucination penalty that detects unwanted object-like structures in the removed region, and a context coherence term that evaluates how well the inpainted region blends with the surrounding background.
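The two CFD components can be illustrated with a toy feature-space score. This is not the paper's exact formulation (which the authors define over learned features); it is a minimal sketch, assuming per-pixel feature maps, where a coherent hallucinated object shows up as the inpainted region's mean feature drifting away from the background statistics.

```python
import numpy as np

def cfd_score(features, mask, alpha=1.0):
    """Toy CFD-style score: lower is better.

    features: (H, W, D) per-pixel feature map of the edited image
    mask:     (H, W) binary mask of the removed region
    """
    inside = features[mask > 0]    # features in the inpainted region
    outside = features[mask == 0]  # surrounding background features
    bg_mean = outside.mean(axis=0)
    # Context coherence: average per-pixel feature distance to the background.
    context = float(np.linalg.norm(inside - bg_mean, axis=1).mean())
    # Hallucination penalty: an object-like structure shows up as the
    # region's mean feature shifting away from the background.
    hallucination = float(np.linalg.norm(inside.mean(axis=0) - bg_mean))
    return context + alpha * hallucination

# A clean inpaint that matches the background should score lower than one
# containing a deviating, object-like blob.
H, W, D = 4, 4, 2
mask = np.zeros((H, W)); mask[1:3, 1:3] = 1
clean = np.zeros((H, W, D))
hallucinated = clean.copy(); hallucinated[1:3, 1:3] = 5.0
```

Standard reconstruction metrics miss this failure mode because a plausibly textured but hallucinated object can still match natural-image statistics; CFD targets it directly.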

Results

OmniPaint demonstrates superior performance in both object removal and insertion tasks compared to existing methods. For object removal, it achieves the lowest FID, CMMD, LPIPS, and CFD scores while maintaining high PSNR, SSIM, and ReMOVE scores across multiple datasets. Qualitative results show that OmniPaint successfully removes objects and their associated effects like reflections and shadows, which other methods struggle with.

For object insertion, OmniPaint outperforms all baselines in object identity preservation metrics (CLIP-I, DINOv2, CUTE, and DreamSim) and overall image quality metrics (MUSIQ and MANIQA). Visual comparisons reveal that OmniPaint generates inserted objects with more accurate shape, texture, and lighting consistency while preserving fine details and ensuring natural alignment with scene geometry and illumination.

Conclusion

OmniPaint presents a unified approach to object-oriented image editing by reconceptualizing object removal and insertion as interdependent tasks. Through its progressive training pipeline and CycleFlow mechanism, it achieves precise foreground elimination and seamless object integration while preserving scene geometry and intrinsic properties. For more information, please consult the full paper.

Congrats to the authors for their work!

Yu, Yongsheng, et al. "OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting." arXiv preprint arXiv:2503.08677 (2025).