CoIn: Comprehensive 2D-3D Inpainting with Gaussian Splatting Guidance

Researchers propose CoIn, a novel framework for 3D scene inpainting that bridges 2D inpainting models and Gaussian Splatting through a multi-stage consistency pipeline. The method handles both object removal and insertion with arbitrary-shaped masks, achieving state-of-the-art performance by leveraging bidirectional information flow between 2D and 3D representations.

arXiv:2606.27584v1

Abstract: 3D scene inpainting is essential for reconstructing areas corrupted by occlusions or limited viewpoints. While recent methods leverage Gaussian Splatting (GS) for efficient 3D editing, they often depend on precise multi-view segmentation masks and are inherently constrained to object removal. We propose CoIn, a novel framework that bridges 2D inpainting models and 3DGS through a multi-stage consistency pipeline. Our approach first generates initial inpainted images with a diffusion model, which enables arbitrary-shaped masks and diverse tasks such as object insertion. We then introduce Reference Adaptive GS with Feature Attention, which reconstructs a coarse 3D scene by adaptively weighting toward a reference view (2D→3D). This 3D representation provides geometric guidance to the diffusion process via GS-based Reference Feature Warping, ensuring multi-view consistency (3D→2D). Finally, a Texture-Enhancing Discriminator refines the 3D scene to achieve high photometric realism (2D→3D). Experiments show that CoIn, by leveraging this bidirectional information flow, achieves state-of-the-art performance and handles both object removal and object insertion with flexible mask input.
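The abstract does not say how GS-based Reference Feature Warping is computed. As an illustration only, the sketch below shows generic depth-based backward warping, the kind of geometric step such guidance could build on. It uses depth rendered from the coarse Gaussian scene in a target view to lift each pixel to 3D, reprojects that point into the reference view, and copies the reference feature found there. The function name `warp_reference_features`, the shared pinhole intrinsics, nearest-neighbour sampling, and the NumPy setting are all assumptions and do not describe CoIn's implementation. The paper may warp diffusion features differently, use forward splatting, or handle occlusion in its own way.

```python
import numpy as np


def warp_reference_features(ref_feat, tgt_depth, K, T_tgt_to_ref):
    """Hypothetical helper: backward-warp reference-view features into a target view.

    Assumptions (not taken from the CoIn paper):
      ref_feat:      (H, W, C) features of the reference (inpainted) view.
      tgt_depth:     (H, W) target-view depth, e.g. rendered from a coarse GS scene;
                     values <= 0 mean "no geometry".
      K:             (3, 3) pinhole intrinsics shared by both views.
      T_tgt_to_ref:  (4, 4) rigid transform from target-camera to reference-camera coords.
    Returns the warped features (H, W, C) and a boolean validity mask (H, W).
    """
    H, W, C = ref_feat.shape
    # Homogeneous pixel coordinates (u, v, 1) of every target pixel, shape (3, N).
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(np.float64)
    depth = tgt_depth.reshape(-1).astype(np.float64)

    # Unproject to 3D points in the target camera frame.
    pts_tgt = (np.linalg.inv(K) @ pix) * depth
    # Transform the points into the reference camera frame.
    pts_h = np.vstack([pts_tgt, np.ones((1, pts_tgt.shape[1]))])
    pts_ref = (T_tgt_to_ref @ pts_h)[:3]

    # Keep pixels that have depth and land in front of the reference camera.
    z = pts_ref[2]
    valid = (depth > 0) & (z > 1e-6)

    # Perspective projection into the reference image plane (only where valid).
    proj = K @ pts_ref
    u_ref = np.full(z.shape, -1.0)
    v_ref = np.full(z.shape, -1.0)
    u_ref[valid] = proj[0, valid] / z[valid]
    v_ref[valid] = proj[1, valid] / z[valid]

    # Nearest-neighbour sampling; discard projections that fall outside the image.
    ui = np.round(u_ref).astype(np.int64)
    vi = np.round(v_ref).astype(np.int64)
    valid &= (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    warped = np.zeros((H * W, C), dtype=ref_feat.dtype)
    warped[valid] = ref_feat[vi[valid], ui[valid]]
    return warped.reshape(H, W, C), valid.reshape(H, W)
```

For example, with an identity transform and constant depth, the output reproduces `ref_feat` exactly. A small sideways translation of the camera shifts the features by the corresponding disparity and marks the uncovered border as invalid. The returned mask flags target pixels that have no usable correspondence. A practical pipeline would also need an occlusion test, for instance comparing against reference-view depth, and would typically use bilinear rather than nearest-neighbour sampling.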