Paper Explained Series: Instruct NeRF2NeRF - Editing 3D Scenes with Instructions
Source: https://arxiv.org/pdf/2303.12789.pdf


NeRF, aka Neural Radiance Field, is already challenging the photogrammetry process as we know it, thanks to its speed and higher photorealism than any existing 3D scene reconstruction mechanism. The next evolution, where there is already some progress, is editing the NeRF-reconstructed 3D scene.

Just a day ago, a new kid on the block (Instruct-NeRF2NeRF) joined this race with a very simple design principle. Don't worry, I will explain it shortly. Before that, let's examine its potential impact on industrial applications and other real-world use cases.

Take the gaming industry as an example: it deals with polygon meshes, partly because they can be easily deformed (sculpted, extruded, re-textured, and so on) and because physics and lighting can easily be applied to them. This is aided by a variety of proprietary and open-source toolsets the industry has built over time.

But each of the processes mentioned above, including 3D reconstruction, requires enormous man-hours and skill.

The second industry is VFX, where studios reconstruct a 3D scene but want to edit it along a fourth dimension: time.


This is where this paper could be a cornerstone in the evolution of NeRF. So let's go into the details now.

Claim: The paper proposes Instruct-NeRF2NeRF, a method for consistent 3D editing of a NeRF scene using text-based instructions. The method can accomplish a diverse collection of local and global scene edits.

Before going into how the paper achieves what it claims, let's think for a second: how would you have done it?

Let's think at a very high level about how NeRF works, and then work backwards to see how we could edit a NeRF scene on the fly.

Step 1: Pre-processing data for NeRF training. NeRF training takes video/image input, captures the camera parameters, and performs camera calibration for the input images (for video, frames can be extracted via ffmpeg). Okay, so these are the inputs for NeRF training, but this still doesn't ring a bell about how editing will work. No worries, I will give you one more hint.
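To make this step concrete, here is a minimal sketch in Python of the frame-extraction part, assuming ffmpeg is installed; the file names and 2-fps sampling rate are illustrative placeholders, not from the paper. Camera calibration itself is typically done afterwards with a structure-from-motion tool such as COLMAP.

import subprocess
from pathlib import Path

# Extract frames from the input video at 2 frames per second.
# Paths and frame rate are placeholders for illustration.
frames_dir = Path("frames")
frames_dir.mkdir(exist_ok=True)
subprocess.run(
    ["ffmpeg", "-i", "capture.mp4", "-r", "2", str(frames_dir / "%04d.png")],
    check=True,
)

# Camera poses would then be estimated from these frames, e.g. with COLMAP:
#   colmap automatic_reconstructor --image_path frames --workspace_path sfm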

Step 2: We already know the progress text-guided diffusion models have made. So if we can edit the images during training itself via a text-guided diffusion model, the reconstructed 3D scene will be based on the new text-guided edits. Sounds interesting, right?
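As a hint of what such an edit looks like in practice, here is a minimal sketch using the Hugging Face diffusers implementation of InstructPix2Pix; the instruction, paths, and guidance values are illustrative, and this is not the paper's training code.

import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load the InstructPix2Pix pipeline from Hugging Face diffusers.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Edit a single training image with a global text instruction.
image = Image.open("frames/0001.png").convert("RGB")
edited = pipe(
    "Make it look like autumn",       # illustrative instruction
    image=image,                      # the image to condition on
    num_inference_steps=20,
    image_guidance_scale=1.5,         # how closely to follow the input image
    guidance_scale=7.5,               # how strongly to follow the instruction
).images[0]
edited.save("frames_edited/0001.png")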


Design Principle:


The method gradually updates a reconstructed NeRF scene by iteratively updating the dataset images while training the NeRF (a sketch of this loop follows the list):

1. An image is rendered from the scene at a training viewpoint.

2. It is edited by InstructPix2Pix given a global text instruction.

3. The training dataset image is replaced with the edited image, and

4. The NeRF continues training as usual.
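Putting the four steps together, here is a hypothetical sketch of this iterative dataset update loop. nerf, cameras, dataset, and edit_with_ip2p are stand-in names for the trainer's real components, not the authors' API; in the paper, the edit is additionally conditioned on the originally captured image.

instruction = "Make it look like autumn"
update_interval = 10  # how often to refresh a dataset image (illustrative)

for step in range(num_steps):
    if step % update_interval == 0:
        i = (step // update_interval) % len(cameras)
        rendered = nerf.render(cameras[i])       # 1. render a training viewpoint
        edited = edit_with_ip2p(                 # 2. edit it with InstructPix2Pix,
            rendered,
            instruction,
            original=dataset.originals[i],       #    conditioned on the original capture
        )
        dataset.images[i] = edited               # 3. replace the dataset image

    batch = dataset.sample_rays()                # 4. NeRF continues training as usual
    loss = nerf.training_step(batch)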


Note: InstructPix2Pix is an image-conditioned diffusion model.


One thing to note is its simplistic design. But we need to be careful about how we edit the images and ensure the edits work seamlessly across the viewpoint images. The method accomplishes this by iteratively updating the image content at the captured viewpoints with the help of a diffusion model, and subsequently consolidating these edits in 3D through standard NeRF training.


Figure: Instruction-guided edits to the 3D reconstruction over time

The editing process results in the sudden replacement of dataset images with their edited counterparts. At early iterations, these edits may be inconsistent (InstructPix2Pix does not typically perform consistent edits across different viewpoints). Over time, as the images are used to update the NeRF and are progressively re-rendered and re-edited, they begin to converge on a globally consistent depiction of the edited scene.


For more such content, please follow my LinkedIn handle.
