3D Reconstruction using Voxel Grids
Atharv Subhekar
MS in Machine Learning | Stevens Institute of Technology | Computer Vision | Natural Language Processing | Deep Learning
In recent years, the fusion of computer vision and 3D reconstruction has revolutionized various fields, from entertainment to healthcare and beyond. Among the myriad techniques, voxel-based modeling stands out for its ability to capture intricate spatial details and generate highly accurate 3D representations.
With my teammate Prasad Naik, we implemented a voxel-based 3D reconstruction system that takes several images of a subject from different camera locations and computes a dense 3D point cloud.
These were the images we used for this task:
Along with these RGB images we also had access to the silhouettes for each image.
What is a Voxel Grid?
A voxel grid is a 3D geometric grid of values organized into layers of rows and columns. For this problem, we created a grid measuring 5 m in the x direction, 6 m in the y direction, and 2.5 m in the z direction. By choosing a smaller voxel size, we can achieve a more detailed and dense 3D point cloud.
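The grid above can be sketched in a few lines of NumPy. The extents match the article (5 m x 6 m x 2.5 m); the voxel size of 5 cm is an illustrative choice, not the one used in the project:

```python
import numpy as np

# Hypothetical voxel size for illustration; smaller -> denser point cloud
voxel_size = 0.05  # 5 cm per voxel

# Grid extents from the article: 5 m (x), 6 m (y), 2.5 m (z)
xs = np.arange(0.0, 5.0, voxel_size)
ys = np.arange(0.0, 6.0, voxel_size)
zs = np.arange(0.0, 2.5, voxel_size)

# World coordinates of every voxel center, shaped (N, 3)
X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
voxel_centers = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

print(voxel_centers.shape)  # (100 * 120 * 50, 3) = (600000, 3)
```

Halving the voxel size multiplies the voxel count by eight, so memory and projection time grow quickly with resolution.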
Constructing the Visual Hull
A visual hull is a geometric entity created by the shape-from-silhouette 3D reconstruction technique. This technique assumes the foreground object in an image can be separated from the background. Under this assumption, the original image can be thresholded into a foreground/background binary image, which we call a silhouette image. The foreground mask, known as a silhouette, is the 2D projection of the corresponding 3D foreground object.
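The thresholding step can be sketched as follows. This is a toy example assuming the foreground is simply brighter than a dark background; real pipelines typically use chroma keying or background subtraction, and in our case the silhouettes were already provided:

```python
import numpy as np

def silhouette(image_gray, threshold=30):
    """Binarize a grayscale image into a foreground mask (True = foreground)."""
    return image_gray > threshold

# Toy image: a bright 2x2 "object" on a dark background
img = np.zeros((4, 4))
img[1:3, 1:3] = 255

mask = silhouette(img)
print(mask.sum())  # 4 foreground pixels
```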
All voxels are initially labeled empty. We then project the center of each voxel into all the images (8 in total) and label as occupied only those voxels whose projection lies inside every silhouette. This ensures that each occupied voxel is consistent with what every camera sees of the dancer.
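The carving step can be sketched as below. The 3x4 projection matrices and the binary silhouette masks are assumed to be given (in the project they come from the calibrated cameras); the function name and signature are illustrative:

```python
import numpy as np

def carve_visual_hull(voxel_centers, projections, silhouettes):
    """Keep only voxels whose center projects inside every silhouette.

    voxel_centers: (N, 3) world coordinates of voxel centers
    projections:   list of 3x4 camera projection matrices (assumed known)
    silhouettes:   list of 2D boolean masks (True = foreground), same order
    Returns an (N,) boolean array: True = occupied.
    """
    n = len(voxel_centers)
    homog = np.hstack([voxel_centers, np.ones((n, 1))])  # (N, 4) homogeneous
    occupied = np.ones(n, dtype=bool)  # a voxel survives only if it passes every view
    for P, mask in zip(projections, silhouettes):
        proj = homog @ P.T                      # (N, 3) homogeneous pixel coords
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
        h, w = mask.shape
        in_view = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(n, dtype=bool)
        inside[in_view] = mask[v[in_view], u[in_view]]
        occupied &= inside                      # intersection over all silhouettes
    return occupied
```

Intersecting over all views is why a single bad silhouette can carve away valid geometry: one False vote is enough to empty a voxel.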
Once we construct the visual hull, we render it in a false color to verify that the hull is correct.
True Coloring
We now color the model by assigning to each surface voxel the per-channel (red, green, blue) median of the pixels it projects to in our 2D images. In other words, for every point in the 3D reconstructed model above, we check where that point projects in each of the 8 images, then compute the median of the RGB values at those projected locations. After true coloring, we get our final 3D reconstructed model:
The code is accessible on my GitHub linked below: