Structure from Motion
Do you know how humans perceive depth information from their environment? If your answer is by comparing the images from the left and right eye, try touching things with one of your eyes closed. Surprised that you can still perceive three-dimensional space? Read on if you want to know how a driverless car navigates an urban landscape.
I started my cloud journey with Intelligent Product Design, a collaboration application for both internal and external (suppliers, design partners, customers, etc.) organizations. The application also supported collaboration on 3D designs. Later on, I moved to another application that allowed multi-physics simulation on these 3D designs, completing the loop from design to operate. Both applications required 3D content (typically CAD / STEP / BIM) built with highly specialized tools.
Creating 3D models is an involved task requiring AutoCAD / Maya / Blender skills. I wanted to capture this spatial information as easily as clicking a photograph. I was primarily motivated by three use cases unrelated to each other. But hey, isn't creativity the ability to relate previously unrelated things? My favorite application areas were -
Result
As stated before, we want to build a bare minimum 3D scanning application. The application should run on widely used smartphones.
There is heavy-duty processing involved, using Python libraries. Although it is possible to package all the libraries and the Python code using Kivy, we build a PWA to ease application distribution. Why a PWA? That's for another coffee corner discussion.
3D scene reconstruction happens on the server side. You can deploy the server side on a local machine, in any container, or even in a Jupyter notebook. The server process connects to the client using WebRTC, a peer-to-peer communication protocol for data and media streams. With this approach, one can deploy the client application even on edge devices.
The application has two modes: a calibration mode to calculate camera intrinsic parameters and distortion coefficients, and a scanning mode for 3D scene reconstruction.
Architecture
The client application is built as a PWA (web distribution) used on mobile devices, and as an Arduino sketch to be deployed on Uno, Nano, etc. (it can be adapted for other development boards as well). The camera module used with Arduino is the OV7670.
The client connects to the server process using the WebRTC protocol. WebRTC allows the captured video to be streamed to the server in real time. The data channel is used to send commands and exchange other context data such as camera intrinsic parameters, geo coordinates, etc. Firebase is used as a presence database for synchronizing ICE candidates.
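As an illustration of the server-side end of this connection, here is a minimal sketch assuming the aiortc Python library implements the WebRTC peer and that the offer/answer and ICE exchange happens through Firebase (the signaling code itself is omitted):

import asyncio
from aiortc import RTCPeerConnection, RTCSessionDescription

captured_images = []

async def handle_offer(offer_sdp):
    pc = RTCPeerConnection()

    @pc.on("track")
    def on_track(track):
        # pull frames from the client's video stream and keep them as numpy arrays
        async def consume():
            while True:
                frame = await track.recv()
                captured_images.append(frame.to_ndarray(format="bgr24"))
        asyncio.ensure_future(consume())

    @pc.on("datachannel")
    def on_datachannel(channel):
        # commands and context data (camera intrinsics, geo coordinates) arrive here
        channel.on("message", lambda message: print("received:", message))

    # offer/answer exchange; SDP and ICE candidates are synchronized through Firebase
    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc.localDescription.sdp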
The Structure from Motion pipeline, running on Python, can be deployed on a local machine, Docker, Kubernetes, or a Jupyter runtime. Specific aspects of the pipeline are described in the sections below. The Structure from Motion problem is described as the process of establishing the spatial orientation of target objects from the movement of one or more observers. Structure refers to the coordinates, shape and relative positions of the target objects, whereas Motion refers to the relative translation and rotation of the observer's camera frustum.
Camera calibration
At its core, SFM works by observing key points across a set of images captured with calibrated cameras, i.e. the focal length and the distortion coefficients along both axes are known in advance. When this is not the case, the camera(s) must be calibrated.
We must capture images of a well-defined pattern (e.g. a chessboard) and find specific points whose relative positions we already know (e.g. the square corners of the chessboard). Since we know the coordinates of these points in real-world space and their coordinates in the image, we can solve for the distortion coefficients. For accuracy, we should have at least 10 images.
import cv2 as cv
import numpy as np
...
# prepare the real-world coordinates of the chessboard corners;
# chessboard_size is the pattern's inner corner count, e.g. (9, 6)
objp = np.zeros((chessboard_size[0] * chessboard_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:chessboard_size[0], 0:chessboard_size[1]].T.reshape(-1, 2)

objpoints = []  # points in real world space
imgpoints = []  # points in image plane

# captured_images from WebRTC track
for image in captured_images:
    # convert to gray scale
    gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)
    # find chessboard corners
    found, corners = cv.findChessboardCornersSB(gray, chessboard_size)
    if found:
        # define criteria for subpixel accuracy
        criteria = (cv.TERM_CRITERIA_EPS +
                    cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
        # refine corner locations (to subpixel accuracy) based on criteria
        corners = cv.cornerSubPix(gray, corners, (5, 5), (-1, -1), criteria)
        # collect objpoints (points in real world space) and imgpoints
        objpoints.append(objp)
        imgpoints.append(corners)

# calibrate camera once corners are collected from all views
ret, K, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints,
                                                gray.shape[::-1], None, None)
To undistort an image, we can use the getOptimalNewCameraMatrix and undistort methods from OpenCV.
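A minimal sketch of this undistortion step, assuming image is one of the captured frames and K, dist come from the calibration above:

import cv2 as cv
...
# compute a refined camera matrix and the valid region of interest
h, w = image.shape[:2]
new_K, roi = cv.getOptimalNewCameraMatrix(K, dist, (w, h), 1, (w, h))
# remove lens distortion and crop to the valid region
undistorted = cv.undistort(image, K, dist, None, new_K)
x, y, rw, rh = roi
undistorted = undistorted[y:y + rh, x:x + rw]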
Scene reconstruction
As described earlier, SFM is the process of estimating the 3D structure of a scene from a set of 2D images. The SFM problem can be solved in many different ways. How you approach the problem depends on whether you use a single camera or multiple cameras, and whether the images are ordered. In our setup we are using a single camera that moves in a stationary scene. That means the images have the same distortion and are ordered. Our solution approach would have been computationally less demanding if we were in a stereo setup (two cameras with known translation and no rotation between them).
The process works on the principle of triangulation. Triangulation is the process of determining the location of a point by forming triangles to the point from known points. Easy as it sounds, we need to observe the same points (point correspondences) across two images. We can find corresponding points either by matching features or by tracking points from image 1 to image 2. We use these points to recover the relative pose of the camera capturing the second image with respect to the camera position for the first image.
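A minimal sketch of this pose recovery step, assuming pts1 and pts2 hold the matched key point coordinates (computed as in the code block further below) and K is the intrinsic matrix from calibration:

import cv2 as cv
...
# estimate the essential matrix from the point correspondences and the intrinsics
E, inlier_mask = cv.findEssentialMat(pts1, pts2, K, method=cv.RANSAC,
                                     prob=0.999, threshold=1.0)
# recover rotation R and translation t of the second camera w.r.t. the first
retval, R, t, inlier_mask = cv.recoverPose(E, pts1, pts2, K, mask=inlier_mask)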
Theoretically, we can now apply triangulation to these key points and obtain a sparse point cloud. However, in reality this sparse point cloud wouldn't allow us to estimate the geometry of the scene with accuracy. So the approach is to apply triangulation to every pixel of the captured images. Comparing every pixel of the first image with every pixel of the second image is computationally a very intensive operation. If only we had started with a stereo setup, we would only have needed to compare points horizontally along epipolar lines. Let's convert our monocular vision problem into a stereo vision problem.
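Before moving to the dense approach, here is a sketch of what the sparse triangulation mentioned above would look like, continuing from the R and t recovered in the previous sketch:

# projection matrices for the first (reference) and second camera positions
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
# triangulate the matched points (passed as 2xN arrays) into homogeneous coordinates
pts4d = cv.triangulatePoints(P1, P2, pts1.T, pts2.T)
# divide out the homogeneous coordinate to obtain the sparse 3D point cloud
sparse_cloud = (pts4d[:3] / pts4d[3]).T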
The code below describes the key steps in the process -
import cv2 as cv
import numpy as np
...
# assuming K is the camera intrinsic matrix obtained from the previous section
...
# compute image correspondences using feature matching
sift = cv.SIFT_create()
bf = cv.BFMatcher()
# img1 and img2 captured from the WebRTC track (grayscale)
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
# collect matched key points using the ratio test
matches = bf.knnMatch(des1, des2, k=2)
good = []
pts1 = []
pts2 = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        good.append([m])
        pts1.append(kp1[m.queryIdx].pt)
        pts2.append(kp2[m.trainIdx].pt)
pts1 = np.array(pts1)
pts2 = np.array(pts2)
# compute fundamental matrix and keep only the RANSAC inliers
F, mask = cv.findFundamentalMat(pts1, pts2, cv.FM_RANSAC)
pts1 = pts1[mask.ravel() == 1]
pts2 = pts2[mask.ravel() == 1]
# convert monocular to stereo by rectifying both images
h1, w1 = img1.shape
h2, w2 = img2.shape
ret, H1, H2 = cv.stereoRectifyUncalibrated(pts1, pts2, F, imgSize=(w1, h1))
img1_rectified = cv.warpPerspective(img1, H1, (w1, h1))
img2_rectified = cv.warpPerspective(img2, H2, (w2, h2))
# compute disparity map with semi-global block matching
min_disp = 0          # minimum possible disparity value
num_disp = 160        # disparity range, must be divisible by 16
window_size = 3       # size of the matching window
stereo = cv.StereoSGBM_create(minDisparity=min_disp,
    numDisparities=num_disp,
    blockSize=16,
    P1=8 * 3 * window_size ** 2,
    P2=32 * 3 * window_size ** 2,
    disp12MaxDiff=1,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=32
)
disparity_SGBM = stereo.compute(img1_rectified, img2_rectified)
Output disparity map below (brighter pixels are nearer to the camera) -
Obtained from the below pair of images -
Finally, each valid pixel from the disparity map can be added to the dense point cloud using the reprojectImageTo3D method from OpenCV.
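A minimal sketch of that last step, continuing from the code above. In a calibrated stereo setup the 4x4 reprojection matrix Q would come from cv.stereoRectify; here it is approximated from the image size, the focal length in K, and an assumed baseline (camera separation), all of which are illustrative assumptions:

# build an approximate 4x4 reprojection matrix Q
h, w = img1_rectified.shape[:2]
f = K[0, 0]                 # focal length in pixels
baseline = 0.1              # assumed camera separation in scene units
Q = np.float32([[1, 0, 0, -w / 2],
                [0, 1, 0, -h / 2],
                [0, 0, 0,  f],
                [0, 0, -1 / baseline, 0]])
# SGBM produces fixed-point disparities scaled by 16
disparity = disparity_SGBM.astype(np.float32) / 16.0
points_3d = cv.reprojectImageTo3D(disparity, Q)
# keep only pixels with a valid disparity value
valid = disparity > disparity.min()
dense_cloud = points_3d[valid]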
Summary
The Structure from Motion technique represents a non-invasive, highly flexible and low-cost methodology for reconstructing 3D structures where access to other ranging methods is restricted. Its applications range from geosciences, building information management, engineering and construction, logistics, historical preservation, and gaming to manufacturing and medical diagnosis.