Observance - Preface

As mentioned in the last post, we will now share what we do, what Observance is, how we implemented Transformers and other AI-based models in our processing pipeline, and the basic building blocks needed to understand all of this. One can follow these posts to understand the state of the art in 3D reconstruction, LiDAR, and sensor fusion, and how to put Transformers to use. We want these articles to be an informative starting point for anyone trying to solve similar problems. So instead of going deeper into why Observance exists, we will look at the problems we had to solve while building it and how we solved them.


What exactly are we solving?

We are trying to perform a 3D reconstruction of a scene using techniques that are fast and highly accurate. These scenes can be very large (>1 million sq. ft), and we do not have a lot of time for data collection (less than an hour, or as fast as one can walk). Once the data is captured, we need to convert it into a lightweight 3D model with colored texture and compare it with the actual 3D or 2D model (after automatic alignment) to identify structural deviations or defects.

And of course, this process has to be extremely cost-effective: leading service providers today would charge around $300k and take a month to scan a 1M sq. ft area.


What is 3D reconstruction?

3D reconstruction is the process of creating a three-dimensional model of an object or scene from a set of two-dimensional images or measurements. This can be done using a variety of techniques, including photogrammetry, Structure from Motion (SfM), and SLAM, as well as AI techniques like Monocular Depth Estimation. While we will predominantly focus on techniques similar to SLAM using LiDAR, I remain heavily bullish on Monocular Depth Estimation.

[Image] Ref: Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging, by S. Mahdi H. Miangoleh et al.
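Monocular depth estimation has become remarkably accessible. As an illustration (not our pipeline), here is a minimal sketch that runs a pretrained MiDaS model on a single image via torch.hub; the file name scene.jpg is a placeholder, and the small model variant also requires the timm package.

```python
import cv2
import torch

# Load a small pretrained MiDaS model and its matching input transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

# scene.jpg is a placeholder for any RGB photo of the scene.
img = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the original image resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().numpy()

# Note: the output is relative (inverse) depth, not metric distance.
```

A single forward pass gives a plausible depth map from one photo, which hints at why I remain bullish on this direction despite our LiDAR-first approach.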


What is SLAM?

SLAM, or Simultaneous Localization and Mapping, is a technique used to create a map of an environment using a camera, LiDAR, inertial measurement units, or other sensors, while simultaneously tracking the location of those sensors within the map. At Inkers, we are using a sensor fusion framework combining LiDAR, camera, and inertial sensors to achieve robust and accurate 3D point estimation.


What is sensor fusion?

Sensor fusion is the process of combining multiple sources of information to obtain a more accurate, reliable, and comprehensive view of the world. This is often done in situations where sensors are measuring different aspects of the same phenomenon, such as in self-driving cars, where cameras, LiDAR, and radar are used together to understand the environment around the vehicle. Let's break this down.

Let us say we are using LiDAR to capture the 3D point cloud. LiDAR captures instantaneous 3D data extremely well (but not the color texture). But during the capture, we do not have access to the most accurate 3D location of the LiDAR itself (GPS is ruled out in our case, as we need to scan areas where a GPS signal might not be available). The images on the left below show how the overlaid scans might look, while we need the scans to align as shown on the right.

[Image] Point clouds from two datasets: (top) Gazebo with an overlap of 0.9, and (bottom) ETH with an overlap of 0.59. Reading and reference point clouds (left) prior to registration and (right) aligned according to ground truth. The reference is displayed in blue, and the reading in orange tones.

There are some techniques, like Iterative Closest Point (ICP), but in practice they are good only for small regions. At Inkers, we capture LiDAR frames at 10 fps. Assuming a walkthrough of a building takes around 15 minutes, we need to align ~9,000 point clouds! Approaches like ICP also depend heavily on a correct initial localization.
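For a sense of what pairwise registration looks like in practice, here is a minimal sketch using Open3D's built-in ICP (the scan file names are placeholders). Note the init argument: without a reasonable initial guess, ICP happily converges to a wrong local minimum, which is exactly why it does not scale to thousands of scans on its own.

```python
import numpy as np
import open3d as o3d

# Align one LiDAR scan (source) to the previous one (target).
source = o3d.io.read_point_cloud("scan_0001.pcd")  # placeholder file names
target = o3d.io.read_point_cloud("scan_0000.pcd")

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.1,  # metres; farther point pairs are ignored
    init=np.eye(4),                   # ICP needs a sane initial pose guess
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)

print(result.fitness, result.inlier_rmse)  # alignment quality metrics
print(result.transformation)               # 4x4 rigid transform: source -> target
```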

Instead of LiDAR, one can use multiple camera images and align them to create 3D models as well, just as Matterport does.

[Image] SRC: https://matterport.com/discover/space/greenvale-elementary-main-level

This works amazingly well when we want to explore a site visually or capture the texture (as shown in the image below):

[Image] SRC: https://matterport.com/discover/space/greenvale-elementary-main-level

But there is only so much detail and accuracy one can capture using just camera-based techniques:

[Image] SRC: https://matterport.com/discover/space/greenvale-elementary-main-level

Models would remain incomplete, with measurements off by a large margin:

[Image] SRC: https://matterport.com/discover/space/greenvale-elementary-main-level

If we just need the current location, then we can also use an accelerometer or IMU. An IMU works well for instantaneous motion, but its errors accumulate over time and result in large deviations.
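To make the fix concrete, here is a toy 1D Kalman filter that fuses IMU dead reckoning (prediction, where uncertainty grows every step) with an occasional absolute position fix, such as one obtained from scan matching. All the noise values below are made-up illustrations, not our tuned parameters.

```python
import numpy as np

DT = 0.01                  # 100 Hz IMU
Q = np.diag([1e-6, 1e-4])  # process noise: how fast dead reckoning degrades
R = 0.05 ** 2              # measurement noise of an absolute position fix

def predict(x, v, P, accel):
    """IMU step (dead reckoning): integrate acceleration; covariance grows."""
    F = np.array([[1.0, DT], [0.0, 1.0]])
    x, v = x + v * DT, v + accel * DT
    return x, v, F @ P @ F.T + Q

def update(x, v, P, z):
    """Absolute position fix (e.g., from scan matching) pulls the drift back."""
    H = np.array([[1.0, 0.0]])
    S = H @ P @ H.T + R
    K = P @ H.T / S                        # Kalman gain, shape (2, 1)
    innovation = z - x
    x += float(K[0, 0]) * innovation
    v += float(K[1, 0]) * innovation
    return x, v, (np.eye(2) - K @ H) @ P

x, v, P = 0.0, 0.0, np.eye(2) * 0.01       # position, velocity, covariance
for step in range(3000):                   # 30 seconds of synthetic IMU data
    x, v, P = predict(x, v, P, accel=0.0)  # error accumulates between fixes...
    if step % 100 == 99:                   # ...until a 1 Hz position fix arrives
        x, v, P = update(x, v, P, z=0.0)
```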

So, LiDAR provides accurate instantaneous 3D locations of points in space, the camera provides amazing texture, and the IMU provides a good short-term estimate of our own motion.

So, can we

  • combine LiDAR and a camera to get a colored point cloud (see the sketch after this list)
  • combine LiDAR, camera, and IMU to get an accurate location
  • come up with an algorithm to track the exact location over 100s of meters with precision?
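The first item is essentially a projection problem. Assuming the camera intrinsics K and the LiDAR-to-camera extrinsics T_cam_lidar are already calibrated (which, as we discuss below, is the hard part), a minimal colorization sketch looks like this:

```python
import numpy as np

def colorize_points(points_lidar, image, K, T_cam_lidar):
    """Project LiDAR points into a camera image and pick up per-point colors.

    points_lidar: (N, 3) points in the LiDAR frame
    image:        (H, W, 3) RGB image captured at (roughly) the same instant
    K:            (3, 3) camera intrinsic matrix          -- assumed calibrated
    T_cam_lidar:  (4, 4) LiDAR-to-camera rigid transform  -- assumed calibrated
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    in_front = pts_cam[:, 2] > 0                # keep points the camera can see
    pts_cam, pts_lidar = pts_cam[in_front], points_lidar[in_front]

    uv = (K @ pts_cam.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)   # perspective divide -> pixels

    h, w = image.shape[:2]
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return pts_lidar[ok], image[uv[ok, 1], uv[ok, 0]]  # 3D points, RGB colors
```

Every real-world complication on the list below (sync, calibration, FoV mismatch) shows up as an error term somewhere in these few lines.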

These are the problems that we wanted to solve. But as we started to develop our solution, we realized that these are not the only problems at hand.

  • It is extremely hard to sync a LiDAR frame with a camera frame (remember, the cost has to stay low; PTP-based camera solutions can make this problem easier, but they are expensive)
  • Calibrating a LiDAR with a camera is a very tedious process, especially when both have their own aberrations
  • LiDARs and cameras might have limited FoVs, and if not designed well, one might capture more data than the other. LiDARs also do not work well with reflective surfaces (imagine water on the floor at a construction site, or just after the rains)
  • LiDARs do not work well at short distances, so how will we capture lift lobbies, small rooms, and small passages?
  • Cameras wouldn't work at all in the dark :|
  • The amount of data generated is extraordinary (a 100,000 sq. ft scan can take up to 1TB of raw storage!), and writing it out fast enough poses hardware challenges
  • We cannot throw a 1TB file at our clients; we need heavy compression to store this data for the future
  • We still need a MESH, not a point cloud! The mesh needs a low polygon count, yet must still be good enough to capture HVAC, plumbing, and other objects found at a construction site (see the sketch after this list)
  • Point cloud to mesh conversion cannot introduce any approximation error
  • We need to segment the generated 3D mesh into walls, columns, beams, pipes, HVAC components, floor, etc.
  • We need to recognize concrete and other defects (in the images) and then show them on the 3D model
  • We also need to know the temperature of every point we capture, which means we need to integrate a thermal camera as well!
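On the mesh point: a common baseline (not necessarily what we ended up shipping) is Poisson surface reconstruction followed by decimation to meet a polygon budget. Here is a minimal Open3D sketch of that idea; the file names and the triangle budget are placeholders.

```python
import open3d as o3d

# Load a registered point cloud; the file name is a placeholder.
pcd = o3d.io.read_point_cloud("site.ply")
pcd.estimate_normals()  # Poisson reconstruction needs oriented normals

# Poisson surface reconstruction: higher depth = more detail, more triangles.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# Decimate to a low polygon count so the model stays lightweight.
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=200_000)
o3d.io.write_triangle_mesh("site_mesh.ply", mesh)
```

The catch is that each of these steps is an approximation; keeping the error bounded while staying lightweight is where the real work lies.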

Well, the list is long and doesn't end here.

In the articles to come, we will cover the problems described above, and the solutions we opted for, in detail. These solutions apply to companies working in interior design, building construction, video game production, filmmaking, architecture, restoration, engineering, scientific research, autonomous driving, and robotics.

Stay tuned!
