Observance - Preface
As mentioned in the last post, we will now share what we do, what Observance is, how we implemented Transformers and other AI-based models in our processing pipeline, and the basic building blocks needed to understand all of this. One can follow these posts to understand the state of the art in 3D reconstruction, LiDAR, and sensor fusion, and how Transformers can be put to use. We want these articles to be an informative starting point for anyone trying to solve similar problems. So instead of going deeper into why Observance exists, we will look at the problems we had to solve while building it and how we solved them.
What exactly are we solving?
We are trying to perform 3D reconstruction of a scene using techniques that are fast and highly accurate. These scenes can be very large (>1 million sq. ft), and we do not have a lot of time for data collection (less than an hour, or as fast as one can walk through the site). Once data is captured, we need to convert it into a lightweight 3D model with colored texture and compare it with the actual 3D or 2D model (after automatic alignment) to identify structural deviations or defects.
And of course, this process has to be extremely cost-effective: leading service providers will already scan a 1M sq. ft area for around $300k within a month, so we need to be significantly better on both cost and turnaround time.
What is 3D reconstruction?
3D reconstruction is the process of creating a three-dimensional model of an object or scene from a set of two-dimensional images or measurements. This can be done using a variety of techniques, including photogrammetry, Structure from Motion, and SLAM, as well as AI techniques like monocular depth estimation. While we will predominantly focus on techniques similar to SLAM using LiDAR, I remain heavily bullish on monocular depth estimation.
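To give a feel for how accessible monocular depth estimation has become, here is a minimal sketch using the publicly available MiDaS model via torch.hub. This is only an illustration, not part of our pipeline; the image file name is a placeholder, and the model/transform names follow the public MiDaS repo.

```python
import cv2
import torch

# Load a small pretrained MiDaS depth model and its matching preprocessing.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

# Placeholder input image (RGB).
img = cv2.cvtColor(cv2.imread("room.jpg"), cv2.COLOR_BGR2RGB)
input_batch = transforms.small_transform(img)

with torch.no_grad():
    prediction = midas(input_batch)
    # Resize the predicted depth map back to the original image resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth = prediction.cpu().numpy()  # relative (not metric) depth per pixel
```

Note that models like this predict relative depth from a single image; turning that into metrically accurate geometry over a 1M sq. ft site is a much harder problem, which is why LiDAR remains central for us.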
What is SLAM?
SLAM, or Simultaneous Localization and Mapping, is a technique used to create a map of an environment using a camera, LiDAR, inertial measurement units, or other sensors, while simultaneously tracking the location of these sensors within the map. At Inkers, we are using a sensor fusion framework combining LiDAR, Camera, and Inertial sensors to achieve robust and accurate 3D point estimation.
What is sensor fusion?
Sensor fusion is the process of combining multiple sources of information to obtain a more accurate, reliable, and comprehensive view of the world. This is often done in situations where sensors are measuring different aspects of the same phenomenon, such as in self-driving cars, where sensors like cameras, lidar, and radar are used to understand the environment around the vehicle. Let's break this down.
Let us say we are using LiDAR to capture the 3D point cloud. LiDAR captures instantaneous 3D data extremely well (but not the color texture). However, during the capture we do not have access to an accurate 3D location of the LiDAR itself (GPS is ruled out in our case, as we need to scan areas where a GPS fix may not be available). The images on the left show how the raw, overlaid scans might look, while we need the scans to align as shown in the images on the right below.
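To make the alignment problem concrete: if we knew the pose of the LiDAR at every scan, placing each scan into a common world frame would be a simple rigid-body transform. A minimal NumPy sketch follows; the pose matrix here is made up purely for illustration.

```python
import numpy as np

def transform_scan(points, pose):
    """Map an (N, 3) LiDAR scan into the world frame with a 4x4 SE(3) pose."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    return (pose @ homogeneous.T).T[:, :3]

# Hypothetical pose: the sensor moved 2 m forward and rotated 30 deg about Z.
theta = np.deg2rad(30)
pose = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 2.0],
    [np.sin(theta),  np.cos(theta), 0.0, 0.0],
    [0.0,            0.0,           1.0, 0.0],
    [0.0,            0.0,           0.0, 1.0],
])

scan = np.random.rand(1000, 3)          # stand-in for one LiDAR scan
scan_in_world = transform_scan(scan, pose)
```

The hard part is that this pose is exactly what we do not know: every scan needs its own pose estimate, and errors in those estimates are what make the overlaid scans on the left look smeared.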
There are techniques like Iterative Closest Point (ICP), but in practice they work well only for small regions. At Inkers, we capture LiDAR scans at 10 fps. Assuming a walkthrough of a building takes around 15 minutes, we need to align ~9,000 point clouds! And approaches like ICP depend heavily on a good initial estimate of the sensor's location.
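For reference, this is roughly what pairwise ICP looks like with the open-source Open3D library. It is a sketch, not our production pipeline; the file names and distance threshold are placeholders. Note how the result hinges on the initial transform `init_guess` — with thousands of scans, a bad guess here lets the whole chain of alignments drift apart.

```python
import numpy as np
import open3d as o3d

# Two consecutive LiDAR scans (placeholder file names).
source = o3d.io.read_point_cloud("scan_0001.pcd")
target = o3d.io.read_point_cloud("scan_0002.pcd")

# ICP needs an initial guess for the relative pose between the two scans.
init_guess = np.eye(4)

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.05,   # metres; placeholder threshold
    init=init_guess,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)

print("Fitness:", result.fitness)              # fraction of matched points
print("Estimated transform:\n", result.transformation)
```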
Instead of LiDAR, one can use multiple camera images and align them to create 3D models as well, just as Matterport does.
This works amazingly well when we want to explore a site visually or capture the texture (as shown in the image below):
But there is only so much detail and accuracy one can capture using just camera-based techniques:
Models would remain incomplete and off by a large percentage on measurements:
If we just need the current location, we can also use an accelerometer or IMU. An IMU works well for instantaneous motion, but its errors accumulate over time and result in large deviations:
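A toy example of why: dead-reckoning by double-integrating a noisy accelerometer. The noise level and sampling rate below are made-up numbers, purely to illustrate how the position error grows with time even when the sensor is standing still.

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 0.01                 # 100 Hz IMU (illustrative)
steps = 60_000            # 10 minutes of data
accel_noise_std = 0.05    # m/s^2, made-up noise level

# True motion: standing still, so the true acceleration is zero.
# The measurement is pure noise.
measured_accel = rng.normal(0.0, accel_noise_std, size=steps)

velocity = np.cumsum(measured_accel) * dt   # first integration
position = np.cumsum(velocity) * dt         # second integration

for minute in (1, 5, 10):
    idx = minute * 6000 - 1
    print(f"after {minute:2d} min: position error ~ {abs(position[idx]):.2f} m")
```

The drift grows super-linearly with time, which is why an IMU alone cannot track a 15-minute walkthrough, but it is excellent at filling the gaps between LiDAR scans.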
So, LiDAR provides accurate instantaneous 3D locations of points in space, the camera provides amazing texture, and IMUs provide accurate short-term motion.
So, can we combine the strengths of all three?
These are the problems that we wanted to solve. But as we started to develop our solution, we realized that these are not the only problems at hand.
Well, the list is long and doesn't end here.
In the articles to come, we will cover the problems described above and the solutions we opted for in detail. These solutions are applicable to companies working in interior design, building construction, video game production, filmmaking, architecture, restoration, engineering, scientific research, autonomous driving, and robotics.
Stay tuned!