Improving Object Classification with Lidar
Poor data quality is the bane of all analytics. It can cripple AI, BI and data science efforts. And, of course, it can also be a tremendous time suck. Every analyst knows the 80-20 rule: you spend 80% of your time cleaning data and 20% of your time explaining why you haven’t done any real analytics.
People-measurement and flow analytics data isn’t any different. And while lidar (like tagging in digital analytics) has significantly improved overall data quality, it’s far from perfect. In my last post, I described a set of techniques for improving lidar object identification. Those techniques use what you know about how an object behaves, where it is located and how large it is to improve identification – particularly in getting rid of ghosts, fragments and reflections. In some environments, object identification is the only thing you have to do on lidar data. For a lot of physical locations, there’s only one type of moving object in the space – and that’s people. So, if you’re working in that kind of environment, congratulations, you can skip the rest of this post and go look at cat videos. But if your environment is more complicated and includes cars, bikes, carts, dogs, industrial robots, pallets, etc., then read on!
In an environment with lots of different kinds of moving objects (or even two), the lidar Perception software has to do more than identify a moving object; it has to decide what it is. Every Perception software package has a built-in set of object classifications, and your downstream data will get those classifications. The classifications themselves are built in a variety of ways. Most Perception software vendors will tell you that they use “advanced machine learning” to classify objects, but I’m here to tell you that what they mostly seem to use is a few if-then rules based on the object box-size dimensions.
Given that lidar is a 3D technology, you’d think that for the most part, size would be all you really need. After all, lidar has a huge advantage over cameras. Depth perception based on vision is difficult and processor-intensive. People manage to do a pretty good job of it with two eyes set a bit apart and a lot of fancy neural processing. Estimating depth without stereoscopy is extremely difficult, and even with it, it’s easy to be fooled. But lidar, being a light-ranging technology, can build a full 3D representation of an object if multiple sensors have it in view. And even if an object is only seen by a single sensor, lidar will at least get an accurate measure of the dimensions it sees. If you know that a moving object is 14 feet long and 6 feet wide, you don’t really need to know much else to know that it’s a car. And if it is 5’7” tall and 14” wide, you pretty much know it’s a person. So, it’s not unreasonable to think that object identification in lidar should be rock solid and significantly better than with cameras.
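To make the “few if-then rules” idea concrete, here is a minimal Python sketch of a box-size classifier. The Box type, the function name and every threshold are hypothetical, chosen only to illustrate the kind of rule Perception software appears to apply; they are not taken from any vendor’s package. The car’s 5 ft height and the person’s 1 ft depth are also assumptions added to the examples from the text.

```python
from dataclasses import dataclass

FT = 0.3048  # metres per foot


@dataclass
class Box:
    """Bounding box of one detected object, in metres."""
    length_m: float
    width_m: float
    height_m: float


def classify_by_size(box: Box) -> str:
    """Toy if-then rules on box dimensions (all thresholds are hypothetical)."""
    footprint = max(box.length_m, box.width_m)  # longest horizontal side
    if footprint >= 3.5 and box.height_m >= 1.0:
        return "car"
    if 1.0 <= footprint < 3.5 and box.height_m >= 0.8:
        return "bicycle"
    if footprint < 1.0 and 1.0 <= box.height_m <= 2.2:
        return "person"
    return "unknown"


# The two examples from the text (car height and person depth are assumed).
car = Box(length_m=14 * FT, width_m=6 * FT, height_m=5 * FT)
person = Box(length_m=1 * FT, width_m=14 / 12 * FT, height_m=(5 + 7 / 12) * FT)
print(classify_by_size(car), classify_by_size(person))  # car person
```

Rules like these work well when the box is accurate. The trouble, as the rest of this section shows, is that the box often isn’t.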
But that’s not always true.
Lidar object identification works by finding clustered groups of points in the point-cloud. That clustered group of points has a complex shape, which is determined by the object, but also by the beams of the lidar and the distance of the object from the sensor (because beams spread out with distance).
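A rough back-of-the-envelope sketch shows why distance matters so much. It assumes a sensor with evenly spaced beams across its vertical field of view (many real sensors space them unevenly) and an object centred in that field. The 32-beam, 40-degree sensor is a hypothetical example, not a specific product, and the count is approximate because it ignores exactly where the beams fall on the object.

```python
import math


def beams_on_target(height_m: float, distance_m: float,
                    n_beams: int, vfov_deg: float) -> int:
    """Approximate number of lidar beams that hit an object of a given height.

    Assumes evenly spaced beams across the vertical field of view and an
    object centred in that field; real sensors often space beams unevenly.
    """
    spacing = math.radians(vfov_deg) / (n_beams - 1)       # angle between beams
    subtended = 2 * math.atan(height_m / (2 * distance_m))  # angle the object fills
    return int(subtended // spacing)


# Hypothetical 32-beam sensor with a 40-degree vertical field of view.
for d in (5, 20, 50):
    print(d, "m:", beams_on_target(1.7, d, n_beams=32, vfov_deg=40.0))
```

With these assumed numbers, a 1.7 m person catches about 14 beams at 5 m, about 3 at 20 m and only 1 at 50 m, which is why far-away clusters carry so little shape information.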
Here’s a fairly low-beam-density lidar image:
I’m pretty sure that’s a person there in the middle (unless this was from Sasquatch Sunset), but see those point clusters in green in a line along the lower right of the image? See how similar those are to people? That’s a problem for object identification.
Now take a look at this image:
It’s easy to see the people in the foreground, but look further back into the upper right of the image. See those objects back there in green? You’ll know that some of them on the sidewalk are people and the one on the road is a car. How did you know that? Mostly by where they are and maybe a little bit by shape. But the further an object is from the lidar, the fewer beams will land on it and the more likely it is to be occluded. How large is the point cloud for those two people walking toward the back of the scene? The lidar only has beams on their heads and shoulders, so it will see them as perhaps 1.5’ by 1’ in size. It knows that they are higher than the roadway but doesn’t actually know what’s behind that fence/wall or what the “floor” is underneath those objects. Similarly, only a few beams land on the car at the far upper right of the scene, and it may well be occluded from getting more. That means the car may look like it’s 5’ high and perhaps 2’ wide.
And this only begins to capture the complexity. Our human eyes are great at understanding that the two people crossing in the crosswalk are TWO people. But to a point cloud, those two people can look like one. We see object munging (several objects merged into a single cluster) routinely in crowded environments, especially when people are traveling together. That munging is an object identification problem that arises from the nature of the clustering algorithms typically used by Perception software, but it can contribute to object classification problems. By the time you munge two or three people together, their box-sizes can begin to resemble a bicycle or even a car. This is where better ML at the Perception layer should do the work. But if it isn’t doing the work, you’re stuck with the data and the job of cleaning it.
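Here is a small sketch of why munging turns people into bicycles. It merges the boxes of people walking side by side into one axis-aligned box, which is effectively what you get when clustering can’t separate them, and runs the result through a toy footprint rule. The AABB type, the merge function, the spacing of the walkers and the size thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AABB:
    """Axis-aligned object box: ground-plane extent plus height, in metres."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    height: float


def merge(boxes: List[AABB]) -> AABB:
    """The single box you get when clustering fails to separate the objects."""
    return AABB(
        x_min=min(b.x_min for b in boxes),
        y_min=min(b.y_min for b in boxes),
        x_max=max(b.x_max for b in boxes),
        y_max=max(b.y_max for b in boxes),
        height=max(b.height for b in boxes),
    )


def size_class(b: AABB) -> str:
    """Toy footprint rule; thresholds are hypothetical."""
    footprint = max(b.x_max - b.x_min, b.y_max - b.y_min)
    if footprint < 1.0:
        return "person"
    if footprint < 3.5:
        return "bicycle"
    return "car"


# Three people, each 0.5 m x 0.4 m and 1.7 m tall, walking 0.3 m apart.
people = [AABB(i * 0.8, 0.0, i * 0.8 + 0.5, 0.4, 1.7) for i in range(3)]
for n in (1, 2, 3):
    print(n, "people ->", size_class(merge(people[:n])))
```

One person stays a person, but two or three munged together produce a 1.3 m or 2.1 m footprint that a size rule reads as a bicycle.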
So how to do it?
The Problem of Dimension
The key takeaway you should have from the discussion above is that object dimensions are going to vary constantly depending on where the object is in the scene. They will vary depending on how many sensors have visibility, how many beams those sensors have, and where in the scene the object is located.
Here is a real, completely unedited object stream (from a high-beam-density lidar) from one of our locations:
When you see how variable the object dimensions are, and how the classification changes, you’ll have a pretty good understanding of how this works. So, which one is right? Well, there’s no definitive answer to that question. In general, the first observations of an object happen on the edge of the field and are the least reliable. In addition, we know that lidar is far more inclined to miss points of the object in the cluster than to over-include points that aren’t part of the object (though it does both routinely). But keep in mind that a vehicle may traverse the entire field of view in a few seconds. Here, 20 seconds have elapsed, and it’s quite likely that the LAST measurement is, like the first, at the edge of the field of view. Yet angles matter too. If the vehicle was head-on to the sensor and then turned at the end, the best measurements are probably those last three seconds when the length suddenly shoots up. It’s even possible that all of the classifications are correct. Check out this data stream:
This happens to be a car pulling into a gas pump. The initial unknown is the vehicle pulling in and parking. The brief spell of a person was when the car was unmoving, but the Perception software was now detecting a person next to the car and treating it as the same object. Finally, the person gets into the car and as the car pulls away, its dimensions become car-like again!
Ideally, this record will be split into three segments; the stitching engine will then join the first and the third, and the middle one will be a person. Getting that right is non-trivial and always a bit of a guessing game.
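The splitting half of that job can be sketched as run-length segmentation of the per-frame classification. This is a minimal sketch, not our stitching engine: the function name and the min_run flicker threshold are hypothetical, and it only produces the segments. Deciding which segments to re-join is left to the stitcher.

```python
from itertools import groupby
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (class, first_frame, last_frame), inclusive


def split_by_class(classes: List[str], min_run: int = 3) -> List[Segment]:
    """Split one track into runs of consecutive identical classification.

    Runs shorter than min_run are treated as flicker and folded into the
    preceding segment; a run that matches the preceding segment's class
    extends it.
    """
    segments: List[Segment] = []
    idx = 0
    for cls, run in groupby(classes):
        n = len(list(run))
        start, end = idx, idx + n - 1
        idx += n
        if segments and (n < min_run or segments[-1][0] == cls):
            prev_cls, prev_start, _ = segments[-1]
            segments[-1] = (prev_cls, prev_start, end)
        else:
            segments.append((cls, start, end))
    return segments


# Gas-pump example: pull in (unknown), person beside the parked car, car leaves.
frames = ["unknown"] * 6 + ["person"] * 4 + ["car"] * 5
print(split_by_class(frames))  # [('unknown', 0, 5), ('person', 6, 9), ('car', 10, 14)]
```

The min_run guard matters because a single misclassified frame in the middle of a clean car track shouldn’t spawn a new record.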
In general, our rules of thumb for handling object dimensions are these. First, ignore initial or final dimensions when they are taken at the limits of the lidar field of view. Second, take advantage of what you know about the objects and their angle to the sensors based on their position in the scene. And, finally, skew toward the dimensions that are most common, largest, and taken when you know that the object is in view of multiple sensors.
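The first and third of those rules of thumb can be sketched in a few lines; the second depends on site geometry and is left out. The Obs fields at_fov_edge and n_sensors are hypothetical stand-ins for whatever your platform records, and binning to 0.25 m before taking the most common value (with ties going to the larger bin) is one assumed way to “skew toward the most common and largest” dimensions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Obs:
    """One per-frame dimension measurement (metres) for a single object."""
    length: float
    width: float
    height: float
    at_fov_edge: bool  # hypothetical flag: measured at the limit of coverage
    n_sensors: int     # hypothetical: sensors contributing points this frame


def _mode_prefer_largest(values: List[float], bin_m: float) -> float:
    """Most common binned value; ties go to the larger bin."""
    bins = Counter(round(v / bin_m) for v in values)
    best = max(bins, key=lambda b: (bins[b], b))
    return best * bin_m


def representative_dims(obs: List[Obs], bin_m: float = 0.25) -> Tuple[float, float, float]:
    # First rule: drop edge-of-field measurements (unless that is all we have).
    pool = [o for o in obs if not o.at_fov_edge] or obs
    # Third rule: prefer frames seen by more than one sensor, if there are any.
    pool = [o for o in pool if o.n_sensors >= 2] or pool
    # Third rule (cont.): skew toward the most common, then largest, value.
    return (
        _mode_prefer_largest([o.length for o in pool], bin_m),
        _mode_prefer_largest([o.width for o in pool], bin_m),
        _mode_prefer_largest([o.height for o in pool], bin_m),
    )


track = [
    Obs(1.0, 0.8, 1.4, at_fov_edge=True, n_sensors=1),   # first sighting, at the edge
    Obs(4.5, 1.8, 1.5, at_fov_edge=False, n_sensors=2),
    Obs(4.5, 1.8, 1.5, at_fov_edge=False, n_sensors=2),
    Obs(4.4, 1.9, 1.5, at_fov_edge=False, n_sensors=2),
    Obs(2.0, 1.0, 1.4, at_fov_edge=False, n_sensors=1),  # single-sensor partial view
]
print(representative_dims(track))  # (4.5, 1.75, 1.5)
```

Binning means the result is only as precise as the bin width. If you’d rather lean harder toward the largest observations, a high percentile of each dimension is a reasonable alternative.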
Using Area and Behavior
Handling multiple dimensions for objects is the first step in improving lidar object classification, but it isn’t the last. In my last post, I described how we use object behavior and location to get rid of objects that we don’t want or to add classifications (like Associate) to the base object classification. The same techniques can be used to improve object classification. Keep in mind that the Perception software is generally set up to classify objects on a frame-by-frame basis. That’s extremely limiting, especially given the problems around dimensional sizes.
When you can track an object over time, you can get some big clues about what it is.
One of the most powerful object classification metrics is velocity. Here’s a snapshot from the Journey Playback tool in our platform with velocity display enabled:
When something is moving at 67 kilometers per hour, it’s probably not a person. You’d expect that the Perception software would know that, but since it’s classifying on a frame-by-frame basis it may not. In mixed environments, we use velocity in two basic ways. First, we’ll use it based on most likely rules to classify unknown objects that never display a definitive destination. Second, we’ll use it to override object classifications when the velocity is unambiguous as a categorizer.
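A minimal sketch of those two uses, assuming you already have the track’s maximum reliable speed in km/h. The thresholds, constant names and class labels are hypothetical and would need tuning per site; the point is the structure, with an unambiguous override and a softer nudge that applies only to unknowns.

```python
# Hypothetical thresholds (km/h); tune them for your site.
VEHICLE_OVERRIDE_KMH = 50.0  # no pedestrian or cyclist we expect gets near this
UNKNOWN_CAR_KMH = 25.0
UNKNOWN_BIKE_KMH = 10.0


def classify_by_velocity(current: str, max_kmh: float) -> str:
    """Apply velocity cues to a track-level classification."""
    # Override: the speed alone is an unambiguous categorizer.
    if max_kmh >= VEHICLE_OVERRIDE_KMH:
        return "car"
    # Nudge: only unknowns are assigned their most likely class.
    if current == "unknown":
        if max_kmh >= UNKNOWN_CAR_KMH:
            return "car"
        if max_kmh >= UNKNOWN_BIKE_KMH:
            return "bicycle"
        return "person"
    return current


print(classify_by_velocity("person", 67.0))   # car
print(classify_by_velocity("unknown", 18.0))  # bicycle
```

Keeping the override threshold well above the nudge thresholds means a known classification is only changed when the speed really leaves no doubt.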
Because we have the velocity over time, we have more than a snapshot. If we’re confident that the object isn’t munged, we can use the highest velocity recorded as a classifier. One thing to beware of here is frame gaps. When we record velocity, we want to make sure that we’re doing it on the basis of filled frames. In other words, we’ll typically calculate velocity on a per-second basis. A second will usually contain at least one frame, and often several. But there are times when the Perception software does its own stitching (we hate this). It loses track of an object and then “re-acquires” it a few seconds later. That means there is a gap of more than one second between frames. If you’re not careful, that re-acquisition can yield a big velocity, because it really isn’t the same object. We will typically look at frame gaps and, if we think the velocity is out of pattern, split the record. If the pieces should be stitched, the stitching engine will handle it. But with an out-of-pattern velocity, a good stitch is extremely rare.
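Here is one way to sketch that frame-gap handling. It assumes a track is a time-ordered list of (timestamp, x, y) frames in seconds and metres, and it uses a fixed plausibility ceiling (max_plausible_kmh) as a hypothetical stand-in for “out of pattern”; a per-track baseline would be a reasonable alternative. The per-second speeds are computed only between consecutive filled seconds, so a re-acquisition gap never produces a velocity. Function names and defaults are illustrative.

```python
import math
from typing import Dict, List, Tuple

Frame = Tuple[float, float, float]  # (timestamp_s, x_m, y_m), time-ordered


def split_on_suspect_gaps(frames: List[Frame], max_gap_s: float = 1.0,
                          max_plausible_kmh: float = 60.0) -> List[List[Frame]]:
    """Split a track where a long frame gap coincides with an implausible jump."""
    if not frames:
        return []
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        dt = cur[0] - prev[0]
        if dt > max_gap_s:
            kmh = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt * 3.6
            if kmh > max_plausible_kmh:
                segments.append([cur])  # likely a different object: new record
                continue
        segments[-1].append(cur)
    return segments


def per_second_speeds_kmh(frames: List[Frame]) -> List[float]:
    """Speeds between the last frames of consecutive, filled seconds only."""
    last: Dict[int, Frame] = {}
    for frame in frames:
        last[int(frame[0])] = frame  # keeps the latest frame in each second
    speeds = []
    secs = sorted(last)
    for a, b in zip(secs, secs[1:]):
        if b - a != 1:
            continue  # an empty second in between: not a filled-frame measurement
        (t0, x0, y0), (t1, x1, y1) = last[a], last[b]
        speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0) * 3.6)
    return speeds


walk = [(i * 0.1, i * 0.14, 0.0) for i in range(20)]                 # ~5 km/h for 2 s
reacquired = [(5.0 + i * 0.1, 80.0 + i * 0.14, 0.0) for i in range(10)]
track = walk + reacquired
print([len(p) for p in split_on_suspect_gaps(track)])  # [20, 10]
print(per_second_speeds_kmh(track))  # one value of about 5 km/h; the gap is skipped
```

Splitting only when a long gap and an implausible speed coincide keeps ordinary dropouts intact, which leaves genuine re-joins to the stitching engine.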
Velocity isn’t the only cue you have about object identification. Often, where an object goes is a huge tell. If we’re measuring a gas station, seeing a car go inside the building is a clue. I’m not saying that never happens (everything happens, even without the multiverse), but if you’re doing analytics, not public safety, it’s a 99.9999999% bet that something is wrong in the data when it does. Often, what’s happened is that the lidar was following a car which parked, but then picked up on the person emerging from it and tracked them inside without changing the categorization. In any mixed-use environment, you’ll see this happen a lot. It also happens in reverse order. You’ll see a passenger at an airport emerge, go to a car at curbside, and then the Perception software will track the merged object under the person’s id when it pulls away.
We use area cues very much like we use velocity cues. In some cases, the area is used only to nudge unknown classifications into the most likely direction. But if the lidar classifies a person on the freeway or a car in the store, we’ll do a little more than nudge that classification.
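A minimal sketch of that two-level approach: zones are axis-aligned rectangles in site coordinates, each with a likely class used to nudge unknowns and a map of impossible classes that get forced. The Zone type, the zone layout, the forced replacement classes and the min_frames guard against brief boundary crossings are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Zone:
    """Hypothetical rectangular area in site coordinates (metres)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    likely: str  # class to nudge "unknown" tracks toward
    impossible: Dict[str, str] = field(default_factory=dict)  # class -> forced class

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def apply_area_cues(cls: str, track_xy: List[Tuple[float, float]],
                    zones: List[Zone], min_frames: int = 5) -> str:
    visits = Counter(z.name for x, y in track_xy for z in zones if z.contains(x, y))
    by_name = {z.name: z for z in zones}
    # Force: the track spent real time somewhere its class cannot be.
    for name, n in visits.most_common():
        if n >= min_frames and cls in by_name[name].impossible:
            return by_name[name].impossible[cls]
    # Nudge: an unknown takes the likely class of the zone it occupies most.
    if cls == "unknown" and visits:
        return by_name[visits.most_common(1)[0][0]].likely
    return cls


zones = [
    Zone("store", 0, 0, 20, 15, likely="person", impossible={"car": "person"}),
    Zone("pumps", 0, 20, 30, 40, likely="car"),
]
car_into_store = [(5.0, 25.0)] * 5 + [(5.0, 10.0)] * 10
print(apply_area_cues("car", car_into_store, zones))          # person
print(apply_area_cues("unknown", [(10.0, 30.0)] * 8, zones))  # car
```

For brevity, the sketch relabels the whole track. In practice, a car that “walks” into the store is usually better split at the zone boundary, as in the gas-pump example, so that the car portion keeps its classification.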
Having said that, I’ll just reiterate that all of this nudging and forcing is optional, and we only do it when we think the circumstances warrant. If you set up a lidar system to detect things on the highway that shouldn’t be there, you can’t nudge object detections to be cars! You can still use the techniques I’ve outlined to reduce false positives, but you’re likely to use them in a slightly different way.
Ultimately, the best object detection probably lives at the level of Perception software. That’s really the only place where the true shape of the point cloud exists. And as lidar beam density gradually increases, those point clouds are becoming increasingly definitive. We’re seeing new sensors emerge on the market with 200-300 beams! For now, though, if you’re using lidar data, you almost certainly need to pay some attention to the object classifications to make them stable and robust. This is a lot simpler than improving object identification, but no matter how much cleaning you do, there are always going to be ambiguous cases in mixed-use environments.
Still, with a little work, you can get to a level of accuracy that supports deep analytics based on object classification and keeps you out of the 80-20 rule!