Track Breakage and Stitching Full-Journey Lidar & Camera Data
Raw Unstitched Lidar Data Showing Track Breakage

Track breakage is the absolute bane of people-measurement data and flow analytics. It’s by far the most common, most serious, and most difficult-to-fix data-quality issue you’ll face. Breakage is an issue for both camera and lidar sensors (it’s about equally bad for both), and it impacts almost every full-journey metric.

The stress, though, in that last sentence should probably be on the words full-journey. One of the hardest things to come to grips with in people-measurement data quality is how differently data quality issues impact metrics and use-cases. If you’re tracking queue length or occupancy, fragments and ghosts can be significant problems. Track breakage? Not so much. Track breakage may have a tiny impact on queue length measurement, but it will almost never be significant. On the other hand, if you’re trying to track the time it takes for a passenger to go from the start of a queue to the other end of security, track breakage may absolutely cripple your ability to do that.

Let’s start with what track breakage is and how/why it occurs. Though lidar and camera use different techniques to identify people in a location, they are both frame-based techniques. Lidar builds a point cloud by shooting out beams of light and measuring their return. Typically, this happens 10 times a second; each frame is a snapshot of the beam returns for that 10th of a second. Camera is similar. It’s snapping frames and then analyzing each frame for person-like objects.

To provide journey tracking, the Perception software or ML on-board the camera tracks identified objects from frame to frame. It relies on the high frame rate to make that task easy. In open, unobstructed environments with a single person, this almost always works perfectly. But the more people you add into the environment and the more obstructions the environment has, the more likely it is to go wrong. Obstructions are easy to understand. Both camera and lidar are line-of-sight technologies. When they don’t have line-of-sight, they don’t work. That means if a person passes behind a tree, a bush, a pillar, a shelf or a taller person and is obstructed from view, the sensor system will lose tracking continuity. When the object reemerges, it will be given a new identifier because these technologies don’t track “people” – they don’t use biometrics or image. They just track moving objects and whenever continuity is lost, its identifier is ended.
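To make the mechanism concrete, here’s a toy sketch (hypothetical code, not the actual Perception software) of frame-to-frame tracking by nearest-neighbor association. Notice that nothing in it knows about “people”: when an object’s detections are missing for even one frame, its identifier ends and the reappearance gets a fresh one.

```python
import math

MAX_MATCH_DIST = 1.0  # assumed gate: metres an object can move in one 0.1 s frame

def track(frames):
    """frames: list of frames, each a list of (x, y) detections."""
    next_id = 0
    active = {}    # track_id -> last known position
    history = []   # per-frame {track_id: (x, y)} assignments
    for detections in frames:
        new_active = {}
        unclaimed = dict(active)
        for pt in detections:
            best = None
            if unclaimed:
                # closest still-unclaimed active track
                best = min(unclaimed, key=lambda tid: math.dist(unclaimed[tid], pt))
                if math.dist(unclaimed[best], pt) > MAX_MATCH_DIST:
                    best = None
            if best is None:            # no plausible continuation: new identifier
                best, next_id = next_id, next_id + 1
            else:
                del unclaimed[best]
            new_active[best] = pt
        active = new_active             # tracks absent this frame simply end
        history.append(new_active)
    return history

# One person walking right, occluded (no detection) in the third frame:
frames = [[(0.0, 0.0)], [(0.5, 0.0)], [], [(1.5, 0.0)]]
ids = [set(f) for f in track(frames)]  # the reappearance gets a new ID: breakage
```

Real trackers are far more sophisticated, but the failure mode is the same: lose line-of-sight for a moment and the identifier ends.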

That’s track breakage. Here’s an example of it from one of our stores (it happens to be camera sensors, but that’s entirely arbitrary). I’m using our journey playback tool, but instead of showing the data after we’ve cleaned it, I’ve loaded the raw data as it arrived from the sensor array:

What you’re looking at is a mall store right at open. There’s one Associate at the cash wrap. After a minute or two (sped up in the scene by 15x), a customer comes in. The Associate track is broken up into about four separate short tracks as it makes its way across the store, goes through an Associate door to the backroom, comes back out to the register, and remains there. The customer enters from the right (the main entrance), goes up to the top of the store, and then proceeds across the store to the cash wrap.

In the raw data, the customer track is broken up into 3 segments (813, 815, 816). The Associate’s track is broken up into at least six. And keep in mind this is when the store is empty!

Pretty bad, huh? It is true that the area around the cash wrap (which is where that first 20 seconds takes place) is poorly covered. But this was a short visit in near-optimal conditions; not only should we assume that raw customer tracks will usually be broken many more times than this, but that track breakage will also cripple some aspects of Associate identification. You can see a more extended version of track breakage in the header image for this post. There, I turned on persistent trails and spun through a few minutes of data when the store was moderately trafficked. You can see lots of tracks starting and ending in the middle of the store. Every one of those is a product of track breakage.

Here’s the cleaned data from stitching:

Now, we have two Associate tracks: one covering the entire time before the trip into the backroom, and one covering the entire time after. And we have one shopper track.

What about that Associate break? Could we fix it? Not really. Sure, it’s probably the same Associate in this case. But it might not be. There might be 2 or 3 or 15 Associates on duty and we have no way of knowing if the Associate that emerged from the door is the same as the one who went in.

That’s both the benefit and the drawback of PII-sensitive measurement. Since we aren’t capturing images or biometrics, we have no way of doing that rematch. Is it possible? Sure it is. It’s possible with biometric matching or with electronic tagging. But in the out-of-the-box measurement we do (and which is what we strongly prefer), there is no PII. So, we don’t know whether that’s the same Associate, but notice that we do know that both tracks are AN associate. We know that because the Associate used that door and was behind the cash wrap. Good journey tracking (with stitching) can make behavioral Associate identification reliable and practical. Without stitching, neither is true.

On the other hand, it may seem like the shopper journey (split into a mere three segments) is a miracle of quality. But that track splitting is devastating to almost every metric we collect. Instead of a store having 1,000 visits, it has 3,000. Instead of the average time being 7 minutes, it’s 2.3 minutes. Instead of the conversion rate being 40%, it’s 13%. And it isn’t just metrics that are corrupted. So too are behavioral associations about who visited what – since track breakage will strongly favor sections in close proximity to one another. Path, funnel, and segmentation analytics are almost useless without stitching.
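The arithmetic behind those corrupted numbers is easy to sketch. The figures below are the illustrative ones from the text, assuming each visit breaks into three fragments and only the fragment containing the purchase still counts as a conversion:

```python
# Illustrative figures: 1,000 real visits, ~7 minute average, 40% conversion.
# Assume breakage splits each visit into 3 fragments and only one fragment
# retains the purchase event.
real_visits, avg_minutes, conversion = 1000, 7.0, 0.40
fragments_per_visit = 3

broken_visits = real_visits * fragments_per_visit        # 3,000 apparent visits
broken_avg_minutes = avg_minutes / fragments_per_visit   # ~2.3 minutes
broken_conversion = conversion / fragments_per_visit     # ~13%
```

Every denominator inflates and every per-visit measure deflates by roughly the fragmentation factor.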

You want to see how dramatic the impact is? Here are the statistics from our stitching log for a typical day for the location where this video came from:

Initial Tracks: 3,760

Statics Rejected: 113

Total Stitches: 1,911

Fragments Rejected: 625

Other Rejections: 627

Customers Written: 388

Associates Written: 90

Stitching reduced 3,760 tracks down to about 400 shopper tracks and 90 associate tracks. Those 90 associate tracks probably represent only 3-5 associates, but a new track gets written every time one of them uses the backroom.
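As a rough sanity check on that log – this is my reading of it, not official documentation of the format – assume each stitch merges two tracks into one (removing one record) and that rejected tracks are simply dropped:

```python
initial = 3760
statics, stitches, fragments, other = 113, 1911, 625, 627

# Each stitch joins two tracks into one, so 1,911 stitches remove 1,911 records.
remaining = initial - statics - stitches - fragments - other
written = 388 + 90   # customers + associates actually written

# remaining (484) lands close to written (478); any small gap would come from
# edge cases the summary log doesn't break out.
```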

It’s an order-of-magnitude improvement in accuracy. And, frankly, that’s about what we expect going from non-stitched to fully cleaned data. This isn’t unusual, and it’s why people who use the data directly from lidar and camera systems and try to do full-journey analysis are often bitterly disappointed. Load this raw data into Tableau and you’ll have a whole lot of garbage for analysis.

This also underscores how important it is to keep metrics on the stitching you're doing. If something goes wrong in the system, those metrics will almost certainly catch it, and they provide a powerful measure of ongoing data quality.

If you’re focused on journey, you simply cannot afford to ignore or do a bad job on track breakage. I try to tell people that data quality is about achieving usability. Data may or may not be good enough to use – it’s certainly never perfect – and usability depends on use-case. Without stitching, people-measurement data is good enough to use for door-counting or queue management, but not journey metrics. With stitching, it can be useful for a wide variety of journey use-cases as well.

The how-to of stitching is so complex that I’m not going to even try and explicate how to do it at any level of detail. Instead, I’ll provide a high-level overview of our methodology and why our system works the way it does. Your results may vary; in fact, the best approach to stitching is somewhat dependent on the exact nature of your location and tracking. We had to build a system that works well for lots of different kinds of spaces and sensors. If you’re just focused on one location, you may be able to do quite a bit less.

At base, stitching works by looking for records that don’t start or stop at the edge of the coverage area. That means it’s entirely dependent on the digital maps we create. We need to know where coverage ends – usually where doors are, but it may also be just a designated cut-off zone for the sensor. The exact location of the coverage zone can be a bit tricky because of the start/stop asymmetry problem I mentioned earlier, but to simplify, think of a store with a single vestibule entrance. Any record that doesn’t start in that entrance needs to be stitched. And what can it be stitched to? Any record that previously ended somewhere other than that entrance. Typically, though, we restrict the range of potential stitch candidates to records that seem possible. If a record disappeared an hour ago, it isn’t a stitch candidate for a record that just appeared (except under some special circumstances). We also restrict the candidates based on what we know about the objects. We won’t stitch people to cars or Associates to Shoppers.
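A sketch of that candidate filter might look like the following. All names are hypothetical, and the 120-second cutoff is an assumed placeholder (the text only says "an hour ago" is too old):

```python
from dataclasses import dataclass

MAX_GAP_SECONDS = 120.0  # assumed cutoff; a track that vanished an hour ago is out

@dataclass
class Track:
    track_id: int
    obj_class: str          # e.g. "shopper", "associate", "car"
    start_time: float       # seconds since some epoch
    end_time: float
    started_at_exit: bool   # began in a coverage-edge zone (door / cut-off)?
    ended_at_exit: bool

def stitch_candidates(orphan: Track, ended: list[Track]) -> list[Track]:
    """Earlier tracks that could plausibly continue as `orphan`."""
    return [
        t for t in ended
        if not t.ended_at_exit                     # didn't actually leave coverage
        and t.obj_class == orphan.obj_class        # no shoppers-to-cars stitches
        and 0 <= orphan.start_time - t.end_time <= MAX_GAP_SECONDS
    ]
```

A track that ended cleanly at the entrance, belongs to a different object class, or disappeared too long ago never enters the scoring step at all.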

So, our stitching process starts by identifying a set of records that could possibly be stitch candidates. Then it uses an ML-based decision algorithm to pick the candidate with the highest score. What factors go into the decision? Time, distance (by unobstructed path), direction, velocity, and object dimensions. If no candidate is identified, the record is just left as a partial. We may still write it, or, if it doesn’t meet broader rules for writing records, it may just be eliminated.
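As an illustration only – the real system uses a learned model, so these hand-picked weights and thresholds are purely hypothetical – the scoring-and-selection step has roughly this shape:

```python
import math

def score(gap_s, dist_m, heading_change_rad, speed_ratio, size_ratio):
    """Combine the five factors into a single 0-1 plausibility (toy weights)."""
    time_term  = math.exp(-gap_s / 30.0)                 # shorter gaps score higher
    dist_term  = math.exp(-dist_m / 5.0)                 # shorter unobstructed paths score higher
    dir_term   = (1 + math.cos(heading_change_rad)) / 2  # consistent heading
    speed_term = min(speed_ratio, 1 / speed_ratio)       # similar velocity (ratio > 0)
    size_term  = min(size_ratio, 1 / size_ratio)         # similar object dimensions
    return time_term * dist_term * dir_term * speed_term * size_term

def pick_best(candidates, min_score=0.05):
    """candidates: (track_id, factor-tuple) pairs; None means leave as a partial."""
    scored = [(score(*factors), tid) for tid, factors in candidates]
    best_score, best_id = max(scored, default=(0.0, None))
    return best_id if best_score >= min_score else None
```

If nothing clears the threshold, the orphaned record stays a partial, which is exactly the fallback described above.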

Why not just use ML to select the best candidate from any previous record? Mostly, we find that restricting the initial set to plausible candidates is faster and doesn’t impact stitching accuracy.

People who have never worked with people-measurement data for journey metrics don’t understand this problem or how important it is. I regularly see RFPs from organizations asking for something like “accuracy” of a system when they are asking for full-journey metrics. If they expect a number like 98%, not only are they barking up the wrong tree (that’s a good number for door-counting), but they are misunderstanding how accuracy in journey tracking works and how variable it can be.

The more crowded your environment, the more occlusion you’ll have and the more stitching you’ll need to do. The more stitching you need to do and the more crowded the environment, the more stitches you’ll get wrong. What’s more, there is virtually no way to tell if you’ve gotten a stitch wrong.

I recently saw an RFP from a major U.S. Airport asking for rematch capabilities in addition to full lidar coverage of a terminal. They wanted to put biometric cameras outside lounges and restrooms so they could track people across their entire journey even when – like the Associate example earlier – they went into an untracked area. I’m not a fan of this kind of tracking. I don’t think the information you gather is worth the cost of measurement and the loss of privacy.

That's debatable, of course, and it's ultimately a client's decision not ours, but what made the RFP seem fundamentally misguided is that it didn’t account for track-breakage inside a giant lidar coverage area of an extremely crowded environment. As if rematch at a few special points is going to solve your problems when you’ve likely lost track of that person 15 times within your lidar coverage area and don’t even know you’ve lost track! Not one question or requirement on stitching and track breakage – and a whole bunch of requirements for rematch. That’s a sign that whoever wrote the RFP doesn’t understand flow-analytics.

Capabilities like rematch do exist and they can work well. But they are expensive, complex, and invasive. Nor is an airport like a retail store where you might only have one or two spots for potential rematch. There are countless spots to lose track of people in airports. If you’re going to use rematch because you really, really want end-to-end journey measurement from parking to gate, you’d probably be better off dumping the lidar system and just trying to do point-to-point rematch in a lot of places. On the other hand, that wouldn’t get you all the things (like queue management, dynamic maintenance allocation, occupancy management, perimeter monitoring, etc.) that you actually get with lidar measurement without rematch.

The bottom line on this kind of flow analytics (full-journey without rematch) is that it can indeed be done and done pretty well, but you need to have a realistic appreciation for the problems in the data and how they apply to your use-cases and your analysis. And you must (absolutely must) do a good job of stitching if you are going to use full-journey reporting and analytics.
