When sampling is OK, and when sampling is NOT OK.
Advertisers are trying to save costs. They know budgets are getting tighter going forward and they have to do more with the money they have. Sampling is one way to save costs -- i.e. measuring only some of the impressions, not all of them. But the question is "when is sampling OK, and when is it NOT OK?"
When sampling is OK
As a scientist (yes, I have a PhD in Materials Science and Engineering from MIT), I always recommend having complete data. Why? Because when you sample, you run the risk of missing some important details in the stuff you didn't measure. This is even more important in fraud detection, because the fraud could be in the 4 in 5 impressions you DIDN'T measure (1 in 5 sample rate), or the other 9 in 10 impressions you DIDN'T measure (1 in 10 sample rate).
So how do we do this -- sampling -- practically? In FouAnalytics, we recommend always starting off "full bore" which means measuring everything (10 in 10 impressions) and not sampling at the beginning. Then if we see that the data is highly reproducible, as in the following example -- 170 billion pageviews, rock solid repeatable day after day -- then we can start sampling. Because the data is highly reproducible, the risk is lower if we start to sample.
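One way to make "highly reproducible" concrete is to look at how much the daily totals vary relative to their average (the coefficient of variation). The following is a minimal sketch in Python, assuming you have daily pageview or impression totals in a list. The function names, the sample numbers, and the 5% threshold are all illustrative assumptions, not FouAnalytics settings.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_counts):
    """Return standard deviation divided by mean; lower means more repeatable."""
    avg = mean(daily_counts)
    if avg == 0:
        raise ValueError("mean of daily counts is zero")
    return pstdev(daily_counts) / avg


def is_reproducible(daily_counts, max_cv=0.05):
    """Return True when day-to-day variation is at or below max_cv.

    max_cv is a hypothetical threshold chosen for illustration only.
    """
    return coefficient_of_variation(daily_counts) <= max_cv


# Made-up daily totals: one steady series, one with a short-lived spike.
steady = [1000, 1010, 990, 1005, 995]
spiky = [1000, 1010, 5000, 1005, 995]

print(is_reproducible(steady))  # steady data: sampling carries lower risk
print(is_reproducible(spiky))   # spiky data: keep measuring everything
```

A check like this only says whether the totals are stable. It says nothing about what is inside the traffic, which is why starting with full measurement still matters.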
If the data is highly variable, it's not a good idea to sample, because we could easily miss important things, as in the following example. The yellow means search crawlers, the orange means declared bots, and the red means bad bots. In the second chart below, note how large the green spike is, and how short-lived it is. If we were sampling, we could have missed seeing that large bot attack entirely.
When sampling is NOT OK
Many advertisers don't know that their current legacy fraud verification vendor is sampling. Why don't advertisers/buyers know this? It's because they are being charged "full bore" even if the measurement is being sampled at 1 in 100, or worse. You don't have to believe me. Ask your current verification vendor for a report that shows you the number of impressions you were charged for AND the number of impressions they actually measured. So, so simple, right? Right, but the advertisers paying for these services never thought to ask this question.
In the case of fraud verification, if they are not measuring 99 out of 100 impressions, don't you think it's super easy for the bots and fraud to hide in the 99 out of 100 and get away with it? In other words, what was not measured obviously couldn't be marked as IVT (invalid traffic). But not getting marked as IVT doesn't mean it wasn't IVT. Think about this for a second. Bots and fraud in the unmeasured traffic were still bots and fraud, even though they never got marked as such, because the vendor didn't even measure them. So the 1% that the legacy vendors have been reporting for the last 8 years is not all the fraud there is; it's all the fraud that they could detect. These are the numbers that TAG and ANA are citing in their press releases, which ultimately mislead advertisers into thinking the problem of fraud is low, when fraud is at its highest levels ever.
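The arithmetic behind the "charged versus measured" question can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name `coverage_report`, its field names, and the numbers (1,000,000 impressions charged, 10,000 measured, 100 flagged) are made up and do not come from any vendor's report.

```python
def coverage_report(charged, measured, flagged_ivt):
    """Compare what was billed with what was actually measured.

    All three inputs are impression counts taken from a vendor report.
    """
    if charged <= 0 or not (0 <= flagged_ivt <= measured <= charged):
        raise ValueError("expected 0 <= flagged_ivt <= measured <= charged, charged > 0")
    return {
        # Share of billed impressions that were measured at all.
        "coverage": measured / charged,
        # Impressions whose status is simply unknown.
        "unmeasured": charged - measured,
        # IVT rate among measured impressions (what gets reported).
        "ivt_rate_of_measured": flagged_ivt / measured if measured else None,
        # Flagged impressions as a share of everything billed.
        "flagged_share_of_charged": flagged_ivt / charged,
    }


# Hypothetical numbers: 1 in 100 sampling with a 1% reported IVT rate.
report = coverage_report(charged=1_000_000, measured=10_000, flagged_ivt=100)
for name, value in report.items():
    print(name, value)
```

In this made-up case the reported IVT rate is 1% of measured impressions, but only 100 impressions out of 1,000,000 billed were ever flagged, and 990,000 were never examined. The report cannot say anything about those 990,000.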
Finally, advertisers, agencies, and publishers are realizing that these legacy verification vendors have been severely underreporting the fraud and brand safety issues. It's not just me saying it any more. Isn't it time you asked your verification vendor to show you what they actually measured versus what they charged you for? Sampling is NOT OK when it causes you to miss most of the fraud and brand safety issues.
More case examples and screenshots from FouAnalytics: https://www.dhirubhai.net/in/augustinefou/recent-activity/newsletter/