Big data, bad data; right analytics, right actions
Bigger is better, right? But bigger data is not necessarily better. In most cases, it's actually worse, because it's not being used correctly or interpreted correctly. Also, users of the data may not know how accurate the data is or whether it's been tampered with. What the hell am I getting at? Over the years, many advertisers that I have helped have been advised to get log-level data as a way to solve ad fraud. That's a nice idea in theory, but I'm here to tell you why that's not necessary and what to do instead.
Big data could simply be bad data
Log-level data is big data. Truly enormous data. Uber came to me in 2016 to ask for help reviewing terabytes of data they had already collected, to see if I could help them identify the fraud. I turned them down instantly, because there was no way to know whether the data was accurate or had been tampered with. Years later, I was proven right, when court documents from one of the fraud cases Uber won showed the ad tech vendor had fabricated the "log-level" data in its entirety -- "let's spin up more BS to Uber." Some vendors didn't even run any ads; they just took Uber's money and sent "transparency reports" made up out of thin air -- falsely claiming that ads ran on mainstream sites and mobile apps.
So, having lots of data is entirely meaningless unless you have a way to know whether the data is real, let alone accurate. Having a ton of data also presents many other challenges that most advertisers are not equipped to handle in the first place. How do you transport and store terabytes of data? How do you clean and standardize the data so even the most basic charts can be made from it? Which data should be used to generate insights that are pertinent to the business outcomes of the digital ad campaigns, or even just to optimize the campaigns themselves? All of the above require not only specialized tech but also specialized people -- data scientists and analytics team members.
Sadly, over the last 20 years, I've witnessed many gaps and insufficiencies. For example, the data scientists of a now-defunct fraud detection firm rightly identified hundreds of billions of fraudulent bid requests that appeared to come from sports domains like nfl.com, mlb.com, espn.com, dallascowboys.com, etc. But because they didn't understand how ad tech worked, they published a press release incorrectly claiming they had caught a giant botnet they dubbed "Sports Bot." That botnet didn't exist, and major sites like ESPN and MLB were not overrun with bots. In fact, no bots were visiting those sites at all. The enormous quantities of faked bid requests were generated by Python scripts on servers; no bots were needed to visit those websites. As a result of the inaccurate press release, the publishers whose sites were named got frantic calls from advertisers and agencies asking about their "exposure" to Sports Bot. This was but one example of analytics teams looking at the big data correctly, but not understanding the tech or digital advertising sufficiently -- i.e. a "gap."
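To make concrete how little effort this takes, here's a hypothetical Python sketch of the technique: a server-side script fabricating OpenRTB-style bid requests that merely claim to come from premium sports sites. The domains are the real site names from the story above, but the field names and values are illustrative; the point is that no bot ever visits the sites.

```python
import json
import random

# Hypothetical sketch: fabricating bid requests with spoofed publisher
# domains. Field names loosely follow OpenRTB; values are made up.
SPOOFED_DOMAINS = ["nfl.com", "mlb.com", "espn.com", "dallascowboys.com"]

def fake_bid_request(request_id: int) -> dict:
    """Return a fabricated bid request claiming to come from a premium site."""
    return {
        "id": f"req-{request_id}",
        "site": {"domain": random.choice(SPOOFED_DOMAINS)},  # spoofed; never visited
        "device": {"ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},  # copied string, not a real device
        "imp": [{"banner": {"w": 300, "h": 250}}],
    }

# A loop like this, running on a handful of servers, can emit billions of
# requests -- which is why the "Sports Bot" analysts saw huge volumes
# without any bots ever touching the named sites.
batch = [fake_bid_request(i) for i in range(5)]
print(json.dumps(batch[0], indent=2))
```

The takeaway: volume in the bid stream tells you nothing about what actually happened on a website, which is exactly the gap the analysts fell into.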
Big data is not accurate or actionable
Assuming you've stuck with me thus far, let's make this concrete with actual examples from ad tech and ad fraud. Take, for example, the two fraud vendors' reports below. Vendor A's spreadsheet shows the four largest buckets of ad impressions labeled "mobile in-app" where the app is not identified. Vendor B's report shows two of the largest rows marked as [tail aggregate] where the sites are not listed. What do you do with this data? Which fraudulent sites or apps do you add to your block list if they were not disclosed in your fraud vendors' reports? That's just how they like it: you keep buying their fraud detection services, but you don't have a way to optimize your campaigns by blocking bad sites and apps.
In another "makes no sense whatsoever" example, Integral Ad Science sends tag sheets with tags for every single ad creative. An advertiser that has 500 different ad creatives must have their media agency painstakingly add an IAS tag to each of the 500 creatives instead of one tag at the campaign level. Insanity, because bots don't care about the creative message in your ad; there won't be different levels of bot activity per ad creative. And even if there were, what does that mean, and what action would you take if creative 1 had 0.9% fraud versus creative 2 at 0.6% fraud? Utterly useless work. But media agencies love it because it generates more billable hours, and IAS loves it because it creates the appearance of more data -- data that is utterly useless and not actionable for the client.
When you buy programmatic ads, you pay for targeting, right? Most advertisers believe that programmatic ads can be targeted down to the individual cookie level -- in theory, the right ad to the right person at the right time. But what if I told you and showed you that the data is inaccurate or entirely missing most of the time? Academic studies over the years have shown that basic targeting on one parameter -- gender -- is less accurate than random; and on two parameters -- gender + age -- it is only accurate 12% of the time (roughly 1 in 8). The Excel table to the right shows that about half of the bid requests in the bid firehose have "unknown" gender and age. If these two most basic targeting parameters are unknown half the time, what are the chances that the hundreds of other targeting parameters and audience segments you are paying extra for are accurate? Right -- just like above, these are made up out of thin air, to separate you from your money as efficiently as possible. You're not targeting the right audiences, let alone the right individual cookie. You may not even be targeting people (because bots easily pretend to be any lucrative audience segment that advertisers want to pay more for).
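If you wanted to check this on your own slice of the bid firehose, the arithmetic is trivial. Here's a minimal sketch with made-up records; real bid request data would be tallied the same way.

```python
# Hypothetical sample of bid requests -- the records are made up for
# illustration; None means the field was missing/"unknown" in the request.
sample = [
    {"gender": "M",  "age": 34},
    {"gender": None, "age": None},
    {"gender": "F",  "age": None},
    {"gender": None, "age": 51},
    {"gender": None, "age": None},
    {"gender": "M",  "age": 29},
]

def unknown_share(records, field):
    """Fraction of records where a targeting field is missing/unknown."""
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

print(f"unknown gender: {unknown_share(sample, 'gender'):.0%}")  # 50%
print(f"unknown age:    {unknown_share(sample, 'age'):.0%}")     # 50%
```

Run that over your own log sample and you'll see the denominator problem: if half the rows have no gender or age at all, the fancy audience segments layered on top can't be better than the raw fields they're built from.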
Right analytics leads to right actions
Let me bring this home. If bigger data, like log-level data, doesn't help and is not actionable, what should advertisers do instead? Let me show you what I do with a toolset that was originally built for my own workflow -- FouAnalytics. When I started building FouAnalytics in 2012, I did so because I didn't trust anyone else's tech or anyone else's data. FouAnalytics was built with data accuracy top of mind, and with anti-tampering features. You can read more about it here: Cybersecurity measures built into FouAnalytics. I opened up the platform for others to use in 2020, so they too can look at the analytics and take specific actions.
For example, site owners that already use Google Analytics or Adobe Analytics can add FouAnalytics tags to the site to troubleshoot what GA and Adobe cannot resolve for them. In GA, you may see large spikes in traffic, but you can't see where it's coming from or what's causing it. In the FouAnalytics chart below, we color-code the spikes in dark red (bad bots) and provide supporting data so you can understand why the traffic was marked as a bot. Note that the same fingerprint (a unique device+browser combo) is repeatedly hitting the page, the platform is "Linux x86_64" (a server operating system, not Windows, Mac, iOS, or Android), and the user agent shows Chrome/41 (more than 60 versions out of date). Once you understand why we marked it as a bot, you can take action, like blocking the bot or subtracting the traffic from your analytics to make them more accurate. For more examples, see How Site-Owners Use FouAnalytics to Troubleshoot Bot Traffic.
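To show why those three signals are damning when they appear together, here's a hypothetical sketch of that style of red-flag check. This is NOT FouAnalytics' actual detection logic -- the threshold, the "current" Chrome version, and the field names are all assumptions for illustration.

```python
from collections import Counter

# Assumed "current" major version for this example only.
CURRENT_CHROME_MAJOR = 120

def bot_red_flags(hit: dict, fingerprint_counts: Counter) -> list:
    """Return the red flags for a single pageview hit (illustrative only)."""
    flags = []
    # 1) Same device+browser fingerprint hammering the page over and over.
    if fingerprint_counts[hit["fingerprint"]] > 100:
        flags.append("repeated fingerprint")
    # 2) Linux x86_64 is a server OS, not a consumer phone or laptop.
    if hit["platform"] == "Linux x86_64":
        flags.append("server operating system")
    # 3) A browser 60+ major versions out of date (e.g. Chrome/41).
    chrome_major = int(hit["ua"].split("Chrome/")[1].split(".")[0])
    if CURRENT_CHROME_MAJOR - chrome_major > 60:
        flags.append("ancient browser version")
    return flags

counts = Counter({"abc123": 500})  # this fingerprint hit the page 500 times
hit = {"fingerprint": "abc123", "platform": "Linux x86_64",
       "ua": "Mozilla/5.0 (X11; Linux x86_64) Chrome/41.0.2272.96"}
print(bot_red_flags(hit, counts))
# ['repeated fingerprint', 'server operating system', 'ancient browser version']
```

Any one of these flags might have an innocent explanation; all three together on the same hits, at spike volume, is what lets you label the traffic with confidence instead of guessing.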
Further, advertisers that already use DoubleVerify and IAS for fraud detection can upgrade their tools so they can see more and do more with FouAnalytics -- 6 of 6 advertisers are already opting to do so. Getting Excel spreadsheets with [tail aggregate] and low fraud numbers is fun and all, but it's not actionable, because you can't see which sites and apps were fraudulently eating up your ad impressions and budgets. With FouAnalytics in-ad tags monitoring your ad impressions, you get a summary of the most fraudulent sites and apps, sorted by largest volume first. These are the sites and apps with the largest negative impact on your campaigns, so you should review them first and decide whether to add them to your blocklist. If you agree with our labeling of various sites and apps, you check the checkbox and a list is compiled for you (like the screenshot below). You copy and paste these sites and apps into your block list, or send the list to your agency to do so. Like I said, this was created for my own workflow in auditing and optimizing digital campaigns for clients. Now you can use it too. For more details, please see How to Use the Domain App Report in FouAnalytics to Review Sites and Apps.
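The "sorted by largest volume first" step is the part that makes the data actionable, and it's simple to sketch. Here's a hypothetical version with made-up impression records and domain names; the real report is built from FouAnalytics in-ad tag measurements, not from a list like this.

```python
from collections import Counter

# Hypothetical per-impression labels -- domains and flags are made up.
impressions = [
    {"domain": "shady-app-001",     "fraud": True},
    {"domain": "shady-app-001",     "fraud": True},
    {"domain": "goodnews.example",  "fraud": False},
    {"domain": "fakesite.example",  "fraud": True},
    {"domain": "shady-app-001",     "fraud": True},
    {"domain": "fakesite.example",  "fraud": True},
]

# Aggregate fraudulent impressions by site/app, largest volume first,
# so the biggest budget-eaters surface at the top for review.
fraud_volume = Counter(i["domain"] for i in impressions if i["fraud"])
blocklist = [domain for domain, count in fraud_volume.most_common()]
print(blocklist)  # ['shady-app-001', 'fakesite.example']
```

Notice the contrast with a [tail aggregate] row: because every impression carries its actual domain or app, the output is a concrete, reviewable list you can paste straight into a block list.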
Hopefully this shows you that having bigger data (e.g. log-level data) is not always better. But having the right analytics, which shows you 1) why something is marked as a bot, and 2) which sites and apps are fraudulent and problematic, enables you to understand and act upon the data.
As always, if you have any questions, email or DM me. If you think the above is useful and others may benefit from reading it, feel free to share out and post.
Happy (fraud) hunting and optimizing (digital) campaigns.