Current fraud detection can yield completely wrong results, here's why.

As I have written on many occasions before, fraud detection is limited by where the measurement is done. Measurement is error-prone and easily tricked. So recent reports that ad fraud is low and lower than before, and that mobile fraud is so low people shouldn't worry about it, are not evidence that fraud is low. They are evidence that it can't be detected.

Measurement happens in three main places.

In-Network (trillions of bids, 50 ms to make a decision)

The largest volume goes through "in-network." When the bid request comes through, there are primarily two pieces of information - the site (on which the ad is about to be run) and the user (e.g. a cookie or other identifier). If the fraud detection tech has seen the site before and knows it is fraudulent, it knows not to serve the ad. If it has seen the user before and knows it is a bot, it knows not to serve the ad. But in the vast majority of cases, it has seen NEITHER before. That is because bad guys are quick to take down sites that no longer make money for them, and their bots are quick to dump the cookie and get a new one. So in most cases, the default action is to let the ad serve. This is why this form of fraud "filter" is practically no filter at all: almost everything is let through.
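The default-allow logic described above can be sketched in a few lines. This is an illustration, not any vendor's actual code; the site and cookie values are made up.

```python
# Sketch of why in-network "filtering" defaults to allow: under a ~50 ms
# bid deadline, all there is time for is a blocklist lookup.
KNOWN_FRAUD_SITES = {"fakesite123.com"}   # sites seen and flagged before
KNOWN_BOT_COOKIES = {"bot-cookie-abc"}    # cookies seen and flagged before

def should_serve(bid_request: dict) -> bool:
    """Return False only if the site or user is already on a blocklist."""
    if bid_request["site"] in KNOWN_FRAUD_SITES:
        return False
    if bid_request["cookie"] in KNOWN_BOT_COOKIES:
        return False
    # Never-before-seen site AND never-before-seen cookie: the default
    # action is to serve -- this is the hole described above.
    return True

# A brand-new fraud site with a freshly minted cookie sails through:
print(should_serve({"site": "brandnewfake.com", "cookie": "fresh-cookie-1"}))  # True
```

Since the bad guys rotate sites and cookies faster than the blocklists update, the first branch almost never fires for them.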


In-Ad (billions of ad impressions, cannot look outside iframe)

The second most common form of detection is in-ad, since these tags are sent in with the ad impressions. Most advertisers don't have the luxury of putting fraud detection code on the sites themselves (bad guys' sites won't voluntarily install it). So the ad tags ride along with the ads when they are served. Ads are served into a foreign iframe - an iframe from a different domain (e.g. DoubleClick) than the page (e.g. nytimes.com). Due to basic browser security (the same-origin policy), a script in a cross-origin iframe cannot access the surrounding page. That means trackers inside the ad iframe cannot look outside of it to do things like measure viewability or brand safety.
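The same-origin restriction can be modeled as a one-line rule. This is a toy model with assumed origin strings, not browser internals: two frames can script each other only when their origins match exactly.

```python
# Toy model of the browser's same-origin policy as it applies to an ad
# tag inside an iframe. Origins here are illustrative.
def can_read_parent_page(iframe_origin: str, page_origin: str) -> bool:
    """A script may access another frame's DOM only if origins match exactly."""
    return iframe_origin == page_origin

# Ad served into a DoubleClick iframe on nytimes.com -- blocked:
print(can_read_parent_page("https://doubleclick.net", "https://nytimes.com"))  # False
# Measurement code served from the publisher's own origin -- allowed:
print(can_read_parent_page("https://nytimes.com", "https://nytimes.com"))      # True
```

This is why in-ad trackers must approximate, while on-page code (third section below) can measure directly.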

So viewability and brand safety have to be approximated and extrapolated using other methods. And those methods are easily and regularly defeated by the bad guys. By simply passing a fake variable about where the ad ran, the bad guys can cover their tracks (i.e. hide the fake site and pretend to be a good site that is "brand safe"). So advertisers that rely on this form of fraud detection and viewability measurement need to understand that what is marked as NHT or IVT can be completely wrong, and what is marked as viewable can also be 100% wrong, as in the following example.
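The fake-variable trick can be sketched as follows. The function name and the "reported page URL" parameter are illustrative, not a real vendor API; the point is that the tracker has no way to verify the value it is handed.

```python
# Sketch of the spoofing described above: the in-ad tracker cannot see the
# real page, so it trusts whatever page URL the seller passes in.
BRAND_SAFE_DOMAINS = {"espn.com", "nytimes.com"}

def classify_from_reported_url(reported_page_url: str) -> str:
    """Classify based only on the (unverifiable) reported URL."""
    domain = reported_page_url.split("/")[2]  # e.g. "espn.com"
    return "brand safe" if domain in BRAND_SAFE_DOMAINS else "not brand safe"

# The ad actually ran on fakesite123.com, but the bad guy reports espn.com:
print(classify_from_reported_url("https://espn.com/nfl/story"))  # brand safe
```

Garbage in, reassurance out: the classification is only as trustworthy as the self-reported input.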

"brand safety vendors may be outright lying to customers"

To be specific, brand safety vendors may be outright lying to customers. Here's how. They purport to be doing brand safety measurements. Most clients assume that means their in-ad tracking tags are looking at the webpages on which the ad was loaded to see if there are adult keywords or other not-brand-safe content on the page. But this is not the case. Remember, the tracking tag in the foreign iframe cannot look at the parent page; so the tracking tag cannot read anything on the page to do brand safety measurements.

The way brand safety measurement is actually done is by crawling a large number of websites (this high-volume activity is done by bots created by the brand safety vendors themselves, mostly undeclared -- meaning the bots do not honestly identify themselves). Once they crawl the sites and sort them into brand-safe ones and ones that are not brand safe, they check the lists of sites where their customers tell them their ads ran (otherwise known as placement reports). But of course, you see the flaw in this, right? Bad guys' fake sites (fakesite123.com) will not show up in those lists. The bad guys cover their tracks and pass a fake source (e.g. espn.com), and that is what appears in the placement reports. Of course espn.com is a brand-safe site. So the brand safety vendor reassures their customers that their ads ran in safe environments when they actually did not; the vendor literally has no actual information about whether the ads were brand safe, yet they keep taking their clients' money. Not cool, guys.
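The crawl-then-cross-check workflow, and its blind spot, can be sketched in a few lines. The ratings and domains are illustrative; no vendor's actual data or code is shown.

```python
# Sketch of the workflow described above: (1) crawl sites and rate them;
# (2) look up the domains listed in the client's placement report.
crawl_ratings = {"espn.com": "safe", "nytimes.com": "safe", "adultsite.com": "unsafe"}

def rate_placements(placement_report):
    """Rate each domain that appears in the placement report."""
    return {domain: crawl_ratings.get(domain, "unknown") for domain in placement_report}

# The ad really ran on fakesite123.com, but the bad guys spoofed espn.com,
# so espn.com is what appears in the placement report:
print(rate_placements(["espn.com"]))  # {'espn.com': 'safe'} -- reassuring, and wrong
```

The lookup is only ever performed on the spoofed domain; the real site never enters the system.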


On-Site Measurement (millions of pageviews, most detailed and complete)

In this case, there is code installed on the page itself, so measurements like viewability are the most accurate. The code on the page can "see" where the ad slots are and determine where they sit with respect to the viewport -- the portion of the browser window that the human can actually see -- on desktop and mobile web. (This does not work in-app.) The NHT/IVT measurements are also more detailed and complete: bots are detected accurately, with few to no false positives. In the case of the chart above, Sources 1 and 2 were installed side by side on the page, while Source 3 came in through the ad iframe. I have dozens more examples where on-page measurement is corroborated while in-ad measurement of the exact same visit is completely off.
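The geometric core of on-page viewability measurement is simply intersecting the ad slot's rectangle with the viewport's rectangle. This is a minimal sketch with assumed pixel coordinates, not any vendor's implementation.

```python
# Minimal sketch of on-page viewability: what fraction of the ad slot's
# rectangle overlaps the viewport? Rectangles are (left, top, right, bottom)
# in page coordinates.
def visible_fraction(slot, viewport):
    left   = max(slot[0], viewport[0])
    top    = max(slot[1], viewport[1])
    right  = min(slot[2], viewport[2])
    bottom = min(slot[3], viewport[3])
    if right <= left or bottom <= top:
        return 0.0  # no overlap: the slot is entirely outside the viewport
    slot_area = (slot[2] - slot[0]) * (slot[3] - slot[1])
    return ((right - left) * (bottom - top)) / slot_area

# A 300x250 slot half scrolled below the fold of a 1280x800 viewport:
print(visible_fraction((0, 675, 300, 925), (0, 0, 1280, 800)))  # 0.5
```

Only code that can read both rectangles -- i.e. code on the page itself -- can compute this directly; an ad tag trapped in a cross-origin iframe cannot.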

Unfortunately, some parties (like a certain media buying agency) insist on using only this one source for measuring NHT/IVT, even though it has been brought to their attention numerous times that the measurements are entirely wrong. They refuse to allow another measurement company to verify the results. And they are constantly in the press blaming Google and Facebook for ad fraud, specifically noting each time that those platforms didn't historically allow third party measurement (they do now). Further note that this media buying agency is part of the same holding company that owns a substantial stake in the measurement company in question. Hmmm. Do you think there may be something else going on -- e.g. the buyer chooses the tools that find high IVT in order to extract refunds from good publishers (either to appear to be doing something for their clients, or to keep the make-goods for themselves; not sure which, but both are bad)?

Here is a specific example of how tag placement matters and different placements lead to entirely different outcomes. One shows high bots and the other shows high humans.

Finally, it is critical to measure for humans as well as for bots. This is because saying "10% bots" does not mean the other 90% is humans. See the chart below. There is a bunch of stuff that is simply not measurable (white), or there is not enough data to label it either way (gray). So even if there are only 24% bots (dark red) there may only be 24% humans (dark blue). And I've seen many cases where it is far, far less than that.
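The accounting above can be made concrete with a worked example. The percentages are illustrative, matching the chart's categories, not real measurements.

```python
# Worked example: "24% bots" does not imply "76% humans" once the
# unmeasurable (white) and insufficient-data (gray) buckets are broken out.
traffic = {
    "bots": 24,             # dark red: positively identified as bots
    "humans": 24,           # dark blue: positively identified as humans
    "not_measurable": 30,   # white: simply cannot be measured
    "not_enough_data": 22,  # gray: not enough data to label either way
}

assert sum(traffic.values()) == 100
unknown = traffic["not_measurable"] + traffic["not_enough_data"]
print(f"bots: {traffic['bots']}%  humans: {traffic['humans']}%  unknown: {unknown}%")
```

Reporting only the bots number silently lumps the 52% unknown in with the humans.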

Fraud detection companies need to disclose what portion of the data is not measurable. And they should start measuring for humans, positively. And they should provide details of how they arrived at their measurements. Then it will be apparent what assumptions went in, whether the results were extrapolated from small samples, etc. In the case of the unmeasurable data, do you assume it is bots, or do you assume it is not? Well, that depends on who your main client base is.

It is widely known in the industry that some vendors favor the publishers while others favor the advertisers.

It is widely known in the industry, but seldom discussed, that some vendors favor the publishers while others favor the advertisers. So when it comes to NHT, some vendors can conveniently assume that what is not measurable is not NHT, to help their publisher clients reduce the refunds they have to give; other vendors that favor the advertisers/buyers can conveniently assume some of what is not measurable to be bots, so they can report higher NHT and help their clients get bigger refunds. That's why no one's numbers come out the same. And no one provides any details that can be verified. How convenient.

Bad guys have better tech. And they are easily tricking measurement systems. See the following animated GIF of a tool they use to rotate referring domains, user agents, and IP addresses, so their bot traffic appears to be coming from legitimate domains, legitimate browsers, and residential IP addresses. This is how Methbot and other botnets remain hidden for years, generating trillions of fake impressions, and earning billions of ad dollars.

Source: Ratko Vidakovic, Marketing Land, May 22, 2017
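The rotation technique shown in the GIF can be sketched abstractly. This is a toy illustration of the concept, not any real tool; the domains and user agent strings are made up.

```python
# Toy illustration of identity rotation: cycling referrers and user agents
# so each bot request presents a different, plausible-looking identity.
import itertools

referrers = itertools.cycle(["https://espn.com", "https://cnn.com"])
user_agents = itertools.cycle(["Mozilla/5.0 (Windows NT 10.0)", "Mozilla/5.0 (Macintosh)"])

def next_disguise():
    """Each call returns the next (referrer, user agent) combination."""
    return {"referrer": next(referrers), "user_agent": next(user_agents)}

# Three consecutive "impressions" each look like fresh, legitimate traffic:
for _ in range(3):
    print(next_disguise())
```

Any detection scheme that trusts the referrer or user agent at face value is defeated by exactly this kind of rotation.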

By telling clients everything is fine and not acknowledging the technical limitations of their platforms, fraud detection vendors and agency holding companies continue to do a disservice to the entire industry and cause direct and material harm to advertisers, publishers, and the entire digital advertising ecosystem.

"There will be hell to pay. Soon."


About the Author:  “I advise advertisers and publishers on the technical aspects of fighting digital ad fraud and improving the effectiveness and transparency of digital advertising. Using forensic technologies and techniques I help to assess the threat and recommend countermeasures to combat fraud and improve real business outcomes.” 

Follow me here on LinkedIn and on Twitter @acfou

Further reading: https://www.slideshare.net/augustinefou/presentations

Christopher Blok

Director of Partnerships @ Coles 360 - Retail | Data | Marketing | Technology

7 years

Great article.

Blake Moseley

Global Head of Product at Hogarth (WPP) | Product Management, Personalisation, DCO, Addressability, AI Automation, Content Production

7 years

I sent this in an email a few weeks ago! But you're still awesome.. haven't had a meeting with you in awhile!
Ashley Ashford

Digital Advertising Measurement Expert

7 years

No MRC accredited verification vendor will assume non-measured traffic is IVT regardless of their clients or positioning. That simply is not accreditable.

Ashley Ashford

Digital Advertising Measurement Expert

7 years

Methbot didn't go undetected for years since it only came onto the scene in late 2015 ... https://www.whois.com/whois/mediamethane.com ... and was being detected/filtered by multiple verification outfits soon thereafter to varying degrees. While being in a cross domain iFrame may limit access to page level information, which does render certain IVT detection methodologies unmeasurable, it does not render all IVT detection methodologies unmeasurable.
