Ad Fraud Detection is Handicapped by Where the Measurement Takes Place

Several techniques are used in the fraud detection trade, and a single fraud detection technology or platform may use more than one of them at the same time. But each technique is limited or handicapped in its own way, largely by where its measurement takes place. Let me explain.

Big Data Probabilistic - At the highest level, this form of fraud detection is based on the data set of billions of bid requests that are made every day in programmatic ad exchanges. In general terms, this big data technique looks for patterns in the data such as over-frequency (the same user visiting sites far too frequently to be a human visitor) and over-consistency (millions of users visiting exactly the same set of sites, which real human visitors would not do), and from those patterns calculates the probability that the visitors are not human.
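
To make the two patterns concrete, here is a minimal sketch in Python. It is my own illustration, not any vendor's method: the function names and the (user ID, site) log format are assumptions, and it uses hard thresholds where a real system would compute a probability.

```python
from collections import Counter, defaultdict

# Hypothetical input format: an iterable of (user_id, site) pairs,
# one pair per bid request. Real bid request logs carry far more fields.

def flag_over_frequency(bid_requests, max_requests_per_user):
    """Return the users seen in more bid requests than the threshold."""
    counts = Counter(user_id for user_id, _site in bid_requests)
    return {user for user, n in counts.items() if n > max_requests_per_user}

def flag_over_consistency(bid_requests, min_group_size):
    """Return each set of sites that at least min_group_size users
    visited in exactly the same combination."""
    sites_by_user = defaultdict(set)
    for user_id, site in bid_requests:
        sites_by_user[user_id].add(site)
    groups = Counter(frozenset(sites) for sites in sites_by_user.values())
    return {site_set for site_set, n in groups.items() if n >= min_group_size}
```

The thresholds are placeholders; choosing them well is the hard part, and is where the probability calculation comes in.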

In-Exchange - Fraud detection in-exchange, at the RTB (real-time bidding) level, has two main pieces of data: 1) the site, and 2) the user. If either of these is on a blacklist of known offenders, the ad is not called and the ad impression is not served. But in the vast majority of cases the sites and the users are not on the blacklist, because bad guys take down unproductive domains all the time and put up new ones, and they also turn off unprofitable bots (ones that have been caught) and create new ones. Because it is based on blacklists, this form of detection is severely limited in the fraud it can catch.
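
A minimal sketch of that check, with hypothetical names and made-up list entries, shows why the limitation is built in: anything not already on a list passes.

```python
def should_serve_ad(site, user_id, blocked_sites, blocked_users):
    """Serve only if neither the site nor the user is on a blacklist.

    All names here are hypothetical; this is a sketch of the idea,
    not the logic of any actual exchange.
    """
    return site not in blocked_sites and user_id not in blocked_users

# Made-up blacklist entries for illustration only.
blocked_sites = {"caught-fraud-site.example"}
blocked_users = {"bot-0001"}

# A known offender is stopped...
known = should_serve_ad("caught-fraud-site.example", "user-42",
                        blocked_sites, blocked_users)
# ...but a freshly registered domain with a freshly created bot is not.
fresh = should_serve_ad("brand-new-fraud-site.example", "bot-0002",
                        blocked_sites, blocked_users)
```

Here `known` is False and `fresh` is True, which is exactly the gap described above.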

Network-Based - Fraud detection can also analyze network traffic and look for tell-tale signs of bots, such as traffic originating from data centers (based on IP address) and user agents that are in the list of known, named bots. This form of analysis looks at HTTP header information and also relies on known lists of data center IP addresses and bot names. As the recent Methbot research by White Ops has shown, bots obfuscate and forge both of these, so they are not reliable for fraud detection.
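
As a rough sketch, the check could look like the following. The IP range and bot names are placeholders I made up (the range is one reserved for documentation), and real lists are much longer and change constantly.

```python
import ipaddress

# Placeholder lists for illustration; not real data center ranges or bot names.
DATA_CENTER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]
KNOWN_BOT_NAMES = ("examplebot", "samplecrawler")

def looks_like_bot(ip, user_agent):
    """Flag a request whose IP falls in a listed data center range
    or whose User-Agent header contains a listed bot name."""
    addr = ipaddress.ip_address(ip)
    from_data_center = any(addr in net for net in DATA_CENTER_RANGES)
    ua = user_agent.lower()
    named_bot = any(name in ua for name in KNOWN_BOT_NAMES)
    return from_data_center or named_bot
```

A bot that routes through a residential IP address and sends a browser-like user agent returns False here, which is the weakness the Methbot research points to.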

In-Ad - This is by far the most extensive form of fraud and bot detection, because the detection tags ride along with the ad impressions that are served -- billions of them. However, because these tags are served in foreign iframes of the ads (i.e. from a different domain), they are limited in what they can detect on the page that caused them to load. For example, they cannot directly measure brand safety because they cannot read any content on the parent page; they also cannot measure viewability directly because they cannot detect where the ad slot itself is on the page. So while this is the most extensive form of bot detection, it is also the most limited.
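
The restriction comes from the browser's same-origin rule: a script in an iframe can read the parent page only when scheme, host, and port all match. Here is a simplified model of that comparison in Python. The function names and URLs are hypothetical, and real browsers apply more rules than this.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def tag_can_read_parent(tag_frame_url, parent_page_url):
    """Simplified same-origin test: the tag can read the parent page's
    content only if both URLs share the same origin."""
    return origin(tag_frame_url) == origin(parent_page_url)

# Hypothetical URLs: an ad tag served from an ad domain into a news page.
in_ad = tag_can_read_parent("https://ads.example.net/tag.html",
                            "https://news.example.com/story")
# The same tag loaded directly on the page's own origin.
on_site = tag_can_read_parent("https://news.example.com/tag.html",
                              "https://news.example.com/story")
```

Here `in_ad` is False and `on_site` is True.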

On-Site - When there is the (rare) opportunity to load tracking JavaScript directly on the webpage, the most detailed form of fraud detection can take place. The measurement JavaScript loads when the visitor loads the page, and when it runs it can collect hundreds of additional details about the user environment. These are the same kinds of details collected by Google Analytics, but here they are used to differentiate a real human user from a bot.
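
To illustrate how such details might be used once collected, here is a small server-side sketch in Python. The signal names and the checks are my own assumptions, chosen for illustration; an actual detection tag collects far more and weighs the details differently.

```python
def count_bot_signals(env):
    """Count how many suspicious signals appear in one page view.

    `env` is a hypothetical dict of details reported by the on-page
    JavaScript. Every key below is an assumption for illustration.
    """
    checks = [
        # Browser reports that it is under automation control.
        env.get("webdriver") is True,
        # No usable screen dimensions.
        env.get("screen_width", 0) == 0 or env.get("screen_height", 0) == 0,
        # No mouse movement and no scrolling during the visit.
        env.get("mouse_moves", 0) == 0 and env.get("scroll_events", 0) == 0,
        # No browser languages reported.
        not env.get("languages"),
    ]
    return sum(checks)

human_like = {"webdriver": False, "screen_width": 1440, "screen_height": 900,
              "mouse_moves": 37, "scroll_events": 4, "languages": ["en-US"]}
bot_like = {"webdriver": True, "screen_width": 0, "screen_height": 0,
            "mouse_moves": 0, "scroll_events": 0, "languages": []}
```

With these made-up inputs, `count_bot_signals(human_like)` is 0 and `count_bot_signals(bot_like)` is 4. No single signal is proof; the value of on-site measurement is that many such details can be combined.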

So the key takeaway from this post is that you should ask your fraud detection vendor for the details of the data it collects and where the measurement takes place. The answers will give you a good sense of how accurate the fraud detection will be, and therefore how much weight to put on the resulting data.

About the Author: Dr. Augustine Fou. “I advise advertisers, publishers, and agencies on the technical aspects of fighting digital ad fraud and improving the effectiveness of digital advertising. Using forensic technologies and techniques, I help to assess the threat and recommend countermeasures to combat fraud and improve ROI.”

Follow me on LinkedIn and on Twitter @acfou

Further reading: https://www.slideshare.net/augustinefou/presentations
