The Limits of Fraud Detection
Photo by Edwin Chen on Unsplash


A post and a question on the same day prompted this article. The post: “Fraud Technology is greater than Anti-fraud Technology. That’ll probably never change. All we can do is try to keep up, faster.” The question: “there are dozens of companies selling click fraud detection. All of them are blackbox (they don’t tell you how they measured it); how can you tell which ones work or not, and why?” Let me address both below.


IP address-based detection is useless

Many fraud detection services are built on IP address reputation scores. That is, if an IP address is known to belong to a fraudster, it has a low reputation and can be blocked. Much of this line of thinking comes from protection against email spam, because spammers send out bursts of emails from the same IP address or mail server. Conversely, if an IP address has a high reputation or has never been seen before, these services let it through. The problem with using this approach to detect ad fraud is that it does not work at all. Here’s why.

Fraudsters using bots in data centers disguise their traffic to appear to come from residential IP addresses by paying for “residential proxy services.” The bots’ IP addresses are not data center IP addresses, because that would be too obvious and too easy to block. It is sad that the IAB and TAG are so illiterate about ad fraud that they continue to sell their own members a list of 100 million IP addresses to block, thinking that it helps to reduce ad fraud. Not only does checking 100 million IP addresses in milliseconds for each ad impression take a large amount of computation, it is also utterly useless. Any bad guy worth his salt is not using any of those 100 million IP addresses any more. In years past, when a fraud detection company published a list of 600 thousand IP addresses used by a botnet it had caught, those IP addresses were no longer in use within hours. So blocking them would be both silly and futile.
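
To make the staleness problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the addresses come from reserved documentation ranges (RFC 5737), not from any real botnet or vendor list, and the helper names are made up. It sets the cost of the lookup aside and shows only that a published list catches nothing once the bots have moved to fresh proxy exits.

```python
import ipaddress

def build_blocklist(addresses):
    """Store addresses as integers in a set for fast membership checks."""
    return {int(ipaddress.ip_address(a)) for a in addresses}

def is_blocked(blocklist, address):
    """True if the address appears on the block list."""
    return int(ipaddress.ip_address(address)) in blocklist

# Hypothetical addresses (documentation ranges), standing in for a published list.
published_botnet_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
blocklist = build_blocklist(published_botnet_ips)

# Hypothetical addresses the same bots use after rotating proxies.
rotated_bot_ips = ["198.51.100.7", "198.51.100.8", "203.0.113.99"]

caught_before = sum(is_blocked(blocklist, ip) for ip in published_botnet_ips)
caught_after = sum(is_blocked(blocklist, ip) for ip in rotated_bot_ips)

print("caught before rotation:", caught_before)
print("caught after rotation:", caught_after)
```

The list is accurate only for the moment it was compiled; after rotation the same bots pass straight through.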

On the flip side, a legitimate residential IP address doesn’t mean there is no ad fraud either. Perhaps one of the family’s devices got compromised with malware, or a rogue app on the device was committing ad fraud by loading thousands of ad impressions in the background, even overnight when the device is not in use. The malware-ridden device committing ad fraud will appear to be coming from that residential IP address, and fraud detection shouldn’t block it because the rest of the devices in that household are probably legit. And finally, with the increased use of VPNs, more and more humans appear to be accessing the internet through data centers, because their traffic is routed through the virtual private network. If the IP address block list blocks those data center IP addresses, it would be blocking legitimate human visitors. This is a regular occurrence for visitors accessing websites from universities, military bases, and corporate offices: the traffic is routed through cybersecurity services, so it appears to be coming from data centers even though the end user is clearly a human.
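
The two failure modes described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's logic: the "data center" range is a reserved documentation prefix, and the visits and their `is_bot` labels are invented ground truth that a real detector would not have.

```python
import ipaddress

# Hypothetical data center range (a documentation prefix, not a real provider).
DATACENTER_RANGES = [ipaddress.ip_network("198.51.100.0/24")]

def from_datacenter(address):
    """True if the address falls inside any listed data center range."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in DATACENTER_RANGES)

# Invented visits; is_bot is ground truth used only to score the rule.
visits = [
    {"who": "human on corporate VPN", "ip": "198.51.100.20", "is_bot": False},
    {"who": "laptop in household", "ip": "203.0.113.5", "is_bot": False},
    {"who": "infected phone in same household", "ip": "203.0.113.5", "is_bot": True},
]

for visit in visits:
    visit["blocked"] = from_datacenter(visit["ip"])

false_positives = [v["who"] for v in visits if v["blocked"] and not v["is_bot"]]
false_negatives = [v["who"] for v in visits if not v["blocked"] and v["is_bot"]]

print("humans blocked:", false_positives)
print("bots let through:", false_negatives)
```

The rule blocks the human behind the VPN and passes the infected phone, which shares an address with a legitimate laptop, so no IP-level rule can separate the two.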

So IP address-based detection via reputation scoring and block lists is utterly useless. If a fraud vendor tries to sell you that, you know they don’t know what they are doing, and you’d be wasting your money paying for their services. This goes for CDNs that claim to offer click fraud detection too. They operate at the network level and their detection is based on IP addresses. From experience we can see that despite all the fraud and bot detection used by sites like Ticketmaster, tickets for literally every popular event are bought out within minutes by bots, so scalpers can resell them on secondary markets for profit.


Javascript tag-based detection has limits too

FouAnalytics, Google Analytics, and fraud detection companies use javascript tags to collect data about the browser. Javascript is code that executes in the browser and asks the browser to report many parameters, including things like screen resolution, browser name and version, list of plugins, etc. These parameters are used to look for patterns that reveal whether the browser is a bot or is being used by a real human to visit websites. Having javascript-based data points is what one fraud detection company calls “validatable at the highest levels.” What they don’t say is that for the other ad impressions and pageviews, the ones that are NOT validatable at the highest levels, the javascript didn’t run. For those, the fraud detection was severely limited and usually consisted of IP address and HTTP headers (which can be logged without executing javascript). As stated above, fraud detection based on IP addresses and HTTP headers, which are easily faked, does not catch most of the fraud. It may catch some fraud when script kiddies make mistakes. But it certainly won’t catch hardened criminals with an ounce of experience committing ad fraud. (Note that in CTV and audio environments javascript cannot run, so these types of ads are not “validatable at the highest levels.” So buyer beware; ideally don’t buy these unless you buy them directly from the media owner/publisher, because otherwise fraud detection is practically useless.)
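
One practical question to ask of any report is how many impressions had javascript data behind them at all. A rough Python sketch of that tally follows. The record layout, the field names (`ip`, `user_agent`, `js`), and the sample values are all hypothetical, chosen only to show the split between impressions where the tag ran and those where only the IP address and headers were logged.

```python
from collections import Counter

def signal_tier(record):
    """Return which kind of data exists for one impression (hypothetical schema)."""
    if record.get("js"):
        # The javascript tag executed and reported browser parameters.
        return "javascript parameters"
    # Only what a server can log without running code: IP address and headers.
    return "IP and headers only"

# Invented impression records for illustration.
impressions = [
    {"ip": "203.0.113.5", "user_agent": "ExampleBrowser/1.0",
     "js": {"screen": "1920x1080", "plugin_count": 3}},
    {"ip": "203.0.113.6", "user_agent": "ExampleTV/2.0", "js": None},
    {"ip": "203.0.113.7", "user_agent": "ExampleAudio/3.1", "js": None},
]

tiers = Counter(signal_tier(r) for r in impressions)
js_share = tiers["javascript parameters"] / len(impressions)

print(dict(tiers))
print(f"share with javascript data: {js_share:.0%}")
```

A low share means most of the verdicts in the report rest on signals that are easy to fake.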

Having javascript tags is better than not having them. But practitioners should note that fraudsters and bot makers are skilled at faking practically all javascript parameters correctly. So finding fraud with just the browser parameters detected by javascript is also limited. We should always assume that fraudsters are successfully tricking our detection, so we remain on the lookout for any telltale anomalies. “Where there is smoke, that’s where to look for the fire” is the way to think about fraud detection. Having directly measured data with sufficient detail enables practitioners to troubleshoot and find fraud that detection algorithms miss, for the simple reason that the algorithms were literally not made to look for certain things. They cannot detect what they don’t know to look for. That’s why the good guys’ detection tech is always lagging behind the fraudsters’ fraud tech. The best we can do is to be always on the lookout and constantly improve our detections.

What is a LOT harder for bots to fake correctly is timing: how long it takes to complete tasks or computations. I won’t go into detail about how we do entropy analysis in FouAnalytics, but it is described in layman's terms in “How We Use Entropy Analysis for Fraud Detection.”
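
For readers who want a feel for the general idea, here is a generic Python sketch of Shannon entropy applied to timings. To be clear, this is not how FouAnalytics does it; the bin width, the function name, and the sample durations are all assumptions made up for illustration. The intuition is only that timings clustered in one narrow band carry little entropy, while widely spread timings carry more.

```python
import math
from collections import Counter

def timing_entropy(durations_ms, bin_ms=50):
    """Shannon entropy in bits of durations grouped into fixed-width bins."""
    bins = Counter(int(d // bin_ms) for d in durations_ms)
    total = sum(bins.values())
    # Sum of p * log2(1/p) over the occupied bins.
    return sum((n / total) * math.log2(total / n) for n in bins.values())

# Invented samples: time in milliseconds between page load and first interaction.
scripted = [200, 203, 205, 201, 204, 202, 206, 200]   # tightly clustered
varied = [180, 420, 95, 760, 310, 1250, 540, 230]     # widely spread

print("scripted:", timing_entropy(scripted))
print("varied:", timing_entropy(varied))
```

In this toy data the clustered sample lands in a single bin and the spread sample lands in eight different bins. A single number like this is only a starting point for investigation; a bot that adds random delays would not stand out on this measure alone.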


So What?

Now that you realize the limits of fraud detection, what can you do? If fraud detection tech is barely keeping up with fraud commission tech, what can advertisers do? For starters, you can buy from legit publishers who are not deliberately trying to rip you off. Long tail sites, and the ad tech intermediaries that facilitate their money making, can use all forms of fraud tech to trick the fraud detection tech and get by undetected. Furthermore, if you use black box fraud detection tech, how do you know whether it worked or not? If it constantly reports low IVT (invalid traffic) to you, does that mean bots are actually low? Or does that mean it couldn’t detect them?
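
The ambiguity in a low IVT number can be shown with simple arithmetic. The Python sketch below uses a deliberately simplified model, which is my assumption and not anything a vendor publishes: reported IVT equals the true bot share multiplied by the fraction of bots the detector catches, ignoring false positives. The scenario numbers are invented.

```python
import math

def reported_ivt(true_bot_share, detection_recall):
    """Simplified model: the detector reports only the bots it manages to catch."""
    return true_bot_share * detection_recall

# Scenario A (invented): very little bot traffic, detector catches all of it.
clean_inventory = reported_ivt(true_bot_share=0.01, detection_recall=1.0)

# Scenario B (invented): a fifth of traffic is bots, detector catches 1 in 20.
dirty_inventory = reported_ivt(true_bot_share=0.20, detection_recall=0.05)

print(f"A reports {clean_inventory:.1%} IVT")
print(f"B reports {dirty_inventory:.1%} IVT")
print("indistinguishable:", math.isclose(clean_inventory, dirty_inventory))
```

Both scenarios produce the same reported figure, so the figure alone cannot tell you which situation you are in.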

The alternative is analytics, like FouAnalytics, so you have the details you need to troubleshoot digital marketing campaigns yourself. With the right data and sufficient details, you can find the fraud and reduce it by adding domains and mobile apps to your block lists, or by no longer buying from entirely fraudulent sources. Knowing the limits of fraud detection, and the fact that bad guys will always try to trick your detection, empowers you to do more with respect to your own campaigns. This way you don’t have to pay for black box fraud detection tech using detection methods that are utterly useless. If you keep paying for those, then that’s on you.
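
As a rough idea of what that workflow can look like once you have row-level data, here is a small Python sketch that turns per-domain counts into a block list. The row layout, the domain names, the `suspect` counts, and both thresholds are hypothetical; in practice the "suspect" label would come from your own review of the detailed data, and the cutoffs are a judgment call.

```python
from collections import defaultdict

def build_block_list(rows, min_impressions=100, max_suspect_share=0.5):
    """Return domains with enough volume and a suspect share above the cutoff."""
    totals = defaultdict(lambda: [0, 0])  # domain -> [impressions, suspect]
    for row in rows:
        totals[row["domain"]][0] += row["impressions"]
        totals[row["domain"]][1] += row["suspect"]
    return sorted(
        domain
        for domain, (impressions, suspect) in totals.items()
        if impressions >= min_impressions
        and suspect / impressions > max_suspect_share
    )

# Invented campaign rows for illustration.
rows = [
    {"domain": "example-news.test", "impressions": 1000, "suspect": 20},
    {"domain": "cheapclicks.test", "impressions": 3000, "suspect": 2800},
    {"domain": "cheapclicks.test", "impressions": 2000, "suspect": 1800},
    {"domain": "tinyblog.test", "impressions": 40, "suspect": 30},  # too little data
]

block_list = build_block_list(rows)
print(block_list)
```

The minimum-volume check keeps a handful of odd impressions from getting a small site blocked before there is enough data to judge it.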

Further reading: https://www.forbes.com/sites/augustinefou/2021/01/19/analytics-are-better-than-fraud-detection-heres-why-with-examples/

Khamonte J.

Cybersecurity & IT Leader | Technology Futurist

3 years ago

Jonathan Ellis , Jonathan F

Justin Macorin

Building leading NLP and AI products.

3 years ago

"IP address-based detection is useless" very true! The ability to impersonate multiple residential addresses and switch/swap IPs at any time make them 100% unreliable for most identity use cases. We gotta dig deeper!
