Why did ads.txt fail?

TL;DR (too long; didn't read) -- ads.txt failed because there are too many middlemen (more than one) in the transaction of each ad impression through the programmatic supply chain. Each middleman/seller puts their own sellerID in the chain so the money flows through them on the way to the next party. If ANY one middleman fails to properly enforce ads.txt and lets a spoofed bid request enter the ecosystem, every other party in the chain will still fail to identify the spoofing, even if they all enforce ads.txt. If, and only if, the buyer has complete information about every hop in the transaction of an ad impression will they have even a prayer of identifying the "leak." Given the trillions of bid requests per week and the nearly infinite possible combinations of supply paths, enforcing ads.txt to this degree is not worth the compute power, storage, and bandwidth, especially because enforcing it properly means drastically cutting supply. And ad buyers don't seem to care about spoofing, nor do they want lower quantities to buy.


One bad apple ... spoils ads.txt for everyone

In 2017, when IAB launched ads.txt with much fanfare and self-back-patting, bad guys took advantage of it right out of the gate to make more money and better hide their fraud. No one believed that, because no one wanted to believe that the shiny new initiative from the adtech lobbying trade body could fail so completely right out of the gate. I wrote "How Ads.txt is Being Exploited by 'Baddies' for Fun and Profit," where early data showed that the first adopters of ads.txt were fake sites. They were the first to "comply" with Google's and MediaMath's threats that they "would not transact with anyone without ads.txt."


Ads.txt exploited by bad guys right out of the gate

What do you think the bad guys did? They added ads.txt files to all their sites within minutes, so when Google's and MediaMath's crawlers checked for the presence of ads.txt files, they would find them and these fake sites could continue to monetize without skipping a heartbeat. No one mentioned, let alone checked, the contents of the ads.txt files at the time. Early, manual checks revealed that the fake sites' fake ads.txt files simply copied and pasted others' entries. Entire ads.txt files were copied and used across networks of fraudulent sites -- "[we found] a group of sites that syndicate the same ads.txt file -- taking only the top domains that have more than 100 million potential impressions per month, we estimated that the fraud network could make up to $270 million per month, assuming just a $5 CPM."
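To make the "copied ads.txt" check concrete, here is a minimal sketch of how one might group crawled sites that carry identical ads.txt records. It is not the method used in the original analysis; the function names, the `.test` domains, and the sample files are all made up for illustration. It assumes the standard comma-separated ads.txt record layout (exchange domain, seller account ID, relationship).

```python
from collections import defaultdict


def parse_records(ads_txt: str) -> frozenset:
    """Reduce an ads.txt body to its set of (exchange, seller ID, relationship) records."""
    records = set()
    for raw in ads_txt.splitlines():
        line = raw.split("#", 1)[0].strip()           # drop comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue                                   # blank lines, variables such as contact=
        records.add((fields[0].lower(), fields[1], fields[2].upper()))
    return frozenset(records)


def find_shared_files(files: dict, min_sites: int = 2) -> list:
    """Group domains whose ads.txt files contain exactly the same records."""
    groups = defaultdict(list)
    for domain, text in files.items():
        records = parse_records(text)
        if records:                                    # ignore empty or unparseable files
            groups[records].append(domain)
    return [sorted(domains) for domains in groups.values() if len(domains) >= min_sites]


# Hypothetical crawl results: domain -> ads.txt body.
crawled = {
    "fake-site-1.test": "exchange-a.test, 1001, DIRECT\nexchange-b.test, 77, RESELLER",
    "fake-site-2.test": "# copied file\nExchange-B.test, 77, reseller\nexchange-a.test, 1001, DIRECT",
    "real-site.test": "exchange-a.test, 2002, DIRECT",
}

print(find_shared_files(crawled))
```

A shared file is only a lead, not proof: sites owned by the same legitimate parent company can share ads.txt entries too, so any group found this way still needs a manual look.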

When pressed further, many DSPs and exchanges finally added functionality to check ads.txt and enforce it. However, we found that in most cases it was not turned on by default (because that would result in too much volume being unsellable). In fact, in Xandr/AppNexus, an ad buyer has to specifically check a checkbox that says "enforce ads.txt." It is not checked by default, so ads.txt is not checked and enforced unless you know about that, want it, and check the checkbox yourself. Do YOU know where to check that checkbox? Does your agency know where to check that checkbox? Does your DSP have the ability to check and enforce ads.txt? Do they do it for you and can they show you proof they did it? Don't trust the log level data they supply because you have no way to check the veracity of it.


Supply path leakage is a symptom of ads.txt failures

By 2021, I had run enough experiments to publicly document the following phenomenon. Even when I used the simplest of inclusion lists -- a list with just 1 domain in it -- I found that my ad made it to that domain only 73% of the time. About 1 in 4 ads went somewhere else. This was when I did not specify which exchanges or supply paths to use. When I un-checked all of the exchanges and left only 1 exchange in the media buy, to try to target that 1 domain in the inclusion list, the accuracy of getting my ad to that domain went up to 97%. A 3% miss is fine, and better than most things in adtech from what I have seen over the last 25 years. Going from 73% accuracy to 97% accuracy is great, and that came from eliminating the supply path leakage that occurs when too many exchanges are included (not excluded) in a programmatic media buy.

This observable "symptom" of supply path leakage was corroborated by the 1st and 2nd ISBA studies, which used log level data to try to match ad impressions from end to end through the supply chains. Note that the number of matchable impressions fell by more than half, from 267 million in the first study to 104 million in the second study. The first study revealed that only 12% (roughly 1 in 8) of ads could be matched through the supply chain. It also found the now infamous "15% unknown delta," where the researchers could not figure out where the money and ads went. Even though the match rate went up to 58% in the second study, the denominator went way down, so the higher rate exaggerates the improvement.
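A quick back-of-the-envelope calculation shows why the shrinking denominator matters. The sketch below uses only the figures quoted above (267 million and 104 million impressions, 12% and 58% match rates) and assumes each match rate applies to the impression pool quoted for its study; the variable names are mine.

```python
# Figures quoted in the text; the assumption is that each match rate
# applies to the impression pool reported for the same study.
studies = {
    "first": {"impressions": 267_000_000, "match_rate": 0.12},
    "second": {"impressions": 104_000_000, "match_rate": 0.58},
}

# Absolute number of impressions matched end to end in each study.
matched = {name: s["impressions"] * s["match_rate"] for name, s in studies.items()}

rate_gain = studies["second"]["match_rate"] / studies["first"]["match_rate"]
matched_gain = matched["second"] / matched["first"]
pool_change = studies["second"]["impressions"] / studies["first"]["impressions"] - 1

for name, count in matched.items():
    print(f"{name} study: about {count / 1e6:.1f} million matched impressions")
print(f"match rate grew {rate_gain:.1f}x, matched impressions grew {matched_gain:.1f}x")
print(f"impression pool changed by {pool_change:.0%}")
```

Under that assumption, the match rate rose almost fivefold while the absolute number of matched impressions did not even double, because the pool shrank by about 61%.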


One bad apple spoils the bushel

As we know from kindergarten, "One bad apple spoils the bushel." The same applies to ads.txt and adtech. Let me explain. When reviewing FouAnalytics in-ad data for a client, we saw breitbart.com still showing up (the FouAnalytics tag detected where the ads went; this is not taken from log files or placement reports supplied by platforms). Initially, we were scratching our heads. Breitbart.com had been on every advertiser's block list since 2016. How could it still be monetizing and getting through? The client had even taken their buying a step further and used an inclusion list. How would breitbart get through in that case? Breitbart or any fraudulent site would not put their own domain in the bid request because they would be easily caught and blocked. So they always put someone else's domain in the bid request. For example, if they put in one of the domains that IS in your inclusion list, they get through. The fraudsters are simply daring the good guys to catch them -- because good guys have to do the work to check and enforce ads.txt. So we checked with the DSP and asked them to show us proof that they were checking and enforcing ads.txt for us. To our surprise (and delight) they did prove that they were checking and enforcing ads.txt for us. The mystery deepened. How, then, was breitbart still getting through?

I will spare you the gory details of the Sherlock-Holmesing that ensued. Turns out, "one bad apple spoils the bushel." In this case, the bad apple was one exchange that failed to enforce ads.txt, or deliberately looked the other way. This gave domain spoofing a point of entry into the entire ecosystem. When breitbart.com lied and put readersdigest.com in the domain field of the bid request, the sellerID was still the breitbart sellerID, so they could make the money. The exchange should have noticed that the sellerID (breitbart's) did not match the domain (readersdigest.com) declared in the bid request -- and should have rejected it (if they properly checked and enforced ads.txt). Because the exchange failed to do so at the point of entry, that bid request got passed along to a dozen other exchanges. Those other exchanges don't have the ability to look back and check that the domain they see (readersdigest.com) does not match the originating sellerID. This is further complicated/obfuscated by resellers that put in their own sellerIDs because they want the money to first flow to them (so they can take their cut before passing the net to the originating seller). Multiply this complexity by the dozens of exchanges, hundreds of supply paths, bundling and reselling, etc. and we get to the following realization.
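Here is a minimal sketch of the check the entry exchange should have run, and of why the same check passes further downstream. It assumes the standard ads.txt record layout and a flattened bid request with just `domain` and `seller_id` fields (in OpenRTB these roughly correspond to the site domain and the publisher ID). Every exchange name, seller ID, and ads.txt entry below is hypothetical.

```python
def parse_ads_txt(text: str) -> set:
    """Return the (exchange_domain, seller_id) pairs a publisher authorizes."""
    authorized = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()           # drop comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue                                   # not a seller record
        authorized.add((fields[0].lower(), fields[1]))
    return authorized


def is_permissioned(bid: dict, exchange_domain: str, ads_txt_by_domain: dict) -> bool:
    """True if the declared domain's ads.txt lists this seller ID on this exchange."""
    ads_txt = ads_txt_by_domain.get(bid["domain"])
    if ads_txt is None:
        return False                                   # no ads.txt at all
    return (exchange_domain.lower(), bid["seller_id"]) in parse_ads_txt(ads_txt)


# Hypothetical ads.txt published by the real site.
ADS_TXT = {
    "readersdigest.com": (
        "exchange-a.test, rd-1001, DIRECT\n"
        "exchange-b.test, resell-77, RESELLER\n"
    )
}

# At the point of entry the spoofed request still carries the spoofer's own seller ID.
spoofed_at_entry = {"domain": "readersdigest.com", "seller_id": "bb-2002"}
print(is_permissioned(spoofed_at_entry, "exchange-a.test", ADS_TXT))   # mismatch is visible here

# After an authorized reseller re-sells it under its own ID, the mismatch is gone.
spoofed_downstream = {"domain": "readersdigest.com", "seller_id": "resell-77"}
print(is_permissioned(spoofed_downstream, "exchange-b.test", ADS_TXT))
```

The first call returns False and the second returns True. That is the whole problem in two lines: the only hop that can see the mismatch is the one where the spoofed request enters, and once an exchange skips the check there, downstream checks see only a seller ID that the real publisher does authorize.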

Our "ah ha" moment was when we realized that if a single exchange didn't do their job and enforce ads.txt properly, they've let the bad bid into the ecosystem. And after hundreds of "inadvertent laundering" steps, the buyer on the other side will assume that the bid came from readersdigest.com, a legit site. Why wouldn't they bid on it, win, and serve the ad? That's how the ad ended up on breitbart, when the ad buyer thought they were bidding on readersdigest.com. Note that the log files will also report readersdigest.com in the placement reports, effectively hiding the fact that ads and ad dollars are still flowing to sites where the ad buyer did not want them to go. Again, this is why log level data is not good enough.


A few bad apples means ads.txt is useless in practice (even if it makes sense in theory)

Then I took a look at the ecosystem. Using data from the bid request firehose, I plotted the percentage of bid requests that were "unpermissioned" or "ineligible," which means there was no ads.txt or no correct ads.txt entry. Note that most of the exchanges were clean -- at almost 0% "unpermissioned." But look at the 3 in red, with 39%, 82%, and nearly 100% unpermissioned bid requests. With just these 3 entry points, bad guys can get fake inventory into the ecosystem, where it is effectively laundered and obfuscated by the complexity of the supply chains inherent in programmatic media buying. There is no way to effectively stop this with ads.txt, because beyond the first step/entry point, the rest of the exchanges touching the ad as it passes through the supply chains don't have the right information with which to enforce ads.txt even if they ARE checking and trying to enforce it.

The shape of this chart mirrors almost exactly what Confiant Inc sees when studying SSPs -- supply side platforms. A few (on the right) have the highest incidence of "security violations" (which pertains to malicious JavaScript code in ads -- i.e. malvertising). These exchanges are the entry points for bad guys to get bad ads, laced with malicious code, into the ecosystem. Just a few bad apples -- exchanges that didn't enforce ads.txt properly -- means domain spoofing is not stopped AT ALL by ads.txt, even if every other exchange is properly enforcing it.

And finally, looking at the bid request firehose, we can get a sense of the ratio of UNpermissioned bid requests to valid ones, per exchange. I blurred the name of the exchange to save them the embarrassment. Some exchanges are clearly worse than others -- e.g. with way more UNpermissioned inventory than inventory that checks out with ads.txt. This reinforces the recommendation to turn off as many exchanges as possible in your media buy, to reduce leakage and "unknown delta." You don't need more than 1 exchange to buy your media. You certainly don't need 39 exchanges, which create unnecessary complexity in the supply paths, and therefore the leakage and waste.

And in some cases, you can see far more "unpermissioned" ad opportunities than real ad opportunities. Some of the most popular sites like Yahoo are spoofed SO MUCH that we see more than 40X (40 times) more "unpermissioned" inventory (ads.txt did not match) than real inventory (see slide below).
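For anyone who wants to run the same kind of tally on their own bid request sample, the two numbers discussed above (the share of unpermissioned requests and the ratio of unpermissioned to valid requests) can be computed per exchange along these lines. This is a sketch only: the input format, the exchange names, and the counts are invented, and it assumes each bid request has already been labeled as permissioned or not by an ads.txt check.

```python
from collections import Counter


def unpermissioned_stats(bid_requests) -> dict:
    """Summarize (exchange, permissioned) pairs into per-exchange share and ratio."""
    totals, bad = Counter(), Counter()
    for exchange, permissioned in bid_requests:
        totals[exchange] += 1
        if not permissioned:
            bad[exchange] += 1

    stats = {}
    for exchange, total in totals.items():
        good = total - bad[exchange]
        stats[exchange] = {
            # fraction of this exchange's requests that failed the ads.txt check
            "unpermissioned_share": bad[exchange] / total,
            # how many failing requests there are for every passing one
            "bad_to_good_ratio": bad[exchange] / good if good else float("inf"),
        }
    return stats


# Made-up sample: one mostly clean exchange and one "bad apple".
sample = (
    [("exchange-a.test", True)] * 98 + [("exchange-a.test", False)] * 2
    + [("exchange-z.test", True)] * 10 + [("exchange-z.test", False)] * 40
)

for exchange, s in unpermissioned_stats(sample).items():
    print(f"{exchange}: {s['unpermissioned_share']:.0%} unpermissioned, "
          f"{s['bad_to_good_ratio']:.1f}x bad to good")
```

The same tally keyed by declared domain instead of exchange gives the per-site view, which is where a figure like "40X more unpermissioned than real" would come from.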


So what? What next?

Whether you believe me or not, your question might be "so what?" or "what do I go do?" And perhaps you wonder how prevalent this is -- is it a big problem or a small problem? We looked at one major sports publisher. There's 5X more "unpermissioned" inventory than real inventory from that publisher. Sadly, this publisher doesn't know the bad guys are spoofing their domain and app to such a degree. The publisher doesn't even see it because NONE of this occurs on the real publisher's website or mobile app. Faked bid requests are sent into the ecosystem by bad guys, and ad buyers' algorithms bid on bid requests that appear to have a famous, mainstream sports league domain in them, not realizing that the requests didn't originate from that publisher, who can't even see it happening. Buyers, if the CPMs you see for a mainstream domain or mobile app are too good to be true, they ARE faked bid requests/domain and app spoofing. Don't buy it!

If you NOW realize, like we do, that ads.txt and other verification initiatives are NOT ABLE to solve domain spoofing in programmatic media buying unless EVERY party in the supply chain enforces it properly, the solution is to avoid the problems entirely by buying from real publishers. How do you do that if you still want to use the platform and machinery you are used to using? Simple. Here's a series of steps you can take:

  1. Move to an inclusion list approach to buying, if you haven't already. This is usually domain based.
  2. Better still is to use an inclusion list of sellerIDs. Keep in mind that sellers may be direct sellers or resellers of other parties' inventory (in which case you can't be sure the declared domain is real).
  3. Even better is to ask a handful of good publishers which exchange they sell through and which seller ID/deal ID is their real one. Then set up campaigns to use as few exchanges as possible, and target that short list of seller IDs from real publishers.
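The steps above can be sketched as a simple pre-bid filter. This is not any DSP's actual API. It assumes you have collected, directly from each publisher, the exchange and seller ID they really sell through, and that each incoming bid request exposes `exchange`, `seller_id`, and `domain`; all names and IDs below are hypothetical.

```python
# Inclusion list built from what real publishers told you directly:
# (exchange, seller ID) -> the one domain that seller is expected to declare.
VERIFIED_SELLERS = {
    ("exchange-a.test", "rd-1001"): "readersdigest.com",
    ("exchange-a.test", "sports-3003"): "bigsportsleague.test",
}


def should_bid(bid: dict) -> bool:
    """Bid only when the exchange, seller ID, and declared domain all line up."""
    expected_domain = VERIFIED_SELLERS.get((bid["exchange"], bid["seller_id"]))
    return expected_domain is not None and expected_domain == bid["domain"]


bids = [
    # real publisher, real seller ID, expected exchange
    {"exchange": "exchange-a.test", "seller_id": "rd-1001", "domain": "readersdigest.com"},
    # right domain (passes a domain-only inclusion list) but unknown seller ID
    {"exchange": "exchange-a.test", "seller_id": "bb-2002", "domain": "readersdigest.com"},
    # right domain and a familiar seller ID, but arriving via an exchange you did not select
    {"exchange": "exchange-z.test", "seller_id": "rd-1001", "domain": "readersdigest.com"},
]

for bid in bids:
    print(bid["exchange"], bid["seller_id"], "->", "bid" if should_bid(bid) else "skip")
```

Only the first request gets a bid. Keying the list on the exchange and seller ID pair, rather than on the domain alone, is what separates step 3 from step 1: a spoofer can type any domain into a bid request, but cannot easily make it arrive under the publisher's own seller ID on the publisher's own exchange.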

Happy media buying y'all... hopefully you will buy the REAL stuff and not the CHEAP stuff. Let me know what you think of the above, or if you think I'm smoking something.


Further reading: https://www.dhirubhai.net/in/augustinefou/recent-activity/newsletter/



Michael M. M.

Ad-Fraud Investigator & Media Expert, member of Digital Forensic Research Lab cohort "Digital Sherlocks" - Adding some fun when asking unexpected questions you were not prepared to hear

1 year ago

Less is more. Direct is better. Controlling is best. But after my talks at the summit in Berlin, many CMOs and Digital Directors don't care. But CFOs do. And Shareholder too.

Ads.txt was a compromise between both sides of the coin. It was never meant to be the permanent fix, similar to how most websites keep copying the same pixels on to each upgrade of the site even though they are not working with those companies anymore. Ads.txt only really worked if people actually monitored it properly. Anyway anything but the agreed upon sizes for banners, is just an educated guess this one from 5 years ago

Kalle Pihlajasaari

Product Development Engineer at DATA ABSTRACTION

1 year ago

I feel that should this or some similar system actually work it would result in the end of freedom. While I have no love for Breitbart I have less love for the mainstream media that is not our friend. I would love to see a system that lets the END USER decide what adverts they get. This should be the goal of every noble and ethical person and organisation. Sure you cannot push propaganda or promotion of worthless junk using brainwashing and manipulation, but would it not be a better thing if the world had the betterment of humanity as a goal instead of corporate bottom line as the only measure of performance? We have seen what happens when the pharmaceutical industry owns the regulators, the media, the politicians. They do so because no one in the ecosystem wants to place ethics over money. The problem is not Breitbart. The problem is Pfizer. So create a better mousetrap or at least try to do so, let my NEED dictate my adverts not the corporate GREED. Syndicated adverts that are 'perfect' will only serve the rich producers not the end user, how is that a win?

Haley Austin

Director, Digital Media at BUNTIN

1 year ago

Long, but fascinating read.
