The 2024 "Can a bot fake ...?" FAQ
The typical flow of a conversation during the first call with a new customer having fraud problems can be boiled down to this:
Client: “What are bots capable of in 2024?”
Me (with a big laugh): “Quite a lot!”
Client: “Haha, but... realistically, what exactly?”
Whenever I talk to new clients, the conversation follows this sequence of questions. That’s why I wrote this FAQ from a technical point of view: what can bots fake, spoof, manipulate or alter in order to appear as a genuine browser with a genuine human sitting behind the screen?
After the initial questions from the marketers, the tech people jump in. They ask the more difficult questions, which are very specific and based on research they did prior to the call or while trying to solve their problem. I have collected and combined the most asked questions into a FAQ.
This article contains in-depth answers to the 10 questions about bots you’ve always wanted answered. It has been written for marketing professionals looking for ad-fraud and/or lead-generation-fraud solutions, who can use this FAQ as a guideline when asking vendors: "How does your solution detect a bot faking ...?"
1. Are bots able to fake the domain name of the website they are accessing?
The domain name can be read in the browser with JavaScript through the properties window.location.origin, window.location.host, window.location.hostname and window.location.href [1]. Browser automation tools, ie. advanced bots, are capable of loading webpages and executing JavaScript. These bots remote control the browser in all its glory. This is achieved by starting the browser and controlling each tab through the Chrome DevTools Protocol (CDP), giving instructions to click, scroll, type, etc. This works for Chromium based browsers, eg. Chrome, MS Edge, etc. Firefox and Safari have similar protocols, though they are less commonly used.
The example in Figure 1 is a post-bid example. In pre-bid, the browser sends one or more bid requests for an advertisement. Each request is again fired from the same browser and contains plain text, which the bot can change to anything it wants.
So, can bots fake the domain they are on? Yes, they can, both pre-bid and post-bid.
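To make the pre-bid point concrete, here is a minimal Python sketch. It does not follow any vendor's actual bid-request format: the field name `page_url`, the helper `declared_domain` and the example.com URLs are hypothetical. It only illustrates that the receiving side sees a string chosen by the sender.

```python
import json
from urllib.parse import urlsplit


def declared_domain(bid_request_json: str) -> str:
    """Return the hostname the client *claims* the ad slot is on."""
    payload = json.loads(bid_request_json)
    # "page_url" is a hypothetical field name used for illustration only.
    return urlsplit(payload["page_url"]).hostname or ""


# A genuine browser and a bot can send byte-for-byte identical payloads,
# because the domain is just text inside the request.
genuine = json.dumps({"page_url": "https://www.example.com/news/article-1"})
spoofed = json.dumps({"page_url": "https://www.example.com/news/article-1"})

print(declared_domain(genuine))
print(declared_domain(spoofed))
```

Since both payloads are indistinguishable, the declared domain on its own cannot serve as proof of where the request came from.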
2. Are bots able to fake the referrer URL?
The referrer URL tells 3rd party web sites which page was being visited when a resource was requested. For example, if you are browsing www . usatoday . com and a JavaScript file is loaded from a different domain, eg. static.adsafeprotected.com, the referrer header (spelled Referer in HTTP) is set on that request. Figure 2 shows the referrer header and its value. Bots can change this value by intercepting the network traffic within the browser, overriding the value with whatever they want, and sending the modified traffic. So, yes, they can.
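The following Python sketch shows why the header cannot be trusted: it is an ordinary client-supplied string. The request is only built, never sent, and the URLs and helper names (`build_request`, `headers_of`) are made up for this illustration.

```python
from urllib.request import Request


def build_request(url: str, referer: str) -> Request:
    """Build (but do not send) a GET request with a caller-chosen Referer."""
    return Request(url, headers={"Referer": referer})


def headers_of(req: Request) -> dict:
    """Return the request headers with lower-cased names."""
    return {name.lower(): value for name, value in req.header_items()}


req = build_request(
    "https://static.example.net/script.js",  # hypothetical 3rd party resource
    "https://www.example.com/",              # whatever the client wants to claim
)
print(headers_of(req)["referer"])
```

A bot doing request interception inside a real browser does the same thing, just at a different layer.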
3. Are bots able to fake the Mobile App name?
Advertisements in Mobile Apps are displayed using a web browser that is embedded within the App. This embedded browser is called WebView. Within this WebView an advertising platform loads and refreshes ads based on your cookies, geo-location, etc. Google’s AdMob is such a platform [2][3]; InMobi, AppLovin, Glispa and Amobee are prominent alternatives.
Bots will request ads from these platforms while trying to look, feel and smell like a genuine mobile App user.
When an App accesses a website using WebView the appname is conveyed in the HTTP header x-requested-with. Figure 3 shows the communication captured with a MITM (man in the middle) proxy between the FT App installed on a real Android phone and Google. The highlighted blue line shows the x-requested-with HTTP header. This is of course only a single request made from the App. Another interesting fact which can be seen in Figure 3 is the User Agent (UA). Apps have full control over the UA and in this case you’ll see the Appname and full version are included in the UA as well.
Will bots be able to fake requests like this and spoof all information in order to blend in with real app traffic made by humans? Yes, they can (pre-bid and post-bid).
4. Can bots fake utm_ parameters and other forms of link decoration?
Querystring parameters like utm_source, utm_campaign, gclid and/or fbclid are technically nothing more than an ampersand-separated string appended to the URL of the GET request. When bots load the advertisement in the browser, they already know what the target link will be. They can simply change the parameters upfront, click, and let the browser do its work. Another method is to change the parameters dynamically using request interception. If a bot already uses this technique to change HTTP headers such as the referrer, it is fairly easy to add some rules to rewrite destination URLs. So, yes they can. Easy!
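As a rough illustration of how little is involved, the Python sketch below rewrites the link decoration of a landing-page URL with the standard library. The URL, the parameter values and the helper name `rewrite_query` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rewrite_query(url: str, overrides: dict) -> str:
    """Return `url` with the given query-string parameters set or replaced."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(overrides)  # existing keys are replaced, new keys appended
    return urlunsplit(parts._replace(query=urlencode(params)))


landing = "https://shop.example.com/offer?utm_source=newsletter&utm_campaign=spring"
print(rewrite_query(landing, {"utm_source": "search", "gclid": "abc123"}))
```

The server receiving the rewritten URL has no way to tell from the URL alone that the parameters were changed.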
5. Can bots fake cookies in the browser?
Browsing to a website means that all cookies in the browser associated with that URL are sent along with the request. Figure 5 shows how this looks at the website ft.com. These persistent cookies are stored in the Chrome profile in a file called Cookies in the Default folder; in Firefox they are stored in a SQLite file in the Firefox profile folder. Once you know how cookies can be set and retrieved programmatically, you are able to extract cookies, store these in a database, and retrieve the cookies on another device to start a “warm session”.
So, can bots fake cookies? Yes, they can, and they will warm up browsing sessions in order to maximize their profit.
6. Can bots fake the fingerprint of the browser?
Using CDP (Chrome DevTools Protocol) any property or value in the browser can be overridden. This enables fraudsters to change values like screen resolution, keyboard language, available plugins, time zone, WebGL vendor and renderer, etc. The combination of these property values (and many others) is called the browser's fingerprint. This also means that if a value changes, the fingerprint changes. CreepJS is an open source tool to calculate your browser's fingerprint. When developing a bot, CreepJS is typically your litmus test.
In addition to fingerprinting a series of static values, it is also possible to fingerprint responses to challenges. For example, WebGL shapes are drawn, and because different OSes, browsers, video cards and their drivers may use different anti-aliasing methods, have different IEEE 754 floating point implementations, use different rounding modes, etc., the color values of individual pixels may differ per video card type [4][5].
This technique can be utilized to detect bots. Unfortunately, bots will counter this technique by adding random noise to the generated pixels, causing the values to change slightly. By changing those values the fingerprint will completely change, as it is a hash value of the input. The good part is that bots adding random noise become unique and thus outliers; the bad part is that blocking outliers is a recipe for false positives [6]. So, yes, bots can change both the static and the dynamic fingerprints, though the latter with a catch.
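A minimal Python sketch of the idea, assuming a fingerprint is simply a hash over a canonical serialization of the collected properties. Real tools such as CreepJS collect far more signals and work differently; the property names, values and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(properties: dict) -> str:
    """Hash a canonical JSON serialization of the collected properties."""
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical property values, for illustration only.
browser = {
    "screen": "1920x1080",
    "language": "en-US",
    "timezone": "Europe/Amsterdam",
    "webgl_vendor": "Example GPU Vendor",
}
# Overriding a single property is enough to produce a different fingerprint.
spoofed = dict(browser, timezone="America/New_York")

print(fingerprint(browser))
print(fingerprint(spoofed))
```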
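The effect of noise can be sketched in a few lines of Python. The pixel values, the noise model (each value nudged by -1, 0 or +1) and the traffic mix are invented for this example; it is not how any particular bot or detector works.

```python
import hashlib
import random
from collections import Counter


def pixel_hash(pixels: list) -> str:
    """Hash a list of 0..255 colour values, standing in for a rendered image."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def add_noise(pixels: list, rng: random.Random) -> list:
    """Nudge each colour value by -1, 0 or +1, clamped to the 0..255 range."""
    return [min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in pixels]


rng = random.Random(42)
rendered = [120, 121, 119, 200, 201, 64, 65, 66] * 8  # stand-in for real pixels

# 50 honest devices with the same GPU and driver produce the same hash ...
observed = [pixel_hash(rendered) for _ in range(50)]
# ... while 5 bots that add noise each end up with a hash of their own.
observed += [pixel_hash(add_noise(rendered, rng)) for _ in range(5)]

counts = Counter(observed)
singletons = [h for h, n in counts.items() if n == 1]
print(len(counts), "distinct hashes,", len(singletons), "seen only once")
```

In real traffic the hashes seen only once also include genuine visitors with rare hardware, which is exactly why blocking outliers leads to false positives.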
7. Can bots fake the TLS fingerprint of the browser?
TLS fingerprints are generated server side. They are based on the client-server handshake that takes place before the communication is encrypted. In this handshake the browser says: "Hello, I support these cipher suites." The server answers with the selected cipher and key (simplified) [7]. Different browsers on different OSes support different cipher suites. This is most relevant for request-based bots, because by default their fingerprint does not resemble that of any browser [23]. That’s why request-based bots have to use special clients and tools, for example curl-cffi (see Figure 7), curl-impersonate, AzureTLS and CycleTLS [8][9][10].
Browser-based bots need to connect to a proxy in order to change the fingerprint. In such a setup a proxy server sets up the secure connection to the web server hosting the publisher site and/or landing page. The proxy forwards the requests on behalf of the automated browser. In this case the web site (and its fraud detection) will fingerprint the proxy server's requests.
The first TLS fingerprinting method was FingerPrinTLS, and JA3 (Salesforce) was inspired by that. JA3 hashes the values the client offers when setting up a secure connection (TLS version, cipher suites, extensions, etc.) in the order in which they are sent [24]. In order to prevent this type of fingerprinting, Chromium and all derived browsers (Chrome, MS Edge, Samsung Browser, Android Chrome, Opera, etc.) started to send the TLS extensions in a randomized order. This started with Chromium version 110 [25]. This means that JA3 is ineffective for new browsers and is not able to identify Chromium based browsers. Of course, other types of TLS fingerprints have emerged which are able to fingerprint and identify the OS and browser (version) used.
Can bots fake the TLS fingerprint? Yes, they can. Request-based bots use specific tools to impersonate a browser, whereas browser-based bots need to proxy their requests.
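For readers who want to see why ordering matters, here is a small Python sketch of the JA3 calculation: five fields from the client's hello message (TLS version, cipher suites, extensions, elliptic curves and point formats) are joined into a string and hashed with MD5. The numeric values below are illustrative rather than captured from a real browser, and `order_insensitive_hash` is a hypothetical variant (sorting the extension list before hashing), not a description of any specific newer fingerprint.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join the five fields: values separated by '-', fields separated by ','."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 over the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


def order_insensitive_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Hypothetical variant: sort the extension list before hashing."""
    return ja3_hash(version, ciphers, sorted(extensions), curves, point_formats)


# Two hello messages from the same client, differing only in extension order.
hello_a = (771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
hello_b = (771, [4865, 4866, 4867], [11, 0, 10, 65281, 23], [29, 23, 24], [0])

print(ja3_hash(*hello_a) == ja3_hash(*hello_b))
print(order_insensitive_hash(*hello_a) == order_insensitive_hash(*hello_b))
```

A client that shuffles one of the lists on every connection therefore gets a different JA3 hash each time, while a fingerprint that ignores the order stays stable.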
8. Can bots fake (prevent) WebRTC from leaking your real IP address?
WebRTC (Web Real-Time Communication) is the technology that enables videoconferencing from a browser [11]. WebRTC uses point-to-point communication, bypassing proxy servers configured in the browser. In fraud detection WebRTC can be used to detect the real internet-facing IP address of the client, even if the client is using a proxy or VPN. The detection can be split into two parts: capturing the IP address server side, and extracting the local IP address(es) at the client using JavaScript.
WebRTC Server side
In order to detect bots and fraudsters using residential proxies, anti-bot detection companies have set up their own WebRTC infrastructure. Cheap and low quality VPN clients allow anti-bot and fraud detection companies to extract the true external IP address the bot or fraudster uses. Premium quality (non-free) VPN and proxy software typically doesn’t have this issue.
Figure 8 shows a screenshot of the browserleaks’ WebRTC test page [13]. Both the IPv4 and IPv6 addresses in the screenshot are located in New York, United States. I made this screenshot in the Netherlands, so my real IP address did not leak while I was using ProtonVPN. Other VPN clients or residential proxy services may have different results.
WebRTC Client Side
In 2015 Daniel Roesler exposed a WebRTC vulnerability on his GitHub page [12], see also Figure 9. This vulnerability enables code running at the client to know its external IP address, even if the client is part of a local infrastructure with local addresses, eg. a corporate network or your home network, behind a firewall.
Depending on your network configuration, the JavaScript code on the GitHub page will be able to extract the local IP addresses of your device. In the case of IPv4 the code will typically extract your internal network IP address, which in most cases is a private address behind NAT (Network Address Translation), eg. 192.168.x.x or 10.x.x.x. With IPv6, however, the need for NAT disappeared. That means that in many cases this technique reveals your true IPv6 address, and thus your true location, even though you are connected through a VPN and sit behind a router/firewall.
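Once the addresses have been collected, classifying them is simple. The Python sketch below assumes the addresses arrive as ICE candidate lines, in which the address is the fifth whitespace-separated token; the candidate lines and helper names are made up for this example. Some browsers may expose a hostname instead of an IP address, which the sketch reports separately.

```python
import ipaddress


def candidate_address(candidate_line: str) -> str:
    """Return the address field (fifth token) of an ICE candidate line."""
    return candidate_line.split()[4]


def classify(address: str) -> str:
    """Label an address as private or public, per IP version."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return "hostname (not an IP address)"
    kind = "private" if ip.is_private else "public"
    return f"IPv{ip.version} {kind}"


# Made-up candidate lines for illustration.
candidates = [
    "candidate:1 1 udp 2122260223 192.168.1.23 54321 typ host",
    "candidate:2 1 udp 2122194687 2a02:a448:1234::10 54322 typ host",
    "candidate:3 1 udp 2122129151 0a1b2c3d.local 54323 typ host",
]

for line in candidates:
    address = candidate_address(line)
    print(address, "->", classify(address))
```

A private IPv4 address says little about the visitor, whereas a public IPv6 address that does not match the connection's IP address is a strong hint that a proxy or VPN is in use.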
Professional fraudsters and bot developers know this and thus use in most cases good quality residential proxies or in some cases VPNs. At the client side they will override the specific WebRTC APIs and spoof the returned values to whatever IP address the proxy server or other end of the VPN tunnel has. That way all IP addresses read (client and server) will match and the total session will look genuine. They also will rotate IP addresses quickly to stay below the ‘IP address rate limiting’-radar.
9. Can bots solve CAPTCHAs?
With the rise of AI you would expect that bots will be able to solve all CAPTCHAs automatically. That is correct up to a certain degree [14][15]. Images containing text and simple image recognition tasks can be solved with high accuracy. Figure 10 shows examples of CAPTCHAs which can be solved automatically.
More recent CAPTCHAs have become puzzles based on knowledge, where you need to have some subject knowledge in order to solve the CAPTCHA. Sometimes they even resemble an IQ test. Figure 11 contains a few example CAPTCHAs that require general knowledge, eg. animals that lay eggs, the usage of objects eg. vehicles on paved roads, and/or the monetary value of goods.
This type of CAPTCHA, along with the sliding CAPTCHA type in Figure 12, is not that easy to solve automatically. The knowledge questions are not that hard to program, but they rotate quickly: both the questions and the individual images shown in the CAPTCHA change constantly, which requires general knowledge. The individual images are also purposely dithered and/or made of poor quality and/or low contrast; see the paved road example in Figure 11. However, as it is economically feasible to rent CAPTCHA solving services, this doesn’t stop bots at all. These solving services only slightly increase the price of the bot operation, and as long as the operation still works out economically, it continues.
Over time AI will become cheaper, and better public models will become available, enabling better text interpretation and image recognition. It is just a matter of time before bots are able to solve the more complex puzzles.
10. Can bots fake human interactions like mouse movements, clicks, scrolls and/or touches ?
When a browser is controlled by browser automation software, it is able to move the mouse to new locations. In CDP (Chrome DevTools Protocol) mouse movements, clicks and scrolls are controlled by dispatchMouseEvent [16]. This enables a developer to fully control the mouse and its buttons and wheels. The same goes for touch events by using dispatchTouchEvent, which can be used to emulate mobile behavior [17].
Sending mouse or touch events over time to a remote controlled browser is the simplest form of simulating behavior. Such programmatic series of events will not (yet) resemble human behavior, because humans don’t move their mouse in straight lines and humans don’t click on the exact same pixel multiple times.
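Both tell-tale signs are easy to test for. The Python sketch below is a simplified illustration, not a production detector: the function names, the 0.5 pixel tolerance and the sample coordinates are assumptions.

```python
from collections import Counter


def is_straight_line(points, tolerance=0.5):
    """True if every point lies within `tolerance` pixels of the line
    through the first and the last point of the path."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    if length == 0:
        return True
    for x, y in points[1:-1]:
        # Perpendicular distance from (x, y) to the line through the endpoints.
        distance = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        if distance > tolerance:
            return False
    return True


def max_repeated_click(clicks):
    """Highest number of clicks that landed on exactly the same pixel."""
    return max(Counter(clicks).values(), default=0)


scripted = [(0, 0), (10, 10), (20, 20), (30, 30)]
wobbly = [(0, 0), (10, 14), (20, 17), (30, 30)]

print(is_straight_line(scripted), is_straight_line(wobbly))
print(max_repeated_click([(5, 5), (5, 5), (5, 5), (6, 5)]))
```

Checks like these only catch the simplest bots, which is why the next step for a bot developer is to generate curved paths.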
In order to simulate human behavior, humanlike mouse paths must be generated instead of straight lines. This can be achieved using b-splines [18] or Bézier curves. The software first generates a series of X,Y points between a starting point and an end point, based on the coordinates of elements in a page. The second step is to calculate a spline curve through these points, together with timestamps that determine how fast the mouse should move from the starting coordinate to the destination coordinate and at what time resolution. This technique enables fraudsters to perform humanlike mouse movements. This is exactly what mouse synthesizer [19] (see Figure 13) and ghost cursor do [20]. But don't worry, of course this can be detected, as no human is able to make perfectly round curves using a mouse.
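The curve calculation itself is standard geometry. The Python sketch below samples a cubic Bézier curve between two points and attaches timestamps. It is not the implementation of mouse synthesizer or ghost cursor: the random placement of the control points, the `spread` parameter and the evenly spaced timestamps are assumptions made to keep the example short.

```python
import random


def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bézier curve for parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, y


def mouse_path(start, end, steps=20, duration_ms=400, spread=80, rng=None):
    """Return a list of (x, y, timestamp_ms) samples from start to end."""
    rng = rng or random.Random()

    def control(fraction):
        # Control point near the straight line, shifted by a random offset
        # so that the resulting path bends instead of running straight.
        x = start[0] + (end[0] - start[0]) * fraction + rng.uniform(-spread, spread)
        y = start[1] + (end[1] - start[1]) * fraction + rng.uniform(-spread, spread)
        return x, y

    c1, c2 = control(1 / 3), control(2 / 3)
    path = []
    for i in range(steps + 1):
        t = i / steps
        x, y = cubic_bezier(start, c1, c2, end, t)
        path.append((x, y, duration_ms * t))  # evenly spaced timestamps (assumption)
    return path


path = mouse_path((100, 500), (640, 220), rng=random.Random(7))
print(path[0], path[-1], len(path))
```

The output is a smooth, mathematically clean curve with regular timing, and that regularity is precisely what makes such paths detectable.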
So, can bots fake interactions? Yes, they can!
Conclusion
You might wonder why I didn’t write something about blacklisting IP addresses. If you’re able to create a sophisticated bot that can buy #taylorswift tickets, then you’ll KNOW how to spoof, fake and/or emulate browser functionality. You are well aware that you have to use residential proxies, which means IP blacklisting will cause false positives. Filtering on IP addresses only works to exclude traffic from outside your country, but if you’re in the US don’t forget to include overseas territories like Guam, American Samoa, the Virgin Islands, etc.
You are aware that bots are able to fake (almost) anything. They can’t fake dynamic WebGL / GPU challenges but they will poison these challenges with noise in order to hide their headless appearance. CAPTCHAs do work to a certain degree, but they do annoy humans and thus cause friction to their journey.
So, can these sophisticated bots be detected? Sure they can. Once you know how bots override properties, fake answers, and hide their true appearance you know where and what to look for. Lastly, browser automation will cause “browser automation”-leakage. Spotting these leakages and traces of automation will reveal the true nature of the bot accessing your website.
2024-07-11
Questions? Corrections? Remarks? Need help with bots and fraud? Feel free to connect, comment or DM
#adfraud #bots #CMO #digitalmarketing #browserautomation #clickfraud
CEO & Co-Founder at ClearTrust | Score your traffic
8 months ago: Lovely post and details, Sander Kouwenhoven.
Should have Played Quidditch for England
8 months ago: Great article, Sander Kouwenhoven. Shared on X.
FouAnalytics - "see Fou yourself" with better analytics
8 months ago: Yes! Bots can fake anything. Nice FAQ
In addition to the 10 questions in this FAQ I got this question in my DM: "Can a bot fake ... [CTV] video ad quartile completion events?" Video ad quartile completion events are pixels fired from the browser at 25%, 50%, 75% and completion of the video advertisement. These events are nothing more than loading a pixel with a payload describing the unique id, advertisement, campaign, publisher, etc. in order to tie back the viewability and completion rate of video advertisements. As you can read in the article, bots can generate any type of HTTP(S) request and thus are able to generate video completion requests, including the payload, as well.
Ad-Fraud Investigator & Media Expert, member of Digital Forensic Research Lab cohort "Digital Sherlocks" - Adding some fun when asking unexpected questions you were not prepared to hear
8 months ago: People who can use ChatGPT4o and are able to define a valid prompt are also able to "build" a script that can do all the listed tasks. Generating fake traffic has become very easy.