In 2024 "Can a bot fake .... ?" -- FAQ

Sander Kouwenhoven

Published: July 11, 2024

The typical flow of a conversation during the first call with a new customer having fraud problems can be boiled down to this:

Client: “What are bots capable of in 2024?”

Me (with a big laugh): “Quite a lot!”

Client: “Haha, but... realistically, what exactly?”

Whenever I talk to new clients, the conversation follows this sequence of questions. That’s why I wrote this FAQ from a technical point of view: what can bots fake, spoof, manipulate or alter in order to appear as a genuine browser with a genuine human sitting behind the screen?

After the initial questions asked by the marketers, the tech people jump in. They ask the more difficult, very specific questions, based on research they did before the call while trying to solve their problem. I have collected and combined the most frequently asked questions into this FAQ.

This article contains in-depth answers to 10 questions about bots that you’ve always wanted answered. It has been written for marketing professionals looking for ad-fraud and/or lead-generation-fraud solutions, who can use this FAQ as a guideline for asking vendors: "How does your solution detect a bot faking ...?"

1. Are bots able to fake the domain name of the website they are accessing?

The domain name can be read from the browser using JavaScript through the properties window.location.origin, window.location.host, window.location.hostname and window.location.href [1]. Browser automation tools, i.e. advanced bots, are capable of loading webpages and executing JavaScript. These bots remote-control the browser in all its glory. This is achieved by starting the browser and controlling each tab using the Chrome DevTools Protocol (CDP), giving instructions to click, scroll, type, etc. This works for Chromium-based browsers, e.g. Chrome and MS Edge; Firefox and Safari have similar protocols, though these are less commonly used.

The example in Figure 1 is a post-bid example. In pre-bid, the browser sends one or more bid requests for an advertisement. These requests are again fired from the same browser and contain plain text, which the bot can set to anything it wants.

Figure 1. Financial Times website with Chrome Developer Tools opened. The window.location object in the browser contains the URLs currently shown in the tab

So, can bots fake the domain they are on? Yes, they can, both pre-bid and post-bid.
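To make the pre-bid case concrete, here is a minimal sketch of why the domain in a bid request is only a claim. It assumes a simplified OpenRTB-style payload with `site.domain` and `site.page` fields; the helper name `build_bid_request`, the request id and the example domain are hypothetical.

```python
import json

def build_bid_request(claimed_domain: str, claimed_page: str) -> str:
    """Assemble a simplified, OpenRTB-style bid request as plain JSON text.

    Nothing in the payload proves where it was really generated: the
    sender chooses every value, including the domain.
    """
    request = {
        "id": "hypothetical-request-id",
        "site": {"domain": claimed_domain, "page": claimed_page},
        "imp": [{"id": "1", "banner": {"w": 300, "h": 250}}],
    }
    return json.dumps(request)

# A bot running on an unrelated page can still claim a premium domain.
payload = build_bid_request(
    "premium-news.example", "https://premium-news.example/markets"
)
print(json.loads(payload)["site"]["domain"])
```

The receiver only sees the text it is handed, so any check on the domain has to rely on signals outside this payload.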

2. Are bots able to fake the referrer URL?

The referrer URL tells third-party websites which page is being visited when a resource is requested. For example, if you browse to www.usatoday.com and a JavaScript file is loaded from a different domain, e.g. static.adsafeprotected.com, the Referer header is set. Figure 2 shows the Referer header and its value. Bots can change this value by intercepting the network traffic within the browser, overriding the value with whatever they want, and sending the modified traffic.

Figure 2. The browser requesting the JavaScript https://static.adsafeprotected.com/iasPET.1.js from https://www.usatoday.com will set the Referer HTTP header to https://www.usatoday.com/
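The text describes interception inside the browser. As a rough request-level equivalent, the sketch below uses Python's standard `urllib` to show that the Referer is just a header the client fills in. The request is built but never sent.

```python
from urllib.request import Request

# Build (but do not send) a request for a third-party script.
# The client decides what goes into the Referer header, so a bot that
# never visited usatoday.com can still claim it as the referring page.
req = Request(
    "https://static.adsafeprotected.com/iasPET.1.js",
    headers={"Referer": "https://www.usatoday.com/"},
)

print(req.get_header("Referer"))
```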

3. Are bots able to fake the Mobile App name?

Advertisements in mobile apps are displayed using a web browser that is embedded within the app. This embedded browser is called a WebView. Within this WebView an advertising platform loads and refreshes ads based on your cookies, geolocation, etc. Google’s AdMob is such a platform [2][3]; InMobi, AppLovin, Glispa and Amobee are prominent alternatives.

Bots will request ads from these platforms while trying to look, feel and smell like a genuine mobile app user.

Figure 3. Screenshot of Charles proxy of the communication between the FT App installed on a real Android device and Google in order to display an Ad embedded within the App

When an App accesses a website using WebView the appname is conveyed in the HTTP header x-requested-with. Figure 3 shows the communication captured with a MITM (man in the middle) proxy between the FT App installed on a real Android phone and Google. The highlighted blue line shows the x-requested-with HTTP header. This is of course only a single request made from the App. Another interesting fact which can be seen in Figure 3 is the User Agent (UA). Apps have full control over the UA and in this case you’ll see the Appname and full version are included in the UA as well.

Will bots be able to fake requests like this and spoof all information in order to blend in with real app traffic made by humans? Yes, they can (pre-bid and post-bid).
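As an illustration, the sketch below builds the two headers discussed above for a made-up app. The package name `com.example.newsapp`, the version and the exact User-Agent layout are assumptions for the example, not the values of the FT app or any other real app.

```python
def webview_headers(app_package: str, app_version: str) -> dict[str, str]:
    """Return headers that mimic a request made from an Android WebView.

    The device, browser version and User-Agent layout are illustrative
    assumptions; a real bot would copy them from captured app traffic.
    """
    user_agent = (
        "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Version/4.0 Chrome/126.0.0.0 Mobile Safari/537.36 "
        f"{app_package}/{app_version}"
    )
    return {"x-requested-with": app_package, "User-Agent": user_agent}

headers = webview_headers("com.example.newsapp", "2.1.0")
for name, value in headers.items():
    print(f"{name}: {value}")
```

Both values are plain strings chosen by the sender, which is why they cannot prove on their own that a real app made the request.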

4. Can bots fake utm_ parameters and other forms of link decoration?

Query string parameters like utm_source, utm_campaign, gclid and/or fbclid are technically nothing more than an ampersand-separated string appended to the URL of the GET request. When bots load the advertisement in the browser, they already know what the target link will be. They can simply change the parameters up front, click, and let the browser do its work. Another method is to change the parameters dynamically using request interception. If a bot already uses this technique to change HTTP headers such as the referrer, it is fairly easy to add some rules that rewrite destination URLs. So, yes they can. Easy!

Figure 4. The querystring part of a request is given in blue after the question mark ‘?’ in the request URL
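A minimal sketch of such a rewrite rule, using only Python's standard `urllib.parse`. The landing page URL, the helper name `rewrite_query` and the click id are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def rewrite_query(url: str, overrides: dict[str, str]) -> str:
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))

original = "https://shop.example/landing?utm_source=newsletter&utm_campaign=spring"
rewritten = rewrite_query(
    original, {"utm_source": "google", "gclid": "hypothetical-click-id"}
)
print(rewritten)
```

The landing page receives the rewritten URL and has no way of telling from the parameters alone which source really delivered the visit.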

5. Can bots fake cookies in the browser?

Browsing to a website means that all cookies in the browser associated with that URL are sent along with the request. Figure 5 shows how this looks for the website ft.com. These persistent cookies are stored in the Chrome profile in a file called Cookies in the Default folder; in Firefox they are stored in a SQLite file in the Firefox profile folder. Once you know how cookies can be set and retrieved programmatically, you are able to extract cookies, store them in a database, and load them on another device to start a “warm session”.

Figure 5. The cookies sent along with the request when browsing to https://ft.com

So, can bots fake cookies? Yes, they can, and they will warm up browsing sessions in order to maximize their profit.
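The extract, store and reload cycle can be sketched as follows. Two in-memory SQLite databases stand in for the cookie stores of two devices; the four-column table is a simplified, hypothetical schema, not the real layout used by Chrome or Firefox, and the cookie value is made up.

```python
import json
import sqlite3

# Simplified, hypothetical schema (real browser cookie stores have more columns).
SCHEMA = "CREATE TABLE cookies (host TEXT, name TEXT, value TEXT, expires INTEGER)"

def export_cookies(db: sqlite3.Connection, host: str) -> str:
    """Serialise all cookies stored for `host` to a JSON string."""
    rows = db.execute(
        "SELECT host, name, value, expires FROM cookies WHERE host = ?", (host,)
    ).fetchall()
    return json.dumps(rows)

def import_cookies(db: sqlite3.Connection, exported: str) -> int:
    """Load previously exported cookies into another profile's store."""
    rows = [tuple(row) for row in json.loads(exported)]
    db.executemany("INSERT INTO cookies VALUES (?, ?, ?, ?)", rows)
    return len(rows)

device_a = sqlite3.connect(":memory:")
device_b = sqlite3.connect(":memory:")
for db in (device_a, device_b):
    db.execute(SCHEMA)
device_a.execute(
    "INSERT INTO cookies VALUES ('.ft.com', 'session', 'abc123', 1767225600)"
)

# Move the session from device A to device B.
moved = import_cookies(device_b, export_cookies(device_a, ".ft.com"))
print(moved, device_b.execute("SELECT name, value FROM cookies").fetchall())
```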

6. Can bots fake the fingerprint of the browser?

Using CDP (Chrome DevTools Protocol), any property or value in the browser can be overridden. This enables fraudsters to change values like screen resolution, keyboard language, available plugins, time zone, WebGL vendor and renderer, etc. The combination of these property values (and many others) is called the browser's fingerprint. This also means that if one value changes, the fingerprint changes. CreepJS is an open source tool to calculate your browser's fingerprint. When developing a bot, CreepJS is typically your litmus test.

Figure 6. CreepJS is one of the most extensive open source tools to calculate your browser fingerprint
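The idea that one changed value yields a different fingerprint can be shown with a minimal sketch. It assumes the fingerprint is simply a SHA-256 hash over a handful of properties; this is not how CreepJS computes its result, and the property names and values are made up.

```python
import hashlib
import json

def fingerprint(properties: dict) -> str:
    """Hash a set of browser properties into one fingerprint value."""
    canonical = json.dumps(properties, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

genuine = {
    "screen": "1920x1080",
    "language": "en-US",
    "timezone": "Europe/Amsterdam",
    "webgl_renderer": "Example GPU (hypothetical)",
}
# Overriding a single property is enough to obtain a new fingerprint.
spoofed = {**genuine, "timezone": "America/New_York"}

print(fingerprint(genuine) == fingerprint(spoofed))  # False
```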

In addition to fingerprinting a series of static values, it is also possible to fingerprint responses to challenges. For example, WebGL shapes are drawn, and because different OSes, browsers, video cards and their drivers may use different anti-aliasing methods, have different IEEE 754 floating point implementations, use different rounding modes, etc., the color values of individual pixels may differ per video card type [4][5].

This technique can be used to detect bots. Unfortunately, bots counter it by adding random noise to the generated pixels, causing the values to change slightly. Because the fingerprint is a hash of its input, even a slight change produces a completely different fingerprint. The good part is that bots adding random noise become unique and thus outliers; the bad part is that blocking outliers is a recipe for false positives [6]. So, yes, bots can change both the static and the dynamic fingerprints, though the latter with a catch.
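A small simulation shows why noise turns a bot into an outlier. It assumes that all genuine devices of one class render identical pixels and that every bot session adds its own noise; the pixel data, the population sizes and the helper names are made up for the example.

```python
import hashlib
import random
from collections import Counter

def render_hash(pixels: list[int]) -> str:
    """Hash the rendered pixel values of a challenge."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def add_noise(pixels: list[int], rng: random.Random) -> list[int]:
    """Nudge every pixel by -1, 0 or +1, staying inside 0..255."""
    return [min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in pixels]

rng = random.Random(42)
rendered = [rng.randrange(256) for _ in range(64)]  # stand-in for real pixels

# 1000 genuine devices of the same class produce the same hash ...
population = [render_hash(rendered) for _ in range(1000)]
# ... while 5 bot sessions each add their own noise.
population += [render_hash(add_noise(rendered, rng)) for _ in range(5)]

counts = Counter(population)
outliers = [h for h, seen in counts.items() if seen == 1]
print(len(counts), len(outliers))
```

In real traffic a hash seen only once can also belong to a genuine visitor with a rare device, which is the false positive risk mentioned above.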

7. Can bots fake the TLS fingerprint of the browser?

TLS fingerprints are generated server side. They are based on the client-server handshake that takes place before the communication is encrypted. In this handshake the browser says: Hello, I support these cipher suites. The server answers with the selected cipher and key (simplified) [7]. Different browsers on different OSes support different cipher suites. This is most relevant for request-based bots, as by default their fingerprint does not resemble that of any browser [23]. That’s why request-based bots have to use special clients and tools, for example curl-cffi (see Figure 7) [26], curl-impersonate, AzureTLS and CycleTLS [8][9][10].

Figure 7. Curl-cffi enables you to send requests to web servers from Python code emulating the TLS handshake of common web browsers

Browser-based bots need to connect through a proxy in order to change the fingerprint. In such a setup the proxy server sets up the secure connection to the web server hosting the publisher site and/or landing page, and forwards the requests on behalf of the automated browser. In this case the website (and its fraud detection) will fingerprint the proxy server's requests.

The first TLS fingerprinting method was FingerPrinTLS, which inspired JA3 (Salesforce). JA3 is a hash over the TLS version, cipher suites, extensions, elliptic curves and elliptic curve point formats that the client offers, in the order they appear in the request that sets up the secure connection [24]. To prevent this type of fingerprinting, Chromium and all derived browsers (Chrome, MS Edge, Samsung Browser, Android Chrome, Opera, etc.) started to send the TLS extensions in a randomized order. This started with Chromium version 110 [25]. As a result the JA3 hash of these browsers changes from connection to connection, so JA3 is ineffective for new browsers and not able to identify Chromium-based browsers. Of course, other types of TLS fingerprints have emerged that are able to fingerprint and identify the OS and browser (version) used.
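The order sensitivity of JA3 can be sketched in a few lines. JA3 joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) into one string and takes its MD5 hash, so shuffling one of the lists changes the result. The numeric values below are illustrative and not captured from a real browser, and the sorted variant at the end is only a sketch of an order-insensitive alternative, not a specific published method.

```python
import hashlib

def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Compute a JA3-style MD5 over ClientHello fields in the order sent."""
    fields = [str(version)] + [
        "-".join(str(value) for value in values)
        for values in (ciphers, extensions, curves, point_formats)
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()

# Illustrative values only.
hello = dict(version=771, ciphers=[4865, 4866, 4867], curves=[29, 23, 24], point_formats=[0])
sent_first = [0, 23, 65281, 10, 11]
sent_second = [10, 0, 11, 65281, 23]  # same extensions, shuffled

first = ja3_hash(extensions=sent_first, **hello)
second = ja3_hash(extensions=sent_second, **hello)
print(first == second)  # False: reordering alone changes the hash

# Sorting the list before hashing removes the effect of the shuffle.
stable_first = ja3_hash(extensions=sorted(sent_first), **hello)
stable_second = ja3_hash(extensions=sorted(sent_second), **hello)
print(stable_first == stable_second)  # True
```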

Can bots fake the TLS fingerprint? Yes, they can. Request-based bots use specific tools to impersonate a browser, whereas browser-based bots need to proxy their requests.

8. Can bots fake (prevent) WebRTC from leaking your real IP address?

WebRTC (Web Real-Time Communication) is the technology that enables videoconferencing from a browser [11]. WebRTC uses point to point communication bypassing proxy servers configured in the browser. In fraud detection WebRTC can be used to detect the real internet facing IP address of the client, even if the client is using a proxy or VPN. The detection can be split in two parts: Capturing the IP address server side and extracting the local IP address(es) at the client using JavaScript.

WebRTC Server side

In order to detect bots and fraudsters using residential proxies, bot detection companies have set up their own WebRTC infrastructure. Cheap and low-quality VPN clients allow bot and fraud detection companies to extract the true external IP address the bot or fraudster uses. Premium-quality (non-free) VPN and proxy software typically doesn’t have this issue.

Figure 8. Browserleaks.com screenshot made while using a VPN client (ProtonVPN). The IP addresses shown are the addresses of the VPN endpoint

Figure 8 shows a screenshot of the browserleaks’ WebRTC test page [13]. Both the IPv4 and IPv6 addresses in the screenshot are located in New York, United States. I made this screenshot in the Netherlands, so my local IP address did not leak while I was using ProtonVPN. Other VPN clients or residential proxy services may have different results.

WebRTC Client Side

In 2015 Daniel Roesler exposed a WebRTC vulnerability on his GitHub page [12], see also Figure 9. This vulnerability enables code running at the client to know its external IP address, even if the client is part of a local infrastructure with local addresses, e.g. a corporate network or your home network, behind a firewall.

Figure 9. Screenshot from Daniel Roesler's GitHub page that explains what JavaScript code can do to determine your local (ISP facing) IP address

Depending on your network configuration, the JavaScript code on GitHub is able to extract the local IP addresses of your device. In the case of IPv4 the code will typically extract your internal network IP address, which in most cases is a private address behind NAT (Network Address Translation), e.g. 192.168.x.x or 10.x.x.x. With IPv6, however, the need for NAT disappeared. That means that in many cases this technique reveals your true IPv6 address, and thus your true location, even though you are connected through a VPN and sit behind a router/firewall.
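The difference between the IPv4 and IPv6 case can be sketched on the receiving side, where the collected candidate lines are analyzed. The sketch assumes each ICE candidate line carries a literal IP address as its fifth field; the two candidate lines are made up, and 2001:db8::/32 is a documentation range standing in for a real, routable IPv6 address.

```python
import ipaddress

# Address ranges that are only meaningful inside a local network.
LOCAL_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7", "fe80::/10")
]

def candidate_address(candidate: str):
    """Extract the connection address (fifth field) from an ICE candidate line."""
    return ipaddress.ip_address(candidate.split()[4])

def is_local(address) -> bool:
    """True if the address says nothing about the device's place on the Internet."""
    return any(
        address.version == net.version and address in net for net in LOCAL_NETWORKS
    )

ipv4 = candidate_address("candidate:1 1 udp 2122260223 192.168.1.23 54321 typ host")
ipv6 = candidate_address("candidate:2 1 udp 2122262783 2001:db8::1234 54322 typ host")

print(ipv4, is_local(ipv4))  # NAT address: reveals little
print(ipv6, is_local(ipv6))  # not a local range: may reveal the real location
```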

Professional fraudsters and bot developers know this and thus use in most cases good quality residential proxies or in some cases VPNs. At the client side they will override the specific WebRTC APIs and spoof the returned values to whatever IP address the proxy server or other end of the VPN tunnel has. That way all IP addresses read (client and server) will match and the total session will look genuine. They also will rotate IP addresses quickly to stay below the ‘IP address rate limiting’-radar.

9. Can bots solve CAPTCHAs?

With the rise of AI you would expect bots to be able to solve all CAPTCHAs automatically. That is correct up to a certain degree [14][15]. Reading text in images and simple image recognition are achievable with high accuracy. Figure 10 shows examples of CAPTCHAs which can be solved automatically.

Figure 10. Text CAPTCHAs do not deter bots. This type of CAPTCHA can be solved automatically as bots are able to read using OCR (optical character recognition)

More recent CAPTCHAs have become puzzles that require some subject knowledge to solve. Sometimes they even resemble an IQ test. Figure 11 contains a few example CAPTCHAs that require general knowledge, e.g. animals that lay eggs, the usage of objects (e.g. vehicles on paved roads), and/or the monetary value of goods.

This type of CAPTCHA, along with the sliding CAPTCHA type in Figure 12, is not that easy to solve automatically. The knowledge questions are not that hard to program, but they rotate quickly: both the questions and the individual images shown in the CAPTCHA change constantly. The individual images are also purposely dithered and/or made of poor quality and/or low contrast; see the paved road example in Figure 11. However, because it is economically feasible to rent CAPTCHA solving services, this doesn’t stop bots at all. These solving services only slightly increase the price of the bot operation, and as long as the operation still works out economically, it continues.

Over time AI will become cheaper and better public models will become available, enabling better text interpretation and image recognition. It is just a matter of time before bots are able to solve the more complex puzzles.

Figure 11. Examples of CAPTCHAs that are not based on OCR, but require a deeper level of interpretation and knowledge to solve

Figure 12. Animated examples of the sliding CAPTCHA.

10. Can bots fake human interactions like mouse movements, clicks, scrolls and/or touches?

When a browser is controlled by browser automation software, the software is able to move the mouse to new locations. In CDP (Chrome DevTools Protocol) mouse movements, clicks and scrolls are controlled by dispatchMouseEvent [16]. This enables a developer to fully control the mouse, its buttons and its wheel. The same goes for touch events using dispatchTouchEvent, which can be used to emulate mobile behavior [17].
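To give an idea of what such instructions look like, the sketch below builds the JSON messages for one mouse click using the CDP method `Input.dispatchMouseEvent`. The messages are only constructed and printed; sending them would require an open DevTools WebSocket connection, which is left out. The helper names `cdp_message` and `click_at` are hypothetical.

```python
import json
from itertools import count

_message_ids = count(1)

def cdp_message(method: str, **params) -> str:
    """Serialise one CDP command with a unique message id."""
    return json.dumps({"id": next(_message_ids), "method": method, "params": params})

def click_at(x: int, y: int) -> list[str]:
    """Move to (x, y), then press and release the left mouse button."""
    button = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_message("Input.dispatchMouseEvent", type="mouseMoved", x=x, y=y),
        cdp_message("Input.dispatchMouseEvent", type="mousePressed", **button),
        cdp_message("Input.dispatchMouseEvent", type="mouseReleased", **button),
    ]

messages = click_at(120, 340)
for message in messages:
    print(message)
```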

Sending mouse or touch events over time to a remote controlled browser is the simplest form of simulating behavior. Such programmatic series of events will not (yet) resemble human behavior, because humans don’t move their mouse in straight lines and humans don’t click on the exact same pixel multiple times.
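The straight-line giveaway is easy to check for. The sketch below measures how far the recorded points stray from the line between the first and the last point; the 0.5 pixel tolerance and the sample paths are assumptions for the example, not values from a real detection product.

```python
import math

def max_deviation(path: list[tuple[float, float]]) -> float:
    """Largest perpendicular distance of any point from the start-end line."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0
    return max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        for x, y in path
    )

def looks_scripted(path, tolerance: float = 0.5) -> bool:
    """Flag paths whose points all lie (almost) on one straight line."""
    return len(path) >= 3 and max_deviation(path) <= tolerance

straight = [(0, 0), (25, 25), (50, 50), (100, 100)]
wobbly = [(0, 0), (20, 31), (55, 47), (100, 100)]

print(looks_scripted(straight), looks_scripted(wobbly))  # True False
```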

Figure 13. More complex mouse movements can be simulated with Bézier curves or B-splines. The simulated mouse movements shown in this figure are made by Mouse Synthesizer [19]

In order to simulate human behavior, humanlike mouse paths must be generated instead of straight lines. This can be achieved using B-splines [18] or Bézier curves. First, the software generates a series of X,Y points between a starting point and an end point, based on the coordinates of elements in the page. Second, it calculates a spline curve through these points, with timestamps that determine how fast the mouse moves from the starting coordinate to the destination coordinate and at what time resolution. This technique enables fraudsters to perform humanlike mouse movements. This is exactly what Mouse Synthesizer [19] (see Figure 13) and ghost-cursor [20] do. But don't worry: this can of course be detected, as no human is able to make perfectly smooth curves using a mouse.
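The two steps can be sketched with a single cubic Bézier curve. This is a simplified illustration, not the algorithm used by Mouse Synthesizer or ghost-cursor: the control points are chosen at random near the straight line and the timestamps are spread evenly, whereas real tools vary the speed as well. The function name and parameters are hypothetical.

```python
import random

def bezier_path(start, end, steps=20, duration_ms=400, rng=None):
    """Return (x, y, t_ms) samples along a cubic Bézier curve from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    spread = 0.3 * max(abs(x3 - x0), abs(y3 - y0), 1)
    # Control points sit at roughly 1/3 and 2/3 of the way, pushed off the line.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Bernstein weights of the cubic Bézier curve.
        a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
        x = a * x0 + b * x1 + c * x2 + d * x3
        y = a * y0 + b * y1 + c * y2 + d * y3
        points.append((x, y, t * duration_ms))
    return points

path = bezier_path((100, 200), (640, 360), rng=random.Random(7))
print(path[0], path[-1])
```

Each sample could then be sent to the browser as one mouse move event at its timestamp. The resulting curve is perfectly smooth, which is exactly the property that makes it detectable.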

So, can bots fake interactions? Yes, they can!

Conclusion

You might think: why didn’t you write something about blacklisting IP addresses? If you’re able to create a sophisticated bot that can buy #taylorswift tickets, then you’ll KNOW how to spoof, fake and/or emulate browser functionality. You are well aware that you have to use residential proxies, which means IP blacklisting will cause false positives. Filtering on IP addresses only works to exclude traffic from outside your country, but if you’re in the US, don’t forget to include overseas territories like Guam, American Samoa, the Virgin Islands, etc.

You are aware that bots are able to fake (almost) anything. They can’t fake dynamic WebGL / GPU challenges but they will poison these challenges with noise in order to hide their headless appearance. CAPTCHAs do work to a certain degree, but they do annoy humans and thus cause friction to their journey.

So, can these sophisticated bots be detected? Sure they can. Once you know how bots override properties, fake answers, and hide their true appearance you know where and what to look for. Lastly, browser automation will cause “browser automation”-leakage. Spotting these leakages and traces of automation will reveal the true nature of the bot accessing your website.

2024-07-11

Questions? Corrections? Remarks? Need help with bots and fraud? Feel free to connect, comment or DM

#adfraud #bots #CMO #digitalmarketing #browserautomation #clickfraud

[1] https://developer.mozilla.org/en-US/docs/Web/API/Location

[2] https://developers.google.com/admob/ios/browser/webview/api-for-ads

[3] https://developers.google.com/admob/android/browser/webview

[4] https://elie.net/static/files/picasso-lightweight-device-class-fingerprinting-for-web-clients/picasso-lightweight-device-class-fingerprinting-for-web-clients-paper.pdf

[5] https://cdn.elie.net/static/files/picasso-lightweight-device-class-fingerprinting-for-web-clients/picasso-lightweight-device-class-fingerprinting-for-web-clients-slides.pdf

[6] https://privacybadger.org/

[7] https://en.wikipedia.org/wiki/Cipher_suite#TLS_1.0%E2%80%931.2_handshake

[8] https://github.com/lwthiker/curl-impersonate

[9] https://github.com/Danny-Dasilva/CycleTLS

[10] https://github.com/Noooste/azuretls-client

[11] https://en.wikipedia.org/wiki/WebRTC

[12] https://github.com/diafygi/webrtc-ips

[13] https://browserleaks.com/webrtc

[14] https://arxiv.org/abs/2307.12108

[15] https://arxiv.org/abs/2307.10239

[16] https://chromedevtools.github.io/devtools-protocol/tot/Input/#method-dispatchMouseEvent

[17] https://chromedevtools.github.io/devtools-protocol/tot/Input/#method-dispatchTouchEvent

[18] https://en.wikipedia.org/wiki/B-spline

[19] https://github.com/MIMIC-LOGICS/Mouse-Synthesizer/tree/main

[20] https://github.com/Xetera/ghost-cursor

[21] https://www.mimic.sbs/antibot/On-Anti-Bot-Biometric-Protections.md/

[23] https://www.dhirubhai.net/posts/kouwenhovensander_taylorswift-adfraud-adfraud-activity-7199384350369431552-t8EZ

[24] https://github.com/salesforce/ja3

[25] https://chromestatus.com/feature/5124606246518784

[26] https://github.com/yifeikong/curl_cffi

Deepankar Biswas

CEO & Co-Founder at ClearTrust | Score your traffic

8 months ago

Lovely post and details Sander Kouwenhoven .

Timothy "Tim" Hughes 提姆·休斯 L.ISP

Should have Played Quidditch for England

8 months ago

Great article Sander Kouwenhoven shared on X

Dr. Augustine Fou

FouAnalytics - "see Fou yourself" with better analytics

8 months ago

Yes! Bots can fake anything. Nice FAQ

Sander Kouwenhoven

8 months ago

In addition to the 10 questions in this FAQ I got this question in my DM: "Can a bot fake ... [CTV] video ad quartile completion events?" Video ad quartile completion events are pixels fired from the browser at 25%, 50%, 75% and completion of the video advertisement. These events are nothing more than loading a pixel with a payload describing the unique id, advertisement, campaign, publisher, etc. in order to tie back the viewability and completion rate of video advertisements. As you can read in the article, bots can generate any type of HTTP(S) request and thus are able to generate video completion requests, including the payload, as well.

Michael M. M.

Ad-Fraud Investigator & Media Expert, member of Digital Forensic Research Lab cohort "Digital Sherlocks" - Adding some fun when asking unexpected questions you were not prepared to hear

8 months ago

People who can use ChatGPT4o and are able to define a valid prompt, are also able to "build" a script who is able to do all listed tasks. Generating fake traffic has become very easy.

