Protecting Your Fraud Detection Code

Oxford Biochronometrics

Intersecting e-commerce & cybersecurity to optimize customer acquisition costs

Published: April 17, 2023

The previous two articles described the individual layers of a web browser's technology stack, how these layers can be exploited to create a unique web fingerprint, and how to avoid being fingerprinted. The first article describes how residential proxies are quickly rotated to avoid IP address blacklists and how the back-connect architecture prevents these proxies from being detected by port scans. The second article describes the layers which are bound to the OS, network stack, TLS, WebGL, user preferences and JavaScript.

JavaScript is used to extract property values directly from the browser, aka first degree fingerprint values. These values can be overridden easily by fraudsters and therefore have little value other than to create fake marketing fingerprints. Second degree fingerprint values, however, are based on the output of executing code in the browser, e.g. calculating a hash of graphics rendered on the canvas, and are much harder to spoof. They are harder to spoof because the ‘question’ lies within the JavaScript code and the ‘answer’ in the implementation of the function within the browser and/or GPU. In such a setup fraudsters need to parse and understand the JavaScript code in order to know which answer to return when overriding the function.
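A minimal sketch of why first degree values carry so little weight: a property read can be redirected with one line per property. The object `fakeNavigator` and the function `readFirstDegree` are hypothetical stand-ins (not part of any real detection script), chosen so the snippet runs outside a browser.

```javascript
// Hypothetical stand-in for the browser's navigator object.
const fakeNavigator = { webdriver: true, platform: "Linux x86_64" };

// First degree fingerprinting: read the property values directly.
function readFirstDegree(nav) {
  return { webdriver: nav.webdriver, platform: nav.platform };
}

const before = readFirstDegree(fakeNavigator);

// A bot replaces each property with a getter that returns its preferred value.
Object.defineProperty(fakeNavigator, "webdriver", { get: () => false });
Object.defineProperty(fakeNavigator, "platform", { get: () => "Win32" });

const after = readFirstDegree(fakeNavigator);

console.log(before, after);
```

The detection code cannot tell from the returned values alone that they were overridden, which is the weakness second degree values try to avoid.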

As all client side developers will confirm: JavaScript code is sent in plain text to the browser, where it is parsed and executed. This means the JavaScript source code can be read by anyone, including fraudsters, crooks and people with odd hobbies like reverse engineering. The ‘fingerprint questions’ can therefore be read and understood by simply finding them in the JavaScript code. And that’s not what you want, because it enables fraudsters to ‘generate and return’ their preferred correct answer to each question rather than executing the JavaScript code, which lets their bots bypass fraud detection. So, how do you protect your JavaScript code from being reverse engineered by fraudsters?

Minification

The easiest and simplest way of protecting your JavaScript with all its secrets is to minify the code, for example with UglifyJS [1]. This makes it a bit harder to read, because variable and function names are shortened, often to a single letter, and all unnecessary spaces, tabs and line feeds are removed. The original goal of minification is to reduce the size of the script, though some legacy ad fraud vendors still think that minification and string encryption (coming up next) are enough to protect their JavaScript against reverse engineering. Unfortunately, they’re wrong!

Fig 1. Simple JavaScript code example to be minified

The two Figures above show the input and the output of minification for a simple JavaScript example. The result is that the minification removed the function as it is called only once, changed the variable names and put everything on a single line.
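As a rough illustration of that transformation, the sketch below pairs a readable snippet with a hand-minified equivalent. The minified line was written by hand for this example and is not actual UglifyJS output; the function and variable names are made up.

```javascript
// Readable input: a helper function that is called exactly once.
function buildGreeting(firstName, lastName) {
  const fullName = firstName + " " + lastName;
  return "Hello, " + fullName + "!";
}
const original = buildGreeting("Ada", "Lovelace");

// Hand-minified equivalent (illustrative only): the helper is inlined,
// names shrink to single letters and everything sits on one line.
var a="Ada",b="Lovelace",c="Hello, "+a+" "+b+"!";

console.log(original === c);
```

Nothing is hidden here: every string and every operation is still in plain sight, only the formatting and names are gone.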

String extraction and encryption

In JavaScript a property can be accessed using the dot notation: navigator.webdriver. Another way of accessing the exact same property is the bracket notation: navigator["webdriver"]; for more info see [2]. This opens an opportunity to replace the string in the bracket notation with the output of a function, as long as the function returns ‘webdriver’. The function may take an integer that is used as an index into an array of encrypted strings, or take the encrypted string itself and decrypt it, or there may be a dedicated function per string, etc. The resulting notation is: navigator[somefunction( … )]. This makes it harder to scan the source code and quickly determine where the crown jewels are within the JavaScript.
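The index-into-an-array variant could be sketched as follows. Everything here is an assumption for illustration: `browser` is a stand-in object so the snippet runs outside a browser, `table` and `s` are made-up names, and reversing the string stands in for whatever real encoding a vendor would use.

```javascript
// Hypothetical stand-in for a browser object.
const browser = { webdriver: true };

// String table: property names are stored reversed, so the plain
// names never appear verbatim in the source.
const table = ["revirdbew", "mroftalp"];

// Lookup function: index in, decoded property name out.
function s(index) {
  return table[index].split("").reverse().join("");
}

const viaDot = browser.webdriver;        // dot notation
const viaBracket = browser["webdriver"]; // bracket notation, same property
const viaLookup = browser[s(0)];         // name computed at run time

console.log(viaDot, viaBracket, viaLookup);
```

A text search for "webdriver" in the protected script no longer finds the property access, although anyone who runs `s(0)` in the console gets the name back immediately.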

The most basic string decoder replaces each input character with its preceding character, e.g. z becomes y, y becomes x, d becomes c, b becomes a, etc. In the real world this works with arrays, indexes and rotating decryption keys.
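A minimal sketch of such a shift-by-one decoder, with the matching encoder that would run at build time. The function names `decode` and `encode` are made up for this example, and the sketch deliberately does no wrap-around at the start of the alphabet.

```javascript
// Decode: replace every character with the one that precedes it.
function decode(encoded) {
  return Array.from(encoded, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 1)
  ).join("");
}

// Encode: the inverse shift, used to prepare the strings before shipping.
function encode(plain) {
  return Array.from(plain, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + 1)
  ).join("");
}

const hidden = encode("webdriver");
const revealed = decode(hidden);

console.log(hidden, revealed);
```

In a protected script only `decode` and the encoded strings would be shipped, so a property access would read something like `obj[decode(hidden)]`.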

As mentioned, the combination of minification and string encryption is what some legacy ad fraud detection companies use to protect their code. Their implementation of string protection is similar to ROT13, a special case of the Caesar cipher that Julius Caesar used to encode his messages [3]. Proven technology FTW, for over 2000 years! Instead of ROT13 these vendors have implemented ROT47 to protect the strings in their code. ROT47 is able to encode and decode a total of 94 characters. Wikipedia states [3]: “The algorithm provides virtually no cryptographic security, and is often cited as a canonical example of weak encryption.”
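For reference, ROT47 fits in a few lines. The sketch below is a generic implementation of the cipher, not any vendor's code: it rotates the 94 printable ASCII characters from `!` (33) to `~` (126) by 47 positions and leaves everything else untouched.

```javascript
// ROT47: rotate the 94 printable ASCII characters (33..126) by 47 positions.
function rot47(text) {
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    if (code < 33 || code > 126) return ch; // leave spaces, newlines, non-ASCII alone
    return String.fromCharCode(33 + ((code - 33 + 47) % 94));
  }).join("");
}

const scrambled = rot47("webdriver");
const restored = rot47(scrambled); // applying ROT47 twice restores the input

console.log(scrambled, restored);
```

Because 47 is half of 94, the same function both encodes and decodes, which also means a reverse engineer needs no key at all to read the strings.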

Obfuscation

A more sophisticated way of protecting the JavaScript is using an obfuscator tool like obfuscator.io [4]. This is an open source tool to obfuscate JavaScript in order to make it harder to copy and prevent people from stealing your work.

This obfuscator will read your original JavaScript code and through a series of transformations, such as variable / function / arguments renaming, string removal, and others, your source code is transformed into something unreadable, while working exactly as before.

Figure 4 shows how a small snippet of code is translated and bloated into a seemingly unfathomable piece of code. However, with the default settings the original function name ‘hi’ is still present in the obfuscated code; see the red lines at the bottom of Figure 4. Luckily, the obfuscator has many configurable options. With these, the last remnant (the function name hi) is replaced with a random function name, as can be seen in Figure 5, which also shows that the names in the code are uniquely regenerated.

This is a much better way of protecting the fingerprint questions compared to a simple minification and ROT47 string protection, and the tool is open source. The obfuscated code can still be executed in the browser, so by using the debugger you can walk through the code step by step. Even so, reverse engineering it clearly requires much more time, effort, patience and knowledge.

Run a JavaScript implemented VM

In order to protect your secret fingerprint questions in the JavaScript code, the code is compiled from JavaScript to a proprietary version of bytecode; for more info on bytecode see [5][6]. Bytecode is an abstraction of machine code and each bytecode instruction is a small building block. It is a binary format, and thus not human readable like text. Together these instructions can make up any JavaScript functionality and thus represent the original JavaScript code. Because of its binary nature, a series of bytecode instructions can be executed efficiently by a software interpreter.

To execute this bytecode, it needs a JavaScript wrapper, i.e. a software interpreter aka a virtual machine (VM) implemented in JavaScript. This VM decodes the instructions and translates them to native JavaScript calls, expressions and operators. Subsequently, it executes these commands, returns the output (or throws an error), and keeps track of the state of the VM, stack, etc.

The goal is to translate some input JavaScript with fingerprint questions to its bytecode equivalent, and add a JavaScript wrapper which runs this bytecode similar to a VM. When someone downloads this JavaScript, the code is executed and will start the VM, which will start to interpret and execute the bytecode. By doing so, the VM effectively runs the web fingerprint code.
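To make the idea concrete, here is a toy stack-based VM. It is only a sketch under heavy assumptions: the opcode set, the `runVM` function and the flat number-array bytecode format are invented for this example, and a real proprietary VM would be far larger and deliberately harder to follow. The program computes the equivalent of `env.width * env.height + 1` without that expression ever appearing in the source.

```javascript
// Hypothetical opcodes for a toy stack machine.
const OP = { PUSH: 1, GET: 2, ADD: 3, MUL: 4, HALT: 5 };

// Interpret a flat array of numbers as bytecode.
function runVM(bytecode, constants, env) {
  const stack = [];
  let pc = 0; // program counter
  while (pc < bytecode.length) {
    const op = bytecode[pc++];
    switch (op) {
      case OP.PUSH: // operand: index into the constant pool
        stack.push(constants[bytecode[pc++]]);
        break;
      case OP.GET: // pop a property name, push env[name]
        stack.push(env[stack.pop()]);
        break;
      case OP.ADD: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case OP.MUL: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      case OP.HALT: // result is whatever is on top of the stack
        return stack.pop();
      default:
        throw new Error("unknown opcode " + op + " at " + (pc - 1));
    }
  }
  throw new Error("bytecode ended without HALT");
}

// Constant pool and program for: env.width * env.height + 1
const constants = ["width", "height", 1];
const program = [
  OP.PUSH, 0, OP.GET,
  OP.PUSH, 1, OP.GET,
  OP.MUL,
  OP.PUSH, 2, OP.ADD,
  OP.HALT,
];

// Stand-in for a browser object such as the screen.
const fakeScreen = { width: 1920, height: 1080 };
const result = runVM(program, constants, fakeScreen);

console.log(result);
```

Someone stepping through this in a debugger sees the `while` loop and the `switch` over and over, not the fingerprint logic itself, which only exists as the numbers in `program`.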

Of course several protections can be added to make the life of a reverse engineer more miserable and horrible, but the essence is to abstract the crown jewels into bytecode which can only be understood by understanding how the VM works. In this case you’re not able to create abstract syntax trees, run scripts to untangle the tangled code, or go from boiled tangled spaghetti back to straight spaghetti.

Now, the underlying question is: Is compiling your original source to bytecode and a VM a good mechanism to prevent reverse engineering?

You could try to debug the code in order to understand which first and, more importantly, second degree questions the fingerprint code executes. But when using a debugger you will be mostly debugging the virtual machine and not the bytecode. Again, it is not impossible to understand what is going on, but it is harder and takes even more effort, time and knowledge to reverse engineer code like this. To stay in the pasta analogies: Chopping up your spaghetti to vermicelli tagliati makes it way harder!

Reverse engineering protected code?

So, how do the professional fraudsters solve this problem? After all, some of their bots are clearly still able to bypass commercial bot detection and protection.

AST explorer

The first set of tools in their toolkit is Babel and AST explorer, which translate the code into a syntax tree [8][9]. These tools are able to parse minified and obfuscated code into a syntax tree and output it in the ESTree format [10].

So, how does this look? Let’s start with obfuscating the same code example, again using obfuscator.io. In Figure 6 a function hi() is created which prints a line of text on the console. This code is to be obfuscated in order to make it unreadable and harder to grasp what is going on.

Figure 7 below shows the output and it looks like spaghetti code: not just because of the formatting, but also because the names starting with _0x and 0x (without the underscore) are annoying to read as a human. You’ll also see that a lot of structure is added to achieve the same result, which makes it more difficult to trace the real intent of the code. It’s not impossible to reverse engineer code like this, but it surely is more work than a simple minified piece of code with ROT47 string protection.

When fraudsters want to bypass your detection in order to buy tickets, generate leads, or whatever they do, their starting point is an obfuscated piece of code. The first steps are to recreate and understand the structure and flow of the code, and to replace encrypted strings and integers with their original values. This can be achieved with, for example, AST explorer [8] and/or Babel [9].

First, let’s take a look at AST explorer. The name comes from Abstract Syntax Tree (AST), which is a hierarchical representation of the relations between functions, objects, variable scopes, etc. in the JavaScript code. Let’s see how the obfuscated code looks in the AST explorer.

The next step is to programmatically recompute the constants, decrypt the strings, reduce indirection, reduce nesting, etc. This can be achieved with Babel, which enables you to traverse the AST and programmatically add, remove, replace or change nodes, and thereby change the obfuscated JavaScript code.

You will have to write a piece of code in order to decode the obfuscated code. This goes step by step and the output will not be 100% the original source code. When done correctly, the best you can get is a minified version of the original code, which is perfectly fine for a reverse engineer.
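The kind of rewriting involved can be sketched without any library. The example below builds a small ESTree-style tree by hand for the expression `console["l" + "og"](2 * 21)` and simplifies it by folding constants and turning bracket access back into dot access. The helpers `lit`, `bin`, `simplify` and `print` are made up for this sketch; in practice a parser such as Babel's would produce the tree and its traversal tooling would do the walking.

```javascript
// Helpers to build ESTree-style nodes by hand.
const lit = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

// Tree for: console["l" + "og"](2 * 21)
const ast = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    computed: true,
    object: { type: "Identifier", name: "console" },
    property: bin("+", lit("l"), lit("og")),
  },
  arguments: [bin("*", lit(2), lit(21))],
};

// Simplify a node bottom-up: fold constants, undo bracket notation.
function simplify(node) {
  switch (node.type) {
    case "BinaryExpression": {
      const left = simplify(node.left);
      const right = simplify(node.right);
      if (left.type === "Literal" && right.type === "Literal") {
        if (node.operator === "+") return lit(left.value + right.value);
        if (node.operator === "*") return lit(left.value * right.value);
      }
      return { ...node, left, right };
    }
    case "MemberExpression": {
      const object = simplify(node.object);
      const property = simplify(node.property);
      const isName = property.type === "Literal" &&
        typeof property.value === "string" &&
        /^[A-Za-z_$][\w$]*$/.test(property.value);
      if (node.computed && isName) {
        // obj["name"] becomes obj.name
        return { type: "MemberExpression", computed: false, object,
                 property: { type: "Identifier", name: property.value } };
      }
      return { ...node, object, property };
    }
    case "CallExpression":
      return { ...node, callee: simplify(node.callee),
               arguments: node.arguments.map(simplify) };
    default:
      return node; // Identifier, Literal: nothing to do
  }
}

// Minimal printer for the node types used above.
function print(node) {
  switch (node.type) {
    case "Literal": return JSON.stringify(node.value);
    case "Identifier": return node.name;
    case "BinaryExpression":
      return `${print(node.left)} ${node.operator} ${print(node.right)}`;
    case "MemberExpression":
      return node.computed
        ? `${print(node.object)}[${print(node.property)}]`
        : `${print(node.object)}.${print(node.property)}`;
    case "CallExpression":
      return `${print(node.callee)}(${node.arguments.map(print).join(", ")})`;
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

const obfuscatedSource = print(ast);
const cleanedSource = print(simplify(ast));

console.log(obfuscatedSource);
console.log(cleanedSource);
```

Real deobfuscation scripts chain many such passes, for example evaluating the string decoder function and inlining its results, until the remaining code is readable enough.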

If you want to read more about deobfuscating code using AST explorer and Babel, this blog has some very nice step by step examples: [11]

Trying to reverse engineer JavaScript in which the fingerprint code has been converted to bytecode is way harder, as you will have to write your own tools. These VMs are proprietary and thus one of a kind, and no (public) tools exist to reverse or transcode the bytecode back to JavaScript in order to extract the fingerprint questions, generate the answers and continue as normal.

Questions? Recipes? Leave a comment or send a DM

[1] https://www.uglifyjs.net/

[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Property_Accessors

[3] https://en.wikipedia.org/wiki/ROT13

[4] https://obfuscator.io/ and https://github.com/javascript-obfuscator/javascript-obfuscator

[5] https://en.wikipedia.org/wiki/Bytecode

[6] https://medium.com/dailyjs/understanding-v8s-bytecode-317d46c94775

[7] https://github.com/v8/v8/blob/master/src/interpreter/bytecodes.h

[8] https://astexplorer.net/

[9] https://babeljs.io/docs/babel-parser

[10] https://github.com/estree/estree

[11] https://www.trickster.dev/