The Arts of Assembler: x86 Mutations Basics
Let's talk about an almost extinct species: Assembler. Either you hate it or you love it, but chances are you try to avoid it at all costs. In my humble opinion, writing pure Assembler code is a kind of art that many software developers today have sadly forgotten about.
In the past, friends of mine asked me for advice on how to start learning C or C++. None of them went on to do Reverse Engineering, but I still recommended to all of them that they learn the basics of Assembler.
Don't get me wrong: most modern compilers generate cleverer and more efficient assembly than we can write by hand, but that was not the goal I had in mind when recommending Assembler. Personally, I think it is a great benefit to know more about what happens beneath your code, which can also deepen your understanding of your programming language.
When you drive your car daily, it is good to know why oil plays such a big part in your engine. Once you understand its lubricating properties, you might only rev the engine after the oil has reached a certain temperature.
You get where I'm going with this, right?
Anyway, let's get back to business. Today's article is all about the beauty of Assembler, and especially the tricks you can do with it in Code Obfuscation. A basic understanding of Assembler is recommended for reading this article. If you have any questions, I'll gladly answer them. :)
There is a proverb that also fits Assembler: "All roads lead to Rome."
If you don't believe me, I highly recommend a genius and very funny project by Christopher Domas: "The M/o/Vfuscator", a single-instruction C compiler. It uses only the "mov" instruction for everything. Imagine having to reverse that. If you didn't already know about it, I really recommend checking the project out. You can find the slides of the talk here: https://recon.cx/2015/slides/recon2015-14-christopher-domas-The-movfuscator.pdf
During my work reversing game engines obfuscated by DRM, as well as malware, I encountered many different ways in which assembler instructions were combined to achieve the same goal, and they rarely looked alike. None of it was as radical as the Movfuscator, because these programs still needed to run on customers' hardware or in production environments.
As always, it comes down to understanding the basics of the things you're working with. In this article, I will focus on x86 Assembler Mutations on Windows. We'll explore ways to choose instructions that can replace a specific instruction while ensuring that the replacement remains semantically identical to the original. Essentially, this is what I mean when I talk about a Mutation.
I first learned about this while reversing the detection techniques of an antivirus product that tried to identify malware based on specific byte signatures. While this heuristic approach may work for some kinds of malware, I can confidently say that if a malware developer is truly obfuscating their malware, for example by using frameworks like Zydis to mutate the binary code, it will be much harder to detect when relying solely on byte signatures. What makes this even more challenging from the antivirus perspective is that most mutations can be applied recursively. However, this usually costs performance, and the cost grows with the recursion depth of the applied mutations.
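To make the signature problem concrete, here is a minimal Python sketch. It assumes a deliberately naive scanner that only checks whether a fixed byte pattern occurs in the code (real engines are more sophisticated) and uses the hypothetical constant 0x13371337 in 32-bit code. The "signature" is the encoding of mov eax, 0x13371337, and the mutated variant is the push/pop substitution for mov.

```python
# Hypothetical signature: the bytes of "mov eax, 0x13371337" (0xB8 + little-endian imm32).
SIGNATURE = bytes.fromhex("b837133713")

# Original code: mov eax, 0x13371337
original = bytes.fromhex("b837133713")

# Mutated code with the same effect on eax: push 0x13371337 ; pop eax
mutated = bytes.fromhex("68" "37133713" "58")


def matches(code: bytes, signature: bytes) -> bool:
    """Naive signature check: does the exact byte pattern occur anywhere in the code?"""
    return signature in code


print(matches(original, SIGNATURE))  # True
print(matches(mutated, SIGNATURE))   # False
```

Note that the constant's bytes (37 13 37 13) still appear in the mutated code, so a scanner looking only for the constant would still find it.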
Additionally, when working in environments where space is limited (e.g., payloads or specific hooks), you always need to consider how many bytes your mutated instructions take in memory compared to the original instruction. Often, you can only overwrite or use a limited number of bytes for your mutation, so you need to combine your assembler instructions cleverly. Don't worry - I'll cover this more deeply later on in this article.
0x1 Instruction: mov
Here, we are talking about an instruction that moves an immediate value into a General Purpose Register (GPR). Let's take a look at some examples of these instructions. It's also worth examining them at the byte level to understand how these instructions appear in memory.
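As a rough illustration of the byte level, the following Python sketch encodes mov r32, imm32 the way it appears in 32-bit x86 code: the opcode 0xB8 plus the register number, followed by the immediate in little-endian order. The helper name and the example constant 0x13371337 are my own choices for illustration.

```python
import struct

# Register numbers used by the "+rd" part of the opcode (0xB8 + rd).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Encode 'mov r32, imm32': one opcode byte plus a 4-byte little-endian immediate."""
    return bytes([0xB8 + REG[reg]]) + struct.pack("<I", imm & 0xFFFFFFFF)


code = encode_mov_r32_imm32("eax", 0x13371337)
print(code.hex(" "), len(code))  # b8 37 13 37 13 5
```

Because of the little-endian order, the constant 0x13371337 is stored as 37 13 37 13 in memory.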
Let's discuss some mutations that can be used for a semantically correct substitution of the original instruction.
A simple trick I’ve seen many times involves utilizing the "push" and "pop" instructions. Here, you first push the immediate value onto the stack and then pop it directly into the desired GPR - pretty straightforward. At address 0xD, we can see that these instructions use exactly six bytes in memory, which is larger than the original instruction that only takes five bytes.
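A minimal sketch of the push/pop variant, assuming 32-bit code, where push imm32 is encoded as 0x68 followed by the immediate and pop eax as the single byte 0x58. In 64-bit code, push imm32 sign-extends the value to 64 bits and you would pop into rax instead, so for constants with the top bit set the upper half of rax differs from what mov eax, imm32 produces.

```python
import struct


def encode_push_pop_eax(imm: int) -> bytes:
    """push imm32 (0x68 + imm32) followed by pop eax (0x58), 32-bit encoding."""
    return b"\x68" + struct.pack("<I", imm & 0xFFFFFFFF) + b"\x58"


def emulate_push_pop(imm: int) -> int:
    """Tiny stack model: the pushed dword is popped straight into eax."""
    stack = [imm & 0xFFFFFFFF]  # push imm32
    return stack.pop()          # pop eax


seq = encode_push_pop_eax(0x13371337)
print(seq.hex(" "), len(seq))  # 68 37 13 37 13 58 6
```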
Another common approach is to first use the xor instruction to set the GPR to zero, particularly when you're unsure of its current content. If you can determine the content of the GPR or know that it's already zero - for example, if there is an existing xor instruction in the original code - you can skip the xor, making the mutation even smaller.
You can then use a simple add instruction to add the value to the zeroed GPR; the or instruction works the same way. For eax, which has a short encoding for both instructions, these variants occupy seven bytes of memory, compared to the five bytes of the original mov instruction. If the xor instruction isn't necessary, the mutation is the same size as the original instruction.
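The two zero-then-set variants could be sketched like this in Python, assuming eax as the target register so that the short forms add eax, imm32 (0x05) and or eax, imm32 (0x0D) apply; xor eax, eax is encoded as 31 C0. The small emulation checks that both sequences leave the same value in eax as the original mov.

```python
import struct

MASK = 0xFFFFFFFF
XOR_EAX_EAX = bytes.fromhex("31c0")  # xor eax, eax


def imm32(value: int) -> bytes:
    return struct.pack("<I", value & MASK)


def encode_xor_add(value: int) -> bytes:
    return XOR_EAX_EAX + b"\x05" + imm32(value)  # xor eax, eax ; add eax, imm32


def encode_xor_or(value: int) -> bytes:
    return XOR_EAX_EAX + b"\x0d" + imm32(value)  # xor eax, eax ; or eax, imm32


def emulate(value: int) -> tuple:
    eax = 0                          # xor eax, eax
    via_add = (eax + value) & MASK   # add eax, value
    via_or = eax | (value & MASK)    # or eax, value
    return via_add, via_or


print(len(encode_xor_add(0x13371337)), len(encode_xor_or(0x13371337)))  # 7 7
```

Two details the sketch glosses over: other registers have no short form (add ecx, imm32 is 81 C1 followed by the immediate, six bytes, so the sequence grows to eight bytes), and unlike mov, the xor, add, and or instructions modify EFLAGS, so these variants are only safe where the flags are not needed afterwards.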
The last mutation combines the or instruction with a shift instruction. First, you clear the GPR, OR in the high 16 bits of the value, shift the register left by 16 bits, and then OR in the low 16 bits. This mutation requires significantly more space than the original instruction: ten more bytes, for a total of 15 bytes.
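Here is a sketch of the or/shift mutation, assuming eax, a 16/16 split of the constant, and that both or instructions use the or eax, imm32 form (0x0D); shl eax, 16 is encoded as C1 E0 10. With these encodings, the byte count for a constant like 0x13371337 comes out at 15. For halves that fit in a signed byte, an assembler may pick a shorter imm8 encoding instead.

```python
import struct

MASK = 0xFFFFFFFF


def imm32(value: int) -> bytes:
    return struct.pack("<I", value & MASK)


def encode_or_shl(value: int) -> bytes:
    high, low = (value >> 16) & 0xFFFF, value & 0xFFFF
    return (
        bytes.fromhex("31c0")        # xor eax, eax
        + b"\x0d" + imm32(high)      # or  eax, high
        + bytes.fromhex("c1e010")    # shl eax, 16
        + b"\x0d" + imm32(low)       # or  eax, low
    )


def emulate_or_shl(value: int) -> int:
    eax = 0                          # xor eax, eax
    eax |= (value >> 16) & 0xFFFF    # or  eax, high
    eax = (eax << 16) & MASK         # shl eax, 16
    eax |= value & 0xFFFF            # or  eax, low
    return eax


print(len(encode_or_shl(0x13371337)))  # 15
```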
Among all these mutations, there is one that stands out. Can you guess which one and why?
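Spoiler: It's the mutation based on the xor/sub instructions, because it doesn't use the same immediate value as the original instruction. Subtracting the two's complement of the value from the zeroed GPR (0 - (-x) = x) yields the original value, so only the negated constant appears in the code.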
Why is this important? It may make things less obvious for static analysis, especially when someone is searching for specific constants.
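The xor/sub idea can be sketched as follows, assuming eax and the short sub eax, imm32 form (0x2D): the code bytes only ever contain the two's complement of the constant, yet subtracting it from zero restores the original value.

```python
import struct

MASK = 0xFFFFFFFF


def encode_xor_sub(value: int) -> tuple:
    negated = (-value) & MASK                          # two's complement of the constant
    code = (bytes.fromhex("31c0")                      # xor eax, eax
            + b"\x2d" + struct.pack("<I", negated))    # sub eax, negated
    return code, negated


def emulate_xor_sub(negated: int) -> int:
    eax = 0                        # xor eax, eax
    return (eax - negated) & MASK  # sub eax, negated (wraps around like the CPU)


code, negated = encode_xor_sub(0x13371337)
print(hex(negated))  # 0xecc8ecc9
```

A search for the bytes 37 13 37 13 will not find this sequence, which is exactly the point made above.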
Now, let's explore other mutations that ensure they do not use the original immediate value while still being semantically correct substitutions for the original instruction.
In the first mutation, you clear the GPR, move an XOR-encrypted value into eax, and then decrypt it directly afterward with the final XOR instruction. The total size is 12 bytes, which is significantly larger than the original instruction, which only takes five bytes.
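A sketch of this first encrypted variant, assuming eax and a hypothetical key 0xDEADBEEF (any 32-bit key works). The encoded bytes contain only the encrypted value and the key; the XOR at runtime restores the constant.

```python
import struct

MASK = 0xFFFFFFFF
KEY = 0xDEADBEEF  # hypothetical key, chosen only for illustration


def imm32(value: int) -> bytes:
    return struct.pack("<I", value & MASK)


def encode_xor_encrypted_mov(value: int, key: int = KEY) -> bytes:
    encrypted = (value ^ key) & MASK
    return (bytes.fromhex("31c0")          # xor eax, eax (redundant before mov, but part of the sequence)
            + b"\xb8" + imm32(encrypted)   # mov eax, encrypted
            + b"\x35" + imm32(key))        # xor eax, key (decrypts in place)


def emulate(value: int, key: int = KEY) -> int:
    eax = 0                      # xor eax, eax
    eax = (value ^ key) & MASK   # mov eax, encrypted
    eax ^= key                   # xor eax, key
    return eax


code = encode_xor_encrypted_mov(0x13371337)
print(len(code))  # 12
```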
In the second mutation, you push the encrypted constant onto the stack, decrypt it directly on the stack using the XOR instruction, and then pop it into the GPR. The total size here is 13 bytes, which, like the first mutation, is also larger than the original instruction.
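The stack-based variant, sketched under the same assumptions (32-bit code, eax, hypothetical key 0xDEADBEEF). xor dword ptr [esp], imm32 needs a ModRM and a SIB byte (81 34 24) before the immediate, which is where the extra size comes from.

```python
import struct

MASK = 0xFFFFFFFF
KEY = 0xDEADBEEF  # hypothetical key, chosen only for illustration


def imm32(value: int) -> bytes:
    return struct.pack("<I", value & MASK)


def encode_push_xor_pop(value: int, key: int = KEY) -> bytes:
    encrypted = (value ^ key) & MASK
    return (b"\x68" + imm32(encrypted)              # push encrypted
            + bytes.fromhex("813424") + imm32(key)  # xor dword ptr [esp], key
            + b"\x58")                              # pop eax


def emulate(value: int, key: int = KEY) -> int:
    stack = [(value ^ key) & MASK]  # push encrypted
    stack[-1] ^= key                # xor dword ptr [esp], key
    return stack.pop()              # pop eax


code = encode_push_xor_pop(0x13371337)
print(len(code))  # 13
```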
Now, I’ll show you how you can combine these mutations in an annoying way.
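One way to think about combining mutations is to apply them recursively. The following toy mutator is a sketch under my own assumptions: the instruction tuples, the two rewrite rules (mov becomes push/pop, and push becomes an encrypted push followed by an XOR on the top of the stack), and the random keys are all hypothetical. A tiny interpreter checks that the result still leaves the same value in eax.

```python
import random

MASK = 0xFFFFFFFF


def mutate(insn, rng):
    """Replace one instruction with an equivalent sequence (toy rules only)."""
    op = insn[0]
    if op == "mov":                          # mov reg, imm -> push imm ; pop reg
        _, reg, imm = insn
        return [("push", imm), ("pop", reg)]
    if op == "push":                         # push imm -> push imm^key ; xor [esp], key
        key = rng.getrandbits(32)
        return [("push", insn[1] ^ key), ("xor_top", key)]
    return [insn]                            # no rule for this instruction


def mutate_program(program, depth, rng):
    """Apply the rules to every instruction, 'depth' times in a row."""
    for _ in range(depth):
        program = [out for insn in program for out in mutate(insn, rng)]
    return program


def run(program):
    """Minimal interpreter for the toy instruction set."""
    regs, stack = {}, []
    for insn in program:
        op = insn[0]
        if op == "mov":
            regs[insn[1]] = insn[2] & MASK
        elif op == "push":
            stack.append(insn[1] & MASK)
        elif op == "xor_top":
            stack[-1] ^= insn[1]
        elif op == "pop":
            regs[insn[1]] = stack.pop()
    return regs


original = [("mov", "eax", 0x13371337)]
mutated = mutate_program(original, depth=4, rng=random.Random(1))
for insn in mutated:
    print(insn)
```

In this toy, each extra level only adds one more XOR; with a larger set of rules the code grows much faster.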
If you don't have to worry about the size of your binary and don't want to rely entirely on virtualization, this is a valid approach to "protecting" your binary against basic static reverse engineering. Techniques like encrypting imports and/or used strings are also useful strategies to consider.
0x2 Instruction: call
Another interesting instruction to think about is the call instruction in x86. The call instruction pushes the return address, which is needed to return once the call is completed, and then jumps to the target address. This behavior can be substituted, so let's discuss some ways to achieve that.
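At address 0x0, you can see the original instruction that calls the function located at address 0x13371337. The instruction occupies five bytes: the first byte is the opcode 0xE8, and the remaining four bytes are a signed displacement relative to the address of the next instruction, not the absolute address being called. Here, that displacement is 0x13371337 - 0x5 = 0x13371332.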
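To see the relative encoding in action, here is a small Python sketch for 32-bit code (the function names are my own): it encodes a call at a given address and decodes the absolute target back from the displacement.

```python
import struct


def encode_call_rel32(addr: int, target: int) -> bytes:
    """call rel32: opcode 0xE8 + signed displacement from the next instruction (addr + 5)."""
    rel = (target - (addr + 5)) & 0xFFFFFFFF
    return b"\xe8" + struct.pack("<I", rel)


def decode_call_target(addr: int, code: bytes) -> int:
    """Recover the absolute target from an E8 call located at 'addr'."""
    rel = struct.unpack("<i", code[1:5])[0]  # signed 32-bit displacement
    return (addr + 5 + rel) & 0xFFFFFFFF


code = encode_call_rel32(0x0, 0x13371337)
print(code.hex(" "))  # e8 32 13 37 13
```

The same bytes placed at a different address call a different target, which is why moving or mutating code around a call requires recalculating the displacement.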
When you first push the return address onto the stack with a push instruction and then perform a jmp to the "called" address, you can effectively simulate a call instruction. This is demonstrated at addresses 0x5 and 0xA.
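A sketch of the push/jmp pair starting at address 0x5: the pushed value is the address right after the jmp (0xF), so a normal ret in the callee returns there. This assumes 32-bit code at a fixed address; with relocation or ASLR, the pushed absolute address would need fixing up, and in 64-bit code push imm32 cannot hold an arbitrary 64-bit address.

```python
import struct

MASK = 0xFFFFFFFF


def encode_push_jmp(addr: int, target: int) -> bytes:
    """push <return address> ; jmp rel32 <target>, starting at 'addr' (32-bit code)."""
    next_addr = addr + 10  # push (5 bytes) + jmp (5 bytes): return address and jmp's reference point
    push = b"\x68" + struct.pack("<I", next_addr)
    jmp = b"\xe9" + struct.pack("<I", (target - next_addr) & MASK)
    return push + jmp


def emulate_push_jmp(addr: int, code: bytes):
    stack = [struct.unpack("<I", code[1:5])[0]]  # push return address
    rel = struct.unpack("<i", code[6:10])[0]     # jmp rel32 displacement (signed)
    eip = (addr + 10 + rel) & MASK               # jump to the target
    return eip, stack


eip, stack = emulate_push_jmp(0x5, encode_push_jmp(0x5, 0x13371337))
print(hex(eip), [hex(v) for v in stack])  # 0x13371337 ['0xf']
```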
The last method relies on a push followed by a ret instruction, as seen at addresses 0xF and 0x14. This method doesn't provide a perfect substitution for the call instruction because no return address is pushed: the ret consumes the pushed target address and jumps to it. If you use these instructions, be sure to handle the return properly at the target address. You might wonder why you would do this, given the need for manual return handling. Essentially, you strip away the explicit return information from your assembly code, which makes it less clear how the code will be executed and complicates the reverse engineering process.
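The push/ret variant in the same style (32-bit code, my own helper names): ret consumes the pushed target, so execution lands at the target with nothing left on the stack for a later return, which is why the target has to handle returning itself.

```python
import struct


def encode_push_ret(target: int) -> bytes:
    """push <target> ; ret  (0x68 imm32, then 0xC3), 32-bit code."""
    return b"\x68" + struct.pack("<I", target & 0xFFFFFFFF) + b"\xc3"


def emulate_push_ret(code: bytes):
    stack = [struct.unpack("<I", code[1:5])[0]]  # push target
    eip = stack.pop()                             # ret pops it into eip
    return eip, stack


eip, stack = emulate_push_ret(encode_push_ret(0x13371337))
print(hex(eip), stack)  # 0x13371337 []
```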
Question: Do you know where searching for instruction sequences that end in a "ret" instruction might be important? Write me a comment! ;) (Hint: Data Execution Prevention)
0x3 Conclusion
These were just small glimpses into the world of obfuscation - there is so much more to discuss. For me, this is why obfuscation is both amazing and horrible at the same time. When looking at these mutations, I hope you start to understand the idea behind them and why they can be valuable for use cases such as obfuscation. These examples are still trivial: deobfuscating them might only require a brief manual analysis or emulation, because the resulting content of the GPR is, by design, the same as with the original instruction.
If you think of this process in reverse - pun intended - you'll arrive at the process of optimization. In my opinion, this is quite fascinating because it gives you a better understanding of things like instruction sizes, CPU cycles, and writing performant assembly code. Often, when dealing with obfuscation, I'm reminded of earlier times when these considerations were more critical than they are today, in an era of intelligent compilers and much greater hardware resources. Watching Dimitris Giannakis's videos on his excellent YouTube channel introduced me to a whole new perspective on how the gaming industry, its hardware specifications, and security measures have evolved over time. Imagine how crucial efficient assembly programming was when trying to fit an entire game onto a small Game Boy cartridge while squeezing the best performance out of the limited hardware. That's when you either get creative or have to accept less-than-optimal performance.
Thank you for taking the time to read this article! If you have any questions or comments, feel free to leave them below.
I'm always open to new ideas and opinions, so don't hesitate to contact me! :)