x64 Assembly

Something that I have gotten really into recently is x64 Assembly programming, so I thought I would jot down some of the notes that I’ve collected from developing in the language. I am using the MASM assembler in a Visual Studio environment, as its memory, register, and debugging tools work well for my needs. Note: I will use routine, sub-routine, procedure, and function interchangeably; they all effectively mean the same thing.

Register quick tips

Below is a quick reference table of registers and their purpose. It doesn’t cover every purpose of each register, but it is good enough to get started. Note that all registers in a row are the same register, just fewer of the lower bits as you move to the right of the row. So, 64-bit is the whole register, 32-bit is the lower half of the register, 16-bit is the lower half of 32, and 8-bit is the lower half of 16. Note: there are high-byte registers (ah, bh, ch, dh), but not for the newer registers (r8-r15), so we are skipping those for now to make this table look pretty.
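
To make the row relationship concrete, here is a minimal sketch in C (not assembly) that treats a 64-bit value as a register and masks out the narrower views. The function names are made up for illustration. It only models reading the narrower names; on real hardware, writing to a 32-bit name such as eax also zeroes the upper 32 bits of the full register.

```c
#include <stdint.h>

/* Each helper returns the bits that a narrower register name refers to.
   The names are hypothetical; they are not part of any library. */
uint32_t low32(uint64_t r) { return (uint32_t)(r & 0xFFFFFFFFu); }  /* e.g. eax */
uint16_t low16(uint64_t r) { return (uint16_t)(r & 0xFFFFu); }      /* e.g. ax  */
uint8_t  low8(uint64_t r)  { return (uint8_t)(r & 0xFFu); }         /* e.g. al  */
uint8_t  high8(uint64_t r) { return (uint8_t)((r >> 8) & 0xFFu); }  /* e.g. ah  */
```

For a register holding 1122334455667788h, the 32-bit view is 55667788h, the 16-bit view is 7788h, the low byte is 88h, and the high byte (ah-style) is 77h.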

There are some registers that have special behavior based on the instruction that you are using. RCX combined with the loop instruction is one such set. Below is an example that will increment the value in the rax register 8 times.

mov rcx, 8
loop_8_times:
inc rax
loop loop_8_times        

As you can see, the loop instruction decrements the rcx register and jumps back to the label for as long as rcx is not 0. When rcx reaches 0, the loop ends. Below is an example of the same code without the loop instruction to describe the behavior. The one difference is that loop leaves the flags untouched, while dec and cmp change them.

mov rcx, 8
loop_8_times:
inc rax
dec rcx
cmp rcx, 0
jnz loop_8_times        
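
As a rough mental model, the loop instruction behaves like a do/while in C: the body runs first, then the counter is decremented and tested. The function below is a hypothetical stand-in for the snippets above, not something MASM provides.

```c
#include <stdint.h>

/* Hypothetical model of the snippet above: increment rax, then let `loop`
   decrement rcx and branch back while rcx is not zero. */
uint64_t run_loop(uint64_t rax, uint64_t rcx)
{
    do {
        rax++;              /* inc rax */
    } while (--rcx != 0);   /* loop loop_8_times */
    return rax;
}
```

Because the decrement happens before the test, entering with rcx set to 0 wraps around to the largest 64-bit value instead of skipping the body, so check for 0 first if that can happen.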

Fast-call procedure calling conventions

First of all, this document is very helpful for understanding Microsoft calling conventions.

In short, Microsoft uses RCX, RDX, R8, and R9 for the first four integer or pointer arguments of a procedure call, and any remaining arguments should be pushed onto the stack. Below is a sample from their docs:

func1(int a, int b, int c, int d, int e);
// a in RCX, b in RDX, c in R8, d in R9, e pushed on stack        

The following is the calling convention for using floats as arguments to functions. Note that the register is picked by the position of the argument, so if you mix input argument types, you should still use the order described in the samples. That is to say, if you have an int as the first argument and a float as the second argument, you should use RCX and XMM1 respectively.

func2(float a, double b, float c, double d, float e);
// a in XMM0, b in XMM1, c in XMM2, d in XMM3, e pushed on stack        

Lastly, when calling a procedure, the return value for the call (if any) will be put into RAX, or into XMM0 if it is a float or double.
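
The positional rule is easy to get wrong when argument types are mixed, so here is a small sketch in C of how I think about it. The arg_register function is hypothetical (it is not part of any Microsoft header); it simply maps a 0-based argument position and a "is this a float/double" flag to the register name from the samples above.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the register that carries the argument at `position` (0-based)
   under the Microsoft x64 convention, or NULL when the argument goes on
   the stack. `is_float` is non-zero for float/double arguments. */
const char *arg_register(int position, int is_float)
{
    static const char *const int_regs[4]   = { "RCX", "RDX", "R8", "R9" };
    static const char *const float_regs[4] = { "XMM0", "XMM1", "XMM2", "XMM3" };

    assert(position >= 0);
    if (position >= 4)
        return NULL;    /* fifth argument onward: passed on the stack */
    return is_float ? float_regs[position] : int_regs[position];
}
```

For a call like f(int a, float b), arg_register(0, 0) gives RCX and arg_register(1, 1) gives XMM1; the float does not get XMM0 just because it is the first float.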

Fast-call procedure shadow space (home space)

When using fast-call it is important to note that if the routine is going to be called from another language such as C or C++, or if you are calling a function that is written in another language like C or C++, you need to make sure to support shadow space, also known as home space. I’ll call it shadow space from now on because it sounds cooler. This shadow space is 32 bytes long (8 bytes for each of the four register arguments, since we are in 64-bit assembly). Basically, what it boils down to is that you need to move the stack pointer RSP down by 32 bytes before making a call (keeping in mind the 16-byte alignment of the stack). Let’s look at Microsoft’s HeapAlloc function (basically malloc) as an example of how this would work. Below is our own implementation of malloc, which we will call halloc, that uses the Windows API function HeapAlloc.

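halloc PROC
	sub rsp, 28h		; Shadow space (20h) + 8 bytes so RSP is 16-byte aligned at each call
				; (assumes the caller's stack was aligned when it called us)
	mov [rsp+20h], rcx	; Save the number of bytes; RCX and R8 are volatile across calls
	call GetProcessHeap	; Store the process heap address in RAX
	mov rcx, rax		; The heap address is 1st arg
	mov rdx, 00h		; No flags to alter memory allocation
	mov r8, [rsp+20h]	; The number of bytes to allocate is 3rd arg
	call HeapAlloc
	add rsp, 28h		; Remove shadow space and padding
	ret
halloc ENDP

What you will notice in the above code is the instructions sub rsp, 28h and add rsp, 28h, which add and remove the shadow space (20h) plus 8 bytes of padding. The padding is there because the call into halloc pushed an 8-byte return address, so 28h brings RSP back to a 16-byte boundary. It also gives us a slot at [rsp+20h] to keep the byte count in, since RCX and R8 are volatile and GetProcessHeap is free to overwrite them. This is a little bit annoying, but I personally don’t bother with the shadow space when I am calling routines that I don’t intend to expose to a higher-level language like C. This means that I mainly only have to add it when I am calling into a function that I would like to use from a higher-level language library. For a short bit of added reading on this, check out this Microsoft blog post.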

Something I like to do is to have a routine for doing shadow space calling for me. Basically, you put the address of the function you want to call with shadow space into rax (in my case), and the routine adds and removes the stack space around the call as you normally would.

;*********************************************;
; RAX = Function that should be shadow called ;
; Returns whatever the function call returns  ;
;*********************************************;
shadowCall PROC
    pop rbx          ; Get the return address in a non-volatile register
    and rsp, not 8   ; Make sure that the current stack is 16-byte aligned
    sub rsp, 20h     ; Add the shadow space
    call rax         ; Call the function
    add rsp, 20h     ; Remove the shadow space
    jmp rbx          ; Go back to the stored instruction address
shadowCall ENDP        

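The above instructions have a few things going on. The most interesting is the pop rbx. We don’t want our return address to sit in the area that the external function treats as its own (the shadow space and the stack argument slots above it), as it might get overwritten. So, we remove it from the stack and store it in a non-volatile register, one that the external function must preserve, to return with later. Note that this means the routine overwrites whatever the caller had in rbx. The second thing is and rsp, not 8. RSP is always a multiple of 8, so clearing bit 3 rounds it down to a multiple of 16. This makes sure that the stack is 16-byte aligned before the external call; otherwise you’ll probably wind up with a memory access violation. Keep in mind that if the and does move RSP, those 8 bytes are never added back and any stack arguments end up 8 bytes off, so it is best to call shadowCall with the stack already aligned.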

Shadow space and function arguments

At this point you might be wondering, if there are more than 4 arguments to a function call and the remaining arguments are put onto the stack, how does this work with shadow space? Since the fast-call calling convention requires the shadow space (whether or not the function uses it) and that alters the stack, your question should be, “do I push the args to the stack before or after adding the shadow space?”. The answer is to push the args before you add the shadow space.

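mov rcx, 1		; Arg 1
mov rdx, 2		; Arg 2
mov r8, 3		; Arg 3
mov r9, 4		; Arg 4
sub rsp, 8		; Padding so RSP is 16-byte aligned at the call (assumes it is aligned here)
push 5			; Push the 5+ arguments onto the stack first (last argument first)
sub rsp, 20h		; Shadow space
call someFunction
add rsp, 30h		; Remove shadow space, the pushed argument, and the padding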

Stack 16 byte alignment

Something I am aware of, but honestly haven’t fully explored, is that the stack is kept on a 16-byte alignment: RSP should be a multiple of 16 whenever you make a call. That is to say that if you push only one 8-byte value onto the stack, you should pad it with another 8 bytes. You could push the value twice or, preferably, just move the stack pointer. Below is an example of this exact scenario.

mov rax, 99 ; Some value from somewhere
push rax    ; Push an 8-byte value onto the stack
sub rsp, 8  ; Move the stack pointer by 8 bytes to keep it 16-byte aligned        

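Often, you’ll want to start your program off on the right foot by aligning the stack. Believe it or not, it doesn’t always start off aligned; the entry point is reached like any other call, so an 8-byte return address has already been pushed.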

.code
main PROC
    and rsp, not 08h    ; Make sure that the stack is 16-byte aligned
    ; ...
main ENDP
END        
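
The alignment arithmetic is easier to check outside of assembly, so here is a minimal sketch in C. Both function names are hypothetical. The first rounds an address down to a 16-byte boundary, and the second works out how many bytes to reserve for a call with a given number of stack arguments, assuming RSP is already 16-byte aligned before anything is reserved.

```c
#include <stdint.h>

/* Round a stack address down to the previous 16-byte boundary. For an RSP
   that is already a multiple of 8, this matches `and rsp, not 8`. */
uint64_t align_down_16(uint64_t rsp)
{
    return rsp & ~(uint64_t)0xF;
}

/* Bytes to reserve before a call: 32 bytes of shadow space plus one
   8-byte slot per stack argument, rounded up to a multiple of 16.
   Assumes RSP is 16-byte aligned before the reservation is made. */
uint64_t call_area_bytes(uint64_t stack_args)
{
    uint64_t needed = 32 + 8 * stack_args;
    return (needed + 15) & ~(uint64_t)0xF;
}
```

With one stack argument this gives 48 bytes (30h): 32 of shadow space, 8 for the argument, and 8 of padding. With two stack arguments it is still 48, since the second argument takes the place of the padding.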

Setting up an x64-only project in Visual Studio

You will need to create a C++ project as you normally would. Though you are selecting this to be a C++ project, we will not be creating any C/C++ file types; we will only be creating .asm files.

Make sure to give your project a suitable name during the configuration step.

Something that I like to do is get rid of the normal Visual Studio solution explorer folders and just show all files so that I can set up the directories how I want them.

Next, we need to enable the MASM assembler in the build customizations.

Now let’s create a src/main.asm file to make sure things are set up correctly. When you create the file, right click on it, and go to the file’s properties.

You should see that the file type is set to Microsoft Macro Assembler.

Next, you need to set the label that will serve as your entry point in the Visual Studio project properties. To keep things simple, we will name our entry point label main. So, to set this up you need to go to project properties.

Then you need to go to the Linker->Advanced settings and set the Entry Point value to main. Note: Make sure that you are in x64 mode and not x86.

Now that you have done all that setup, turn your debugger to x64 mode (through the dropdown in Visual Studio next to the debug button) and test things out.

NOTE: if you are getting an error when building sometime in the future that says something along the lines of unresolved external symbol imp_CrtDbgReportW, the problem seems to be the multi-threaded debugging runtime library setting. Changing from “Multi-threaded Debug DLL (/MDd)” to “Multi-threaded DLL (/MD)” in the Visual Studio project settings seems to have done the trick. You can find it in Project Settings->C/C++->Code Generation->Runtime Library.

Code examples

What better way to learn something than through some code examples? Below are some ASCII string query routines that I have written in x64. Note: these routines are slower, but they work well for example’s sake. I use faster versions of these routines in my personal code that account for cache lines and heap access.

strleninline - Gets the length of a \0 terminated string

;****************************************;
; RAX = The string to get the length for ;
; Returns length of string in RAX        ;
;****************************************;
strleninline PROC
    push rbx		  ; Save state of rbx since we are going to use bl
    push rcx		  ; Save state of rcx since we are going to use it for the start address
    mov rcx, rax          ; Create a copy of rax to diff at end
strleninline_loop:
    mov bl, [rax]         ; Copy ascii letter at the rax address into bl
    inc rax               ; Go to the next ascii letter at rax
    cmp bl, 0             ; Check to see if the character is a \0
    jnz strleninline_loop ; If not \0 then continue through the loop
    dec rax               ; Don't want to count \0 as part of the length
    sub rax, rcx          ; Length in rax by subtracting address locations
    pop rcx               ; Restore the state of rcx
    pop rbx               ; Restore the state of rbx
    ret
strleninline ENDP        

strstartswith - Determines if a string (haystack) starts with another string (needle)

;*******************************************************;
; RAX = Needle string (string expected at the start)    ;
; RBX = Haystack string (string to check within)        ;
; Returns 0 in RAX if false, anything otherwise is true ;
; Note: RBX and R8 are modified by this routine         ;
;*******************************************************;
strstartswith PROC
    push rcx                 ; Save the state of rcx
    push rdx                 ; Save the state of rdx
    mov rdx, rax             ; Copy rax to rdx as we are calling strleninline
    call strleninline
    mov rcx, rax             ; Move len of needle str into counter register
    mov rax, 1               ; An empty needle counts as a match
    jrcxz strstartswith_exit ; Skip the loop, as loop with rcx = 0 would wrap around
    mov rax, 0               ; Set the return to false
strstartswith_loop:
    mov r8b, [rbx]           ; Get the character from haystack string
    cmp r8b, [rdx]           ; Compare character from the needle string
    jnz strstartswith_exit
    inc rbx                  ; Move to next character in haystack string
    inc rdx                  ; Move to the next character in needle string
    loop strstartswith_loop
    mov rax, 1               ; The string starts with match!
strstartswith_exit:
    pop rdx                  ; Restore the state of rdx
    pop rcx                  ; Restore the state of rcx
    ret
strstartswith ENDP

strindexof - Get the index of a string (needle) within another string (haystack)

;*******************************************************;
; RAX = Haystack string (string to check within)        ;
; RBX = Needle string (string to search for)            ;
; Returns -1 in RAX if not found, otherwise RAX = index ;
;*******************************************************;
strindexof PROC public
    push rcx                  ; Save the state of rcx
    push rdx                  ; Save the state of rdx
    push r8                   ; Save the state of r8 since we are going to use r8b
    mov rdx, rax              ; rdx = current starting position in the haystack
strindexof_loop:
    xor rcx, rcx              ; rcx = offset into the needle for this position
strindexof_compare:
    mov r8b, [rbx+rcx]        ; Get the next character from the needle string
    cmp r8b, 0                ; Reaching the \0 means the whole needle matched
    jz strindexof_found
    cmp r8b, [rdx+rcx]        ; Compare with the haystack at the same offset
    jne strindexof_next
    inc rcx                   ; Characters match, so check the next pair
    jmp strindexof_compare
strindexof_next:
    cmp byte ptr [rdx+rcx], 0 ; If the haystack ended here, no later start can match
    jz strindexof_notfound
    inc rdx                   ; Restart the needle one character further along
    jmp strindexof_loop
strindexof_found:
    sub rdx, rax              ; Get address difference of the match and haystack
    mov rax, rdx              ; Return the index in rax
    jmp strindexof_exit
strindexof_notfound:
    mov rax, -1               ; The needle was not found
strindexof_exit:
    pop r8                    ; Restore the state of r8
    pop rdx                   ; Restore the state of rdx
    pop rcx                   ; Restore the state of rcx
    ret
strindexof ENDP
