Stack Smashing Protection
This material may be incomplete and may contain errors, so constructive feedback is always welcome.
This short article discusses one of the most common problems in embedded software development and how to mitigate it. The theory comes first, followed by a practical example.
Content:
- Introduction
- Stack overflow issue
- Stack buffer protection
- Practical example
- Conclusion
1. Introduction
As you might know, the stack is a very common data structure used in most programming languages. As its name suggests, it behaves like a real-world stack that can be accessed from one side only, so the last element added must be removed first. That is why it is called a last-in, first-out (LIFO) structure.
In the above picture, the elements A, B, C and D were added to the stack (pushed) in that order. To remove (pop) elements from the stack, we have to follow the reverse order: D, C, B, then A.
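The push and pop order described above can be sketched with a small array-based stack. This is only an illustration: the names CharStack, stack_push, stack_pop and the capacity of 8 are assumptions made for the sketch, not part of the example project.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8u   /* hypothetical capacity chosen for the sketch */

typedef struct {
    char   items[STACK_CAPACITY];
    size_t count;           /* number of elements currently stored */
} CharStack;

/* Push one element on top; returns false when the stack is full. */
static bool stack_push(CharStack *s, char value)
{
    if (s->count >= STACK_CAPACITY) {
        return false;
    }
    s->items[s->count++] = value;
    return true;
}

/* Pop the most recently pushed element; returns false when the stack is empty. */
static bool stack_pop(CharStack *s, char *out)
{
    if (s->count == 0u) {
        return false;
    }
    *out = s->items[--s->count];
    return true;
}
```

Pushing 'A', 'B', 'C', 'D' into an empty CharStack and then popping four times yields 'D', 'C', 'B', 'A', which is the LIFO order shown in the picture.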
The same concept is present in many CPU architectures, where it supports function calls at run time. The end of the RAM is used as a stack that stores the local objects of each function that is called or being executed, as follows:
When function A is called, the stack pointer (SP, a special register inside the CPU) is decremented (moved upward in the figure) by a certain number of bytes. This block is called a stack frame, and its size depends on the local objects in function A. When B is called, another stack frame is allocated on top of function A's frame (SP moves upward again), and the same happens for C.
Conversely, when function C returns, the SP is incremented (moved downward) by the same amount that was allocated for C, and the frame allocated for C (the red one above) is no longer available. The same happens to the frames of B and A when they return.
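The SP movement for the nested calls A, B and C can be modeled in plain C. This is a simplified sketch, not real CPU behavior: the top-of-stack address RAM_END and the frame sizes (16, 24 and 8 bytes) are hypothetical values picked for illustration.

```c
#include <stdint.h>

#define RAM_END 0x20010000u      /* hypothetical top-of-stack address */

static uint32_t sp = RAM_END;    /* models the CPU stack pointer */
static uint32_t sp_in_a, sp_in_b, sp_in_c;   /* SP observed inside each function */

/* Entering a function: reserve its frame by moving SP to a lower address. */
static void frame_enter(uint32_t frame_bytes) { sp -= frame_bytes; }

/* Returning: release the frame by moving SP back up. */
static void frame_leave(uint32_t frame_bytes) { sp += frame_bytes; }

static void func_c(void)
{
    frame_enter(8u);
    sp_in_c = sp;
    frame_leave(8u);
}

static void func_b(void)
{
    frame_enter(24u);
    sp_in_b = sp;
    func_c();            /* C's frame sits below B's frame */
    frame_leave(24u);
}

static void func_a(void)
{
    frame_enter(16u);
    sp_in_a = sp;
    func_b();            /* B's frame sits below A's frame */
    frame_leave(16u);
}
```

After func_a() returns, sp is back at RAM_END, while the recorded values show each nested frame starting at a lower address than the frame of its caller.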
2. Stack overflow issue
Stack overflow, stack smashing, or stack corruption is a scenario that may occur at run time when memory locations on the stack are accessed unintentionally. This can happen for different reasons, for example:
- an out-of-range access to a local array object
- dereferencing an uninitialized pointer that happens to hold a (garbage) stack address
In the previous figure, if function B runs and accesses memory locations outside its own frame (the yellow frame is owned by B), then the previous frame (the orange one), which is owned by function A, is corrupted or overwritten by mistake. As a result, the program has undefined behavior once B returns and function A continues, because A now works with corrupted data.
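The effect of B writing outside its own frame can be reproduced safely on a model of the stack. The sketch below keeps both frames inside one array so that the demonstration itself stays well defined. The frame size of 4 words, the marker patterns and all function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_WORDS 4u
#define PATTERN_A   0xAAAAAAAAu   /* hypothetical marker for A's data */
#define PATTERN_B   0xBBBBBBBBu   /* hypothetical marker for B's data */

/* Model of a stack region: words 0..3 belong to B (lower addresses),
 * words 4..7 belong to its caller A (higher addresses). */
static uint32_t stack_model[2u * FRAME_WORDS];

static void frames_init(void)
{
    for (size_t i = 0u; i < FRAME_WORDS; i++) {
        stack_model[i] = 0u;
        stack_model[FRAME_WORDS + i] = PATTERN_A;
    }
}

/* B writes `count` words starting at the beginning of its own frame.
 * The write is clamped to the model so the sketch never leaves the array. */
static void b_write(size_t count)
{
    for (size_t i = 0u; i < count && i < 2u * FRAME_WORDS; i++) {
        stack_model[i] = PATTERN_B;
    }
}

/* Returns true while every word of A's frame still holds A's marker. */
static bool frame_a_intact(void)
{
    for (size_t i = 0u; i < FRAME_WORDS; i++) {
        if (stack_model[FRAME_WORDS + i] != PATTERN_A) {
            return false;
        }
    }
    return true;
}
```

Calling b_write(4u) leaves A's frame intact, whereas b_write(6u) spills two words into A's frame, which is the kind of silent corruption described above.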
3. Stack buffer protection
After describing the problem, let's discuss the software-based resolution. The main idea behind the protection mechanism is simple: inject a known value (the so-called canary value) at the end of the routine's stack frame, right after its local objects, just before the routine body executes, and confirm that the value is unchanged just before the routine returns. The stack frame is therefore larger than the regular frame of the function by the size of the canary value.
Just before the return point of the routine, the canary value is verified. If it has changed, a callback is triggered as a notification; otherwise the routine returns as usual.
Because the compiler is responsible for generating the opcodes needed at function entry and return, the injection and verification are handled automatically without changing the application software. The software designer provides only the canary value and the callback implementation, which the compiler requires when stack protection is enabled.
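To make the mechanism concrete, the injection and verification steps can be written by hand around a function body. This is only a sketch of the idea, not what the compiler actually emits: guard_value, on_canary_changed, guarded_fill and the frame size of 4 words are hypothetical names and values, and the frame is modeled as an array so the out-of-range write stays well defined.

```c
#include <stddef.h>
#include <stdint.h>

#define LOCAL_WORDS 4u   /* hypothetical number of local words in the frame */

static uint32_t guard_value = 0xDEADBEEFu;  /* stands in for the canary value */
static unsigned failure_count;              /* incremented by the callback */

/* Stands in for the notification callback. */
static void on_canary_changed(void)
{
    failure_count++;
}

static void guarded_fill(size_t count)
{
    uint32_t frame[LOCAL_WORDS + 1u] = {0u};  /* locals plus one canary word */

    /* "Prologue": inject the canary right after the locals. */
    frame[LOCAL_WORDS] = guard_value;

    /* Function body: a count greater than LOCAL_WORDS models an
     * out-of-range write. Writes are clamped to the model frame. */
    for (size_t i = 0u; i < count && i <= LOCAL_WORDS; i++) {
        frame[i] = (uint32_t)i;
    }

    /* "Epilogue": verify the canary before returning. */
    if (frame[LOCAL_WORDS] != guard_value) {
        on_canary_changed();
    }
}
```

With guarded_fill(4u) the canary survives and the callback is not invoked. With guarded_fill(5u) the fifth write lands on the canary word, so the check fails and failure_count becomes 1.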
4. Practical example
In this simple example we use the IAR Embedded Workbench toolchain (iccarm and ilinkarm) and the MSP432P401R MCU, which is based on the 32-bit ARM Cortex-M4F CPU.
We have two modules: main.c and stack_guard.c. The main function calls the BufferIncrement routine twice.
int main(void)
{
    BufferIncrement(src, 10);
    BufferIncrement(src, 20);
    return 0;
}
The routine copies the first size entries of the input buffer src into its local buffer dest, a simple array of 10 uint32_t entries, and increments each entry by one.
void BufferIncrement(uint32_t* src, uint8_t size)
{
    uint8_t index = 0u;
    uint32_t dest[10] = {0u};

    /* No bounds check: a size greater than 10 writes past the end of dest */
    for (index = 0u; index < size; index++)
    {
        dest[index] = src[index] + 1u;
    }
}
The input array src is a simple global array of 20 uint32_t entries.
static uint32_t src[20] = {0u}; /* Global buffer */
In stack_guard.c we have to define the stack canary value and the notification implementation. Both are needed by the compiler when it is configured to generate stack protection opcodes. This is done using the following compiler flag:
--stack_protection
The stack_guard.c implementation is very simple.
uint32_t __stack_chk_guard = 0xDEADBEEFu; /* Canary value */

void __stack_chk_fail(void)
{
    /* Perform a soft reset */
    RSTCTL->RESET_REQ = RSTCTL_RESET_REQ_SOFT_REQ | RSTCTL_RESETREQ_RSTKEY_VAL;
}
Here we define the canary value __stack_chk_guard and perform a soft reset in the stack overflow callback __stack_chk_fail. The soft reset is performed according to the reset controller specification in the MCU data sheet:
Run-time behavior
The disassembly view shows the opcodes generated for the function call:
Notice that R0 and R1 hold the address of src and the immediate value of the second parameter, #10.
After executing the branch instruction, the CPU pushes some registers that belong to the previous context. Notice that the regular stack frame of the routine (10 uint32_t entries) is allocated together with one more entry reserved for the canary (1 uint32_t entry), so the SP is moved upward, i.e. decremented, by 11 * 4 bytes, which is 44 or 0x2C.
The stack canary value __stack_chk_guard is then read and stored into R0. The last instruction stores that value at the end of the stack frame (i.e. SP + 0x28). That is the canary location, as in the below figure.
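The two numbers above follow directly from the size of the local buffer. As a quick check, the arithmetic can be written out in C. The helper names frame_bytes and canary_offset are made up for this sketch, and it assumes that the frame holds only dest and the canary, as in the disassembly discussed here.

```c
#include <stdint.h>

#define DEST_ENTRIES 10u   /* number of entries in the local buffer dest */

/* Frame size: the local buffer plus one extra word for the canary. */
static uint32_t frame_bytes(void)
{
    return (DEST_ENTRIES + 1u) * (uint32_t)sizeof(uint32_t);
}

/* Offset of the canary from SP: it sits right after the last dest entry. */
static uint32_t canary_offset(void)
{
    return DEST_ENTRIES * (uint32_t)sizeof(uint32_t);
}
```

Here frame_bytes() evaluates to 44 (0x2C) and canary_offset() to 40 (0x28), matching the SP adjustment and the SP + 0x28 store seen in the disassembly.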
That is the injection of the canary value before executing the function. Now let's examine the opcodes before returning from the function.
As we can see in the following figure, the previously injected canary value is loaded from the same location (i.e. SP + 0x28) into R0, and the __stack_chk_guard value is loaded into R1.
Then the CPU compares R0 and R1. The values match in the first BufferIncrement call because the size parameter (10) matches the size of the local buffer.
But this is not the case for the second call of BufferIncrement, where the size parameter is 20. Notice that the previous stack frame is corrupted: it is overwritten with the value 1, including the injected canary value (0xDEADBEEF), because of the out-of-range access to the local buffer.
In the second call, if you monitor the instruction execution just after the closing brace (i.e. the return point) of the function, you can see that the canary location is read into R0 and __stack_chk_guard is loaded into R1. Then the compare instruction (CMP) is executed.
But the values in R0 and R1 do not match, because the injected canary value has been overwritten.
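The two outcomes can be reproduced on a host machine with a model of the frame. This is a hedged sketch rather than the target code: the frame is modeled as one array of 20 words with the canary at index 10, so the out-of-range writes stay inside the array, and canary_after_call, GUARD and MODEL_WORDS are hypothetical names.

```c
#include <stdint.h>

#define DEST_WORDS  10u           /* size of the local buffer dest */
#define MODEL_WORDS 20u           /* dest + canary + data above the frame */
#define GUARD       0xDEADBEEFu   /* same value as __stack_chk_guard */

static uint32_t src[MODEL_WORDS]; /* zero-initialized, like the global src */

/* Runs the BufferIncrement loop on the model and returns the value found
 * in the canary slot at the return point (what would be loaded into R0). */
static uint32_t canary_after_call(uint8_t size)
{
    uint32_t mem[MODEL_WORDS] = {0u};
    uint8_t index;

    mem[DEST_WORDS] = GUARD;      /* canary injected right after dest */

    for (index = 0u; index < size && index < MODEL_WORDS; index++) {
        mem[index] = src[index] + 1u;   /* index 10 hits the canary slot */
    }
    return mem[DEST_WORDS];
}
```

canary_after_call(10u) returns 0xDEADBEEF, so the comparison against the guard succeeds. canary_after_call(20u) returns 1, because src[10] + 1u was written over the canary, and the comparison fails.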
If we keep stepping until the BL instruction is executed, the CPU branches to __stack_chk_fail.
Finally, the callback is executed: the proper value is loaded into R0, the reset register address is loaded into R1, and the reset request register is written (the last STR instruction).
Step once more, and we can see the effect of the soft reset: the program restarts and the program counter is set back to the first instruction executed before the main function.
5. Conclusion
As discussed, stack corruption can occur when stack memory locations are written unintentionally. The compiler can be configured to generate opcodes that inject a magic value before the function body executes (during the stacking process) and verify that the magic value is preserved just before the return point. The software engineer is responsible for configuring the compiler and for providing the canary value and the stack overflow callback implementation.