Bare Metal ARM Programming Tutorial

(For developers curious about low-level embedded systems)



Step 1: Code Breakdown

1. Assembly Code (boot.s):

.global _start
_start:
    ldr r0, =message       @ Load message address into r0
    ldr r1, =0x09000000    @ QEMU "virt" machine's UART0 base address
1:                          @ Loop start
    ldrb r2, [r0], #1      @ Load byte from message, auto-increment address
    cmp r2, #0             @ Check for null terminator (end of string)
    beq 2f                 @ If null, exit loop
    str r2, [r1]           @ Write character to UART (serial output)
    b 1b                   @ Repeat loop
2:                          @ Halt
    wfi                    @ Wait for interrupt (pause CPU)
    b 2b                   @ Infinite loop (safety)
message:
    .asciz "Bare Metal ARM Boot OK\r\n"  @ Null-terminated string        


Key Concepts:

  • _start: Entry point for bare-metal programs (no OS).
  • UART (PL011): Serial communication hardware. Writing to its memory-mapped address (0x09000000 in QEMU) sends text to the console.
  • ldrb/str: Load-byte and store instructions for memory access.
  • wfi: Puts the CPU into a low-power wait state until an interrupt arrives, so the finished program does not busy-spin on the host CPU.
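
The same print loop can be written in C. This is a minimal sketch, not part of the original tutorial: the name uart_write_string and the idea of passing the data register as a parameter are assumptions made here so the logic can also be tried on a host machine against an ordinary variable. On the QEMU "virt" machine the pointer would be (volatile uint32_t *)0x09000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the ldrb/cmp/str loop in boot.s.
 * 'dr' points at the UART data register; each store sends one character.
 * Returns the number of characters written. */
size_t uart_write_string(volatile uint32_t *dr, const char *s)
{
    assert(dr != NULL && s != NULL);   /* host-side sanity check */
    size_t n = 0;
    while (*s != '\0') {               /* cmp r2, #0 / beq 2f */
        *dr = (uint32_t)(unsigned char)*s++;   /* str r2, [r1] */
        n++;
    }
    return n;
}
```

The volatile qualifier matters here: it tells the compiler that every store is a visible side effect, so it cannot merge or drop the writes.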


Step 2: Build & Run

1. Install Tools:

Linux: Install qemu-system-arm and gcc-arm-none-eabi.

Windows: Install QEMU and arm-gnu-toolchain-[version]-mingw-w64-i686-arm-none

2. Compile/Link:

arm-none-eabi-as -mcpu=cortex-a7 boot.s -o boot.o          # Assemble
arm-none-eabi-ld -Ttext=0x40000000 -nostdlib boot.o -o boot.elf  # Link        

  • -Ttext=0x40000000: Link the code at the start of RAM on the QEMU "virt" machine; QEMU loads the ELF at its link address.
  • -nostdlib: No standard libraries (bare metal).

3. Run in QEMU:

qemu-system-arm -M virt -cpu cortex-a7 -kernel boot.elf -nographic -serial mon:stdio        

  • -M virt: Emulate ARM "virt" board (generic QEMU machine).
  • -nographic: Disable GUI; redirect UART to terminal.


Step 3: Expected Output

You’ll see:

Bare Metal ARM Boot OK        

(Press Ctrl+A then X to exit QEMU)


What you have learned:

  • Bare-Metal Programming: Direct hardware control without an OS. Essential for firmware, bootloaders, and embedded systems.
  • Memory-Mapped I/O: Hardware peripherals (like UART) are controlled by reading/writing specific memory addresses.
  • Cross-Compiling: Develop ARM code on x86 machines using specialized toolchains.


Troubleshooting:

  • Missing Tools: Verify arm-none-eabi-as and qemu-system-arm are installed.
  • Wrong UART Address: QEMU "virt" machine uses 0x09000000; real hardware differs.
  • Infinite Loop: The wfi + b 2b keeps the CPU parked after the message is printed; without it, execution would run on into the string data that follows the code.


Debugging

1. Modify Build Commands (Add Debug Symbols)

arm-none-eabi-as -g -mcpu=cortex-a7 boot.s -o boot.o  # -g adds debug symbols
arm-none-eabi-ld -Ttext=0x40000000 -nostdlib boot.o -o boot.elf        

2. Launch QEMU in Debug Mode

qemu-system-arm -M virt -cpu cortex-a7 -kernel boot.elf -nographic -serial mon:stdio -s -S        

  • -s: Start a GDB server on port 1234
  • -S: Freeze CPU at startup (wait for GDB to connect)


3. Debug with GDB (New Terminal)

arm-none-eabi-gdb boot.elf        

Inside GDB:

(gdb) target remote :1234    # Connect to QEMU
(gdb) break _start           # Set breakpoint at entry
(gdb) continue               # Start execution (will hit breakpoint)        

4. Inspect the String in Memory

Option 1: Check the Symbol Address

(gdb) print &message         # Get address of the string
# With the code above the address should be 0x40000024
# (nine 4-byte instructions precede the string)
(gdb) x/s &message           # Examine memory as string
# Output: "Bare Metal ARM Boot OK\r\n"

Option 2: Follow Register r0

(gdb) stepi                  # Step through instructions
(gdb) info registers r0      # After ldr r0,=message executes
# r0 now holds the string address
(gdb) x/s $r0               # View string via register value        

5. Watch the String Get Sent to UART

(gdb) break *(_start+20)    # Break at "str r2, [r1]" instruction
(gdb) continue
# Each "continue" will print one character
(gdb) info registers r2     # See ASCII value of current character        


  1. Debug Symbols (-g flag): Allow GDB to map machine code to your original assembly labels/variables.
  2. QEMU GDB Stub (-s -S): Pauses execution and lets GDB control the virtual CPU.
  3. Memory Inspection: The .asciz directive stores the string in the program's binary. Its address is determined during linking (controlled by -Ttext=0x40000000).


Troubleshooting Tips

  • If x/s shows garbage: Check the address with x/16x [address] to verify raw bytes.
  • No debug symbols? Rebuild with -g and relink.
  • QEMU not responding? Use Ctrl+A C in QEMU's terminal to enter monitor mode, then type quit.


This workflow is essential for bare-metal development, where you often need to:

  1. Verify data is stored correctly in memory
  2. Confirm hardware register access works
  3. Catch infinite loops or incorrect memory addresses


Building a Simple CLI Interface

Let's create a complete bare-metal ARM CLI application combining assembly, C code, and proper linking. Here's the full implementation:

File Structure

.
├── boot.s            (Assembly entry point)
├── cli.c             (Main C logic)
└── linker.ld         (Linker script)        

1. boot.s (Assembly Entry Point)

.global _start

.section .text
_start:
    ldr sp, =0x40010000    @ Setup stack; it grows down from here, below the RAM region
                           @ (0x40020000 would be the top of RAM in linker.ld)
    bl uart_init
    bl main
    b .

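uart_init:
    ldr r0, =0x09000000     @ UART0 base
    mov r1, #0x00
    str r1, [r0, #0x30]     @ UARTCR: disable UART while reconfiguring
    mov r1, #26
    str r1, [r0, #0x24]     @ UARTIBRD: integer divisor (115200 baud)
    mov r1, #3
    str r1, [r0, #0x28]     @ UARTFBRD: fractional divisor
    mov r1, #0x70
    str r1, [r0, #0x2C]     @ UARTLCR_H: 8N1 (WLEN=0b11) + FIFOs; written last so the divisors take effect
    mov r1, #0x0301
    str r1, [r0, #0x30]     @ UARTCR: UARTEN | TXE | RXE
    bx lr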
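
The divisor values 26 and 3 can be derived rather than memorized. The sketch below is an illustration only: the helper name pl011_divisors is hypothetical, and the 48 MHz reference clock is an assumption (the tutorial does not state a clock rate; 26 and 3 are simply what that rate yields for 115200 baud). The PL011 divisor is clk / (16 * baud), with the fraction stored in 1/64 steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split the PL011 baud divisor into IBRD and FBRD.
 * BAUDDIV = clk / (16 * baud); FBRD holds the fraction in 1/64 steps.
 * round(64 * BAUDDIV) equals round(4 * clk / baud), which stays in integers. */
void pl011_divisors(uint32_t clk_hz, uint32_t baud,
                    uint32_t *ibrd, uint32_t *fbrd)
{
    assert(baud != 0 && ibrd != NULL && fbrd != NULL);
    uint64_t div64 = (4ull * clk_hz + baud / 2) / baud;  /* rounded to nearest */
    *ibrd = (uint32_t)(div64 >> 6);     /* integer part */
    *fbrd = (uint32_t)(div64 & 0x3F);   /* fractional part, 6 bits */
}
```

Calling pl011_divisors(48000000, 115200, &i, &f) gives i = 26 and f = 3, matching the constants in uart_init.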
	        

2. cli.c (Main Program Logic)

#include <stdint.h>

#define UART_BASE ((volatile uint32_t*)0x09000000)
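/* The halt command writes 0x5555 here. That address/value pair matches the
   RISC-V "virt" test device, so on the ARM "virt" machine it may have no effect. */
#define POWEROFF_BASE ((volatile uint32_t*)0x100000)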

enum {CMDLEN = 64};

void uart_putc(char c) {
    while(UART_BASE[6] & (1 << 5));  /* UARTFR (offset 0x18) bit 5 = TXFF: wait while the TX FIFO is full */
    UART_BASE[0] = c;
}

void uart_puts(const char *s) {
    while(*s) uart_putc(*s++);
}

void to_lower(char *str) {
    for(; *str; str++)
        if(*str >= 'A' && *str <= 'Z')
            *str += 32;
}

void uart_gets(char *buf) {
    int idx = 0;
    char c;
    while(1) {
        if(!(UART_BASE[6] & (1 << 4))) {  /*Check RXFE*/
            c = UART_BASE[0];
            if(c == '\r') {
                uart_puts("\r\n");
                break;
            }
            if(idx < CMDLEN-1) {
                buf[idx++] = c;
                uart_putc(c);  /* Echo back*/
            }
        }
    }
    buf[idx] = 0;
    to_lower(buf);  /* Make command lowercase*/
}

int strcmp_ci(const char *a, const char *b) {
    while(*a && *b) {
        if((*a|32) != (*b|32)) break;
        a++; b++;
    }
    return (*a|32) - (*b|32);
}

void show_status() {
    uart_puts("\r\nSystem Status:\r\n");
    uart_puts("Version: CLI v1.0\r\n");
    uart_puts("Uptime: 00:00:00\r\n");
    uart_puts("Memory: 64KB/64KB\r\n");
}

int main() {
    char input[CMDLEN];
    
    uart_puts("\r\nBooting...\r\n");
    uart_puts("System Ready\r\n");
    uart_puts("CLI v1.0 - Ready\r\n");

    while(1) {
        uart_puts("\r\n> ");
        uart_gets(input);
        
        if(strcmp_ci(input, "help") == 0) {
            uart_puts("Commands: help, status, cls, halt\r\n");
        }
        else if(strcmp_ci(input, "status") == 0) {
            show_status();
        }
        else if(strcmp_ci(input, "cls") == 0) {
            uart_puts("\x1B[2J\x1B[H");  /* ANSI clear*/
        }
        else if(strcmp_ci(input, "halt") == 0) {
            uart_puts("Shutting down... Please press Ctrl-A then X\r\n");
            *POWEROFF_BASE = 0x5555;  // Attempt a QEMU shutdown; if QEMU keeps running, exit with Ctrl-A X
            while(1); // Spin forever so the CLI never resumes
        }
        else {
            uart_puts("Unknown command: ");
            uart_puts(input);
        }
    }
}        
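
One quirk of strcmp_ci above is that OR-ing with 32 also changes non-letters: the terminating '\0' becomes equal to a space, so "help " (with a trailing space) compares equal to "help". A stricter comparison could look like the sketch below. The names fold_ascii and ci_equal are hypothetical and not part of the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only 'A'..'Z' to lowercase; every other byte is left untouched. */
static char fold_ascii(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : c;
}

/* Hypothetical stricter comparison: returns 1 if a and b match
 * ignoring ASCII letter case, 0 otherwise. */
int ci_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);   /* host-side sanity check */
    while (*a && *b) {
        if (fold_ascii(*a) != fold_ascii(*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same point */
}
```

Note the inverted return convention compared with strcmp_ci: this sketch returns 1 on a match, so a call site would read if (ci_equal(input, "help")).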

3. linker.ld (Linker Script)

ENTRY(_start)

MEMORY {
    ROM (rx) : ORIGIN = 0x40000000, LENGTH = 64K
    RAM (rwx) : ORIGIN = 0x40010000, LENGTH = 64K
}

SECTIONS {
    .text : { *(.text*) } > ROM
    .rodata : { *(.rodata*) } > ROM
    .data : { *(.data*) } > RAM
    .bss : { *(.bss*) } > RAM
}        
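
The MEMORY regions can be mirrored in C to sanity-check addresses on a host machine. This is only a sketch: the struct and the helper names region_end and region_contains are made up for illustration, and the origin and length values are copied from the script above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the MEMORY block of linker.ld (values copied from the script). */
struct region { uint32_t origin; uint32_t length; };

static const struct region ROM = { 0x40000000u, 64u * 1024u };
static const struct region RAM = { 0x40010000u, 64u * 1024u };

/* First address past the end of a region. */
uint32_t region_end(const struct region *r)
{
    assert(r != NULL);
    return r->origin + r->length;
}

/* 1 if [addr, addr + size) lies entirely inside the region, else 0. */
int region_contains(const struct region *r, uint32_t addr, uint32_t size)
{
    assert(r != NULL);
    return addr >= r->origin && size <= r->length &&
           addr - r->origin <= r->length - size;
}
```

For example, region_end(&RAM) is 0x40020000, the top of RAM. The first word pushed from a stack pointer of 0x40010000 lands at 0x4000FFFC, which region_contains reports as inside ROM rather than RAM.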

Build & Run Instructions

  1. Compile Assembly

arm-none-eabi-gcc -mcpu=cortex-a7 -c -o boot.o boot.s

  2. Compile C Code

arm-none-eabi-gcc -mcpu=cortex-a7 -ffreestanding -nostdlib -c -o cli.o cli.c

  3. Link Objects

arm-none-eabi-gcc -T linker.ld -nostdlib -ffreestanding -o cli.elf boot.o cli.o

  4. Run in QEMU

qemu-system-arm -M virt -cpu cortex-a7 -nographic -serial mon:stdio -kernel cli.elf        

Key Features

  1. Bare Metal Initialization

  • Stack pointer setup at 0x40010000
  • UART initialization for 115200 baud
  • Clean transition to C code

  2. CLI Functionality

  • Case-insensitive command parsing
  • ANSI clear screen support
  • System status display
  • Safe input buffering (63 chars max)
  • QEMU shutdown command

  3. Memory Layout

  • Code starts at 0x40000000 (QEMU virt machine entry)
  • 64KB ROM for code/constants
  • 64KB RAM for data/stack

Commands to Try

> help
> cls
> status
> halt        



Debugging Tips

  1. View memory layout:

arm-none-eabi-objdump -h cli.elf

  2. Disassemble code:

arm-none-eabi-objdump -d cli.elf

  3. Debug with QEMU+GDB:

qemu-system-arm -M virt -cpu cortex-a7 -s -S -nographic -serial mon:stdio -kernel cli.elf

Then in another terminal:

arm-none-eabi-gdb cli.elf -ex "target remote :1234"        

This complete example demonstrates bare-metal development with proper hardware initialization, mixed assembly/C programming, and system-level interaction - all essential skills for embedded systems development!


Porting Bare-Metal ARM Code to Real Cortex-A7 Devices (e.g., POS Terminals)

Porting the QEMU-based bare-metal CLI to a real Cortex-A7 device (like an old unused POS terminal, IoT device, or smartphone) means tackling the 9 critical challenges below. This section is not meant to encourage you to try it; it is only here to motivate you to learn more.

1. Hardware-Specific Initialization

Memory Map:

Real devices have fixed physical addresses for peripherals (UART, GPIO, etc.).

Example: A POS terminal’s UART might live at 0x01C28000 (Allwinner H5) vs. QEMU’s 0x09000000.

Requires: Chip datasheets or reverse-engineering via oscilloscope/UART logs.

Clock/Power Management:

Most SoCs require enabling clocks for peripherals.

Example: On a MediaTek MT6572, enabling UART clocks via CLK_CFG_1 register.

2. Boot Process Differences

Boot ROM:

Real devices execute vendor-specific boot ROM code before jumping to your code.

Solution: Chain-load your binary via:

Boot ROM → U-Boot → Your Code          

Or overwrite the boot partition (risky!).

DRAM Initialization:

QEMU pre-initializes RAM. Real devices need DDR training (vendor-specific sequences).

Example: Rockchip RK3288 requires 200+ register writes to initialize DRAM.

3. Peripheral Configuration

UART Setup:

Real UARTs (e.g., PL011, 8250) need baud rate, FIFO, and flow control setup.

Example code for Allwinner A10 UART:

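ldr r0, =0x01C28000  @ UART0 base
mov r1, #0x0D        @ Divisor 13 for 115200 baud (assuming a 24 MHz UART clock)
str r1, [r0, #0x00]  @ Reaches the Divisor Latch only while LCR.DLAB (bit 7 at offset 0x0C) is set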
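
The divisor 0x0D follows from the usual 16550-style formula, divisor = clock / (16 * baud). The sketch below is illustrative only: uart16550_divisor is a hypothetical helper, and the 24 MHz input clock is an assumption that a real port would have to confirm against the chip's documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 16550-style divisor = clk / (16 * baud),
 * rounded to the nearest integer. */
uint16_t uart16550_divisor(uint32_t clk_hz, uint32_t baud)
{
    assert(baud != 0);   /* host-side sanity check */
    uint32_t denom = 16u * baud;
    return (uint16_t)((clk_hz + denom / 2) / denom);
}
```

With a 24 MHz clock, uart16550_divisor(24000000, 115200) returns 13, which is the 0x0D written above.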

GPIO Multiplexing:

UART pins might be shared with other functions (e.g., SPI).

Example: On NXP i.MX6, set IOMUXC_SW_MUX_CTL_PAD_UART1_TX_DATA to enable UART TX.

4. Secure Boot & TrustZone

TrustZone (TZ):

ARM’s Secure World blocks bare-metal access to critical resources.

Workaround: Change the security configuration via the SCR register (only writable from Secure state, so this works only on unlocked devices).

Secure Boot:

POS terminals often enforce signed firmware.

5. Debugging on Real Hardware

JTAG/SWD:

Required for low-level debugging. Find the JTAG test pads on the PCB.

Tools: J-Link, OpenOCD, or Raspberry Pi Pico as a $5 JTAG adapter.

UART Fallback:

Capture boot logs via UART-to-USB converter (3.3V logic!).

6. Cross-Compiling for Real Targets

Adjust toolchain flags for the specific CPU variant:

arm-none-eabi-gcc -mcpu=cortex-a7 -mtune=cortex-a7 -mfloat-abi=hard

7. Device Tree Overrides

Many SoCs require a Device Tree Blob (DTB) to describe hardware. For bare metal:

Hardcode addresses in assembly/C.

Or parse DTB manually (advanced).
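
As a first step toward manual parsing, code can at least confirm that a blob is a flattened device tree. This is a minimal sketch with hypothetical helper names (be32, fdt_total_size). It relies only on two facts about the format: the header starts with the magic value 0xD00DFEED, and header fields are stored big-endian, with the total size as the second field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xD00DFEEDu   /* first word of a flattened device tree */

/* Read a big-endian 32-bit value; DTB header fields are big-endian. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the blob's total size in bytes, or 0 if the magic does not match. */
uint32_t fdt_total_size(const uint8_t *blob)
{
    assert(blob != NULL);       /* host-side sanity check */
    if (be32(blob) != FDT_MAGIC)
        return 0;
    return be32(blob + 4);      /* 'totalsize' is the second header field */
}
```

Reading byte by byte avoids both alignment and endianness problems, which is useful because ARM cores in this class normally run little-endian.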

8. Power Management

Disable sleep states:

// Qualcomm example
*((volatile uint32_t *)0xFC4B80BC) = 0x00000001; // Disable CPU idle

9. Ethical & Legal Risks

Bricking: Flashing wrong firmware can permanently disable devices.

Legality: Reverse-engineering POS terminals may violate DMCA or vendor agreements.

Key Takeaways

Start with dev boards: Raspberry Pi, STM32MP1.

Use JTAG: Critical for debugging.

Expect pain: Real hardware has undocumented quirks.


Disclaimer of Liability

This tutorial is provided for educational purposes only. The authors and publishers disclaim all responsibility for:

  • Device damage (bricking, hardware failure) from code porting attempts
  • Legal consequences arising from reverse engineering or unauthorized firmware modification
  • Security vulnerabilities introduced by experimental code

By following this guide, you agree to assume all risks associated with modifying hardware/software.


Ethical Considerations

Reverse engineering real devices often causes more harm than good:

Legal Risks

  • Can violate the DMCA (Digital Millennium Copyright Act) in the US, or similar anti-circumvention laws in other jurisdictions, when bypassing DRM.
  • Breaches EULAs (End User License Agreements) for commercial devices.

Economic Harm

Companies invest millions in R&D; reverse engineering can undercut revenue and jeopardize jobs.

Security Fallout

Exposing hardware flaws without vendor coordination risks mass exploitation (e.g., payment terminals).


Responsible Learning Guidelines

  1. Use Dev Boards: Experiment on Raspberry Pi, STM32, or QEMU—not production devices.
  2. Respect IP: Never distribute proprietary code/firmware, even "for educational purposes."
  3. Report Responsibly: If you find vulnerabilities, follow coordinated disclosure via CERT/ICS-CERT.
  4. Avoid Piracy: Don’t use leaked SDKs/datasheets—seek vendor-authorized resources.


More articles by Hani Fahmi