Introduction to ELF
What is ELF
We are not talking about Christmas elves here, but about the Unix ELF. The Executable and Linkable Format (ELF) is a standard file format on Unix-like systems that defines:
- Executable files (Binary files with instructions to be performed by the CPU)
- Object code (Sequence of instructions in machine language)
- Shared libraries (Instructions to be shared between executable files)
- Core dumps (Recorded state of the working memory of a computer program at a specific time)
Why it is used
A standard format is used so that the same kind of file can be understood across machines with different characteristics. The ELF format offers flexibility, extensibility, and cross-platform support for different byte orders (endianness) and address sizes.
The a.out format was used on Linux before ELF, but unlike a.out, ELF’s design is not limited to a specific processor, instruction set, or hardware architecture. The a.out file format on Linux was deprecated with the release of the 5.1 Linux kernel.
What information is stored in it
Let's create a simple hello world program and compile it:
gcc main.c -o hello_world
The resulting hello_world file contains multiple sections and segments, plus additional fields that allow our code to be executed.
Executing and Linking ELF files
Note that only the ELF header has a fixed position in an ELF file.
When we are linking (creating the executable), only the sections of the ELF file matter, since they hold the instructions, data, symbol table, relocation information and other parts of the program.
When we are executing, we only care about the segments of the ELF file, which give the system all the information it needs to prepare the program for execution: virtual and physical memory addresses, permissions, and data.
ELF header
We can display the ELF header with readelf:
readelf -h hello_world
The ELF header defines whether the file is designed to use 32-bit or 64-bit addresses. Three fields in the header (e_entry, e_phoff and e_shoff) change size with this setting, which shifts the offset of the fields that follow them.
The ELF header is 52 bytes long for 32-bit binaries and 64 bytes long for 64-bit binaries. Since it is the only part of the file with a fixed position, it provides the information needed to locate the sections and segments inside the file.
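To make the header layout concrete, here is a minimal Python sketch that decodes it from the raw bytes of a file. The function name parse_elf_header and the keys of the returned dictionary are my own choices for illustration. The field names follow the usual elf.h naming.

```python
import struct

ELF_MAGIC = b"\x7fELF"
FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
          "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
          "e_shnum", "e_shstrndx")


def parse_elf_header(data):
    """Decode the ELF header found at offset 0 of `data` (bytes)."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown EI_CLASS or EI_DATA value")
    endian = "<" if ei_data == 1 else ">"   # 1 = little-endian, 2 = big-endian
    addr = "I" if ei_class == 1 else "Q"    # 1 = 32-bit, 2 = 64-bit
    # The 16-byte e_ident is followed by e_type, e_machine, e_version,
    # three address-sized fields, e_flags and six 16-bit fields.
    fmt = endian + "HHI" + addr * 3 + "I" + "H" * 6
    header = dict(zip(FIELDS, struct.unpack_from(fmt, data, 16)))
    header["bits"] = 32 if ei_class == 1 else 64
    header["byte_order"] = "little" if ei_data == 1 else "big"
    return header
```

With the file built earlier, parse_elf_header(open("hello_world", "rb").read()) should report the same entry point and table offsets as readelf -h. Because only the three address-sized fields change width, the same code yields the 52-byte and 64-byte layouts.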
Section header
readelf -S hello_world
The section header table describes zero or more sections, which hold program and control information. Sections are needed at link time but not at run time: you can strip the section header table from a working executable and it will still run.
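As a rough illustration of what readelf -S walks through, the sketch below lists the entries of the section header table. It assumes a 64-bit little-endian file to stay short, and the name read_sections is hypothetical.

```python
import struct

# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
# sh_link, sh_info, sh_addralign, sh_entsize
SHDR64 = struct.Struct("<IIQQQQIIQQ")


def read_sections(data):
    """Return name, type, address, offset and size of every section."""
    if data[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    raw = [SHDR64.unpack_from(data, e_shoff + i * e_shentsize)
           for i in range(e_shnum)]
    # Section names are stored in the string table chosen by e_shstrndx;
    # sh_name is a byte offset into that table.
    names_off = raw[e_shstrndx][4] if raw else 0
    sections = []
    for sh_name, sh_type, _flags, sh_addr, sh_offset, sh_size, *_rest in raw:
        start = names_off + sh_name
        end = data.index(b"\x00", start)
        sections.append({"name": data[start:end].decode("ascii"),
                         "type": sh_type, "addr": sh_addr,
                         "offset": sh_offset, "size": sh_size})
    return sections
```

A file whose section header table has been stripped has e_shnum equal to zero, so this function simply returns an empty list for it.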
Program header
readelf -l hello_world
The program header table describes zero or more memory segments and how they must be mapped into memory in order to execute the program. In other words, it tells the system how to create a process image. It is found at file offset e_phoff, and consists of e_phnum entries, each with size e_phentsize.
The layout is slightly different in 32-bit ELF vs 64-bit ELF, because p_flags is in a different structure location for alignment reasons. Each entry holds the segment type (p_type), the flags (p_flags), the file offset (p_offset), the virtual and physical addresses (p_vaddr, p_paddr), the sizes in the file and in memory (p_filesz, p_memsz), and the alignment (p_align).
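The difference in where p_flags sits is easier to see in code. The following sketch reads the program header table for both classes. The function names and the "perms" key are assumptions made for this example, not part of any existing tool.

```python
import struct

# Field order per EI_CLASS: p_flags is near the end for 32-bit entries and
# right after p_type for 64-bit entries.
PHDR_LAYOUT = {
    1: ("IIIIIIII", ("p_type", "p_offset", "p_vaddr", "p_paddr",
                     "p_filesz", "p_memsz", "p_flags", "p_align")),
    2: ("IIQQQQQQ", ("p_type", "p_flags", "p_offset", "p_vaddr",
                     "p_paddr", "p_filesz", "p_memsz", "p_align")),
}


def perms(p_flags):
    """Render the PF_R (4), PF_W (2) and PF_X (1) bits as text like 'r-x'."""
    return "".join(ch if p_flags & bit else "-"
                   for ch, bit in (("r", 4), ("w", 2), ("x", 1)))


def read_program_headers(data):
    """Return one dict per program header entry."""
    if data[:4] != b"\x7fELF" or data[4] not in (1, 2) or data[5] not in (1, 2):
        raise ValueError("not a recognised ELF file")
    ei_class = data[4]
    endian = "<" if data[5] == 1 else ">"
    if ei_class == 1:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    else:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    fmt, names = PHDR_LAYOUT[ei_class]
    segments = []
    for i in range(e_phnum):
        values = struct.unpack_from(endian + fmt, data, e_phoff + i * e_phentsize)
        segment = dict(zip(names, values))
        segment["perms"] = perms(segment["p_flags"])
        segments.append(segment)
    return segments
```

Entries with p_type equal to 1 (PT_LOAD) are the ones that get mapped into memory, and their "perms" value should match the Flg column printed by readelf -l.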
How this information is stored
All the information inside the file is stored as bytes, and those bytes are interpreted according to the header structures:
[Figure: ELF header, section header and program header for a 64-bit architecture]
Each ELF file is composed of bytes of information, which are parsed using the structures defined by the headers. Some common sections are:
- .text: code.
- .data: initialised data.
- .rodata: initialised read-only data.
- .bss: uninitialized data.
- .plt: PLT (Procedure Linkage Table) (IAT equivalent).
- .got: GOT entries are dedicated to dynamically linked global variables.
- .got.plt: GOT entries dedicated to dynamically linked functions.
- .symtab: global symbol table.
- .dynamic: Holds all needed information for dynamic linking.
- .dynsym: symbol tables dedicated to dynamically linked symbols.
- .strtab: string table of .symtab section.
- .dynstr: string table of .dynsym section.
- .interp: path of the program interpreter (the run-time dynamic linker, RTLD), stored as a string.
- .rel.dyn: global variable relocation table.
- .rel.plt: function relocation table.
How to parse this information
Custom code can read an ELF file as long as it takes into account the endianness and the size of the addresses. With those two facts known, every piece of data inside the file can be decoded. Of course, tools that report information about ELF files already exist, such as nm, objdump, and readelf.
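To show why these two properties have to be checked first, here is a small sketch that reads them from the identification bytes and then decodes the same two bytes with both byte orders. The helper names describe_ident and read_u16 are made up for this example.

```python
def describe_ident(data):
    """Return (address size in bits, byte order) from the e_ident bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(data[4])             # EI_CLASS
    order = {1: "little", 2: "big"}.get(data[5])   # EI_DATA
    if bits is None or order is None:
        raise ValueError("unknown EI_CLASS or EI_DATA value")
    return bits, order


def read_u16(data, offset, order):
    """Decode an unsigned 16-bit field using the given byte order."""
    return int.from_bytes(data[offset:offset + 2], order)
```

For a little-endian shared object, the two bytes of e_type at offset 16 are 03 00. Decoded as little-endian they give 3 (ET_DYN), while a parser that wrongly assumed big-endian would read 768.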
The readelf command
The readelf command displays information about ELF files, and its options select which part of the file we want to know more about. It performs a similar function to objdump, but it goes into more detail and exists independently of the BFD library. Its source code parses the file byte by byte using ELF structure definitions like those in the "elf.h" header.
The nm command
The .symtab section of an ELF file holds information about each symbol (such as functions and variables) that the code defines or references. The nm command lists the symbols from object files. If no object files are listed as arguments, nm assumes the file a.out.
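A simplified version of what nm does can be sketched as follows. It assumes a 64-bit little-endian file, looks for sections of type SHT_SYMTAB, and resolves names through the string table referenced by sh_link. The function name read_symbols and the dictionary keys are illustrative only.

```python
import struct

SHT_SYMTAB = 2
SHDR64 = struct.Struct("<IIQQQQIIQQ")   # Elf64_Shdr
SYM64 = struct.Struct("<IBBHQQ")        # Elf64_Sym: name, info, other, shndx, value, size


def read_symbols(data):
    """Return the named symbols found in every SHT_SYMTAB section."""
    if data[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    headers = [SHDR64.unpack_from(data, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    symbols = []
    for header in headers:
        if header[1] != SHT_SYMTAB:
            continue
        sym_off, sym_size, sh_link = header[4], header[5], header[6]
        str_off = headers[sh_link][4]   # sh_link points at the string table
        for pos in range(sym_off, sym_off + sym_size, SYM64.size):
            st_name, st_info, _other, st_shndx, st_value, _size = \
                SYM64.unpack_from(data, pos)
            start = str_off + st_name
            name = data[start:data.index(b"\x00", start)].decode("ascii")
            if name:   # skip the empty first entry and unnamed symbols
                symbols.append({"name": name, "value": st_value,
                                "bind": st_info >> 4, "type": st_info & 0xF,
                                "shndx": st_shndx})
    return symbols
```

Here "bind" 1 means a global symbol and "type" 2 means a function, so an entry for main in our hello world binary would be expected to carry those two values.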
The objdump command
objdump displays information about one or more object files, such as their headers, section contents and disassembly.
The ELF format allows flexibility, since everything needed to understand the content of the file is described in the header. For example, the data in the read-only memory of our hello world program can be inspected in the initialized read-only data section (.rodata) of the ELF file, for instance with objdump -s -j .rodata hello_world.
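As a small companion to inspecting .rodata by hand, the sketch below pulls the printable NUL-terminated strings out of a section's raw bytes. It assumes the bytes have already been sliced out of the file using the section's offset and size, and the name c_strings and the min_len default are arbitrary choices.

```python
def c_strings(section, min_len=4):
    """Return printable ASCII strings of at least min_len characters."""
    found = []
    for chunk in section.split(b"\x00"):
        # Keep only chunks made entirely of printable ASCII characters
        if len(chunk) >= min_len and all(32 <= byte < 127 for byte in chunk):
            found.append(chunk.decode("ascii"))
    return found
```

If main.c prints a greeting with a string literal, that literal would be expected to show up in the result for the .rodata bytes of hello_world.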
ELF is a very popular format, so a basic understanding of how the information is parsed and what is stored in the file will help you debug your code more easily.