ELF Files: the Basics
For anyone beginning to learn systems programming or reverse engineering on Linux and other Unix-like systems, taking apart and studying files in the format known as ELF offers a valuable glimpse into how programs are built and structured. With a grasp of ELF's basics and a handful of tools designed to work with the format, reading the contents of an ELF file becomes fairly simple.
What is ELF?
ELF, short for Executable and Linkable Format, is the common standard file format for executables, object code, shared libraries, and core dumps on Linux and other Unix-like operating systems. The format is not tied to any one processor architecture, so it is used on many different processor types. An ELF file is recognized by its first sixteen bytes, the identification bytes that readelf labels 'Magic'. The first four of these, '7f 45 4c 46' in hexadecimal, form the magic number proper and appear as '.ELF' in a hex editor, which typically shows the non-printable 0x7f byte as a dot.
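The magic-number check described above can be sketched in C using the constants from elf.h (shipped with glibc on Linux). The function name elf_class_of is hypothetical, and it works on a buffer the caller has already read from the start of the file. Besides the magic number, it looks at byte 4 of the identification, which records whether the file is 32-bit or 64-bit.

```c
#include <elf.h>     /* EI_NIDENT, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS* */
#include <stddef.h>
#include <string.h>

/* Return ELFCLASS32 or ELFCLASS64 if buf begins with a valid ELF
 * identification, or ELFCLASSNONE (0) if it does not. */
int elf_class_of(const unsigned char *buf, size_t len)
{
    /* The identification array is EI_NIDENT (16) bytes long. */
    if (len < EI_NIDENT)
        return ELFCLASSNONE;

    /* ELFMAG is "\177ELF" and SELFMAG is 4: the magic number proper. */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;

    /* Byte EI_CLASS (index 4) says whether the file is 32- or 64-bit. */
    if (buf[EI_CLASS] == ELFCLASS32 || buf[EI_CLASS] == ELFCLASS64)
        return buf[EI_CLASS];

    return ELFCLASSNONE;
}
```

For the sample header shown later in this article, which begins 7f 45 4c 46 02, the function would return ELFCLASS64 (the value 2).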
Uses for ELF
ELF offers several advantages that have made it the preferred format for executables and object files. It provides a single, uniform format for several kinds of binary files, which simplifies executing and linking them across different platforms. It also supports dynamic linking, which lets multiple programs share common libraries efficiently.
Information Stored in ELF
ELF files typically contain a broad spectrum of information within their structure. Some key components include:
How Information is Stored in ELF
The ELF format uses a flexible structure of headers and tables to organize its contents. A fixed-size file header at the very start records whether the file is 32-bit or 64-bit and whether its data is little endian or big endian, depending on the target architecture, and it points to two tables: the program header table and the section header table. Each section has its own entry in the section header table, and sections, which vary in size, are arranged according to their specific purposes, so the contained data is easy to locate and interpret.
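Because multi-byte fields are stored in the byte order named by the identification bytes, a parser has to read them accordingly instead of assuming the host's byte order. Below is a minimal sketch that assumes the magic number has already been checked. It reads e_type and e_machine, which sit at the same offsets (16 and 18) in both the 32-bit and 64-bit file headers. The function names are hypothetical.

```c
#include <elf.h>     /* EI_NIDENT, EI_DATA, ELFDATA2LSB, ELFDATA2MSB */
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit field in the file's byte order. data is the value of
 * e_ident[EI_DATA]: ELFDATA2LSB (little endian) or ELFDATA2MSB (big endian). */
static uint16_t read_u16(const unsigned char *p, int data)
{
    if (data == ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* e_type (ET_REL, ET_EXEC, ET_DYN, ...) directly follows e_ident.
 * Returns -1 if the buffer is too short. */
int elf_type_of(const unsigned char *hdr, size_t len)
{
    if (len < EI_NIDENT + 2)
        return -1;
    return read_u16(hdr + EI_NIDENT, hdr[EI_DATA]);
}

/* e_machine (EM_386, EM_X86_64, ...) follows e_type.
 * Returns -1 if the buffer is too short. */
int elf_machine_of(const unsigned char *hdr, size_t len)
{
    if (len < EI_NIDENT + 4)
        return -1;
    return read_u16(hdr + EI_NIDENT + 2, hdr[EI_DATA]);
}
```

The same approach extends to the 32-bit and 64-bit fields; only the number of bytes assembled changes.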
Parsing ELF File Information
Parsing an ELF file requires a deeper understanding of its structure, along with appropriate tools and techniques for extracting the desired information. On Linux, the elf.h header is provided by the GNU C Library (glibc); it is not part of the ISO C standard library. It defines the types, structures, and macros needed for the job, along with helpful comments, and it makes collecting the relevant data and putting it to use much easier.
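As a rough illustration of how elf.h is used, the sketch below reads a 64-bit file header into the Elf64_Ehdr structure the header defines and prints a few of the fields that readelf reports. The function names read_ehdr64 and print_ehdr64 are hypothetical, and the code assumes the file's byte order matches that of the machine running it.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read and sanity-check a 64-bit ELF file header from an open stream.
 * Returns 0 on success, -1 on a short read or if the file is not ELF64.
 * Assumes host byte order; a fuller tool would check e_ident[EI_DATA]
 * and byte-swap multi-byte fields when it differs. */
int read_ehdr64(FILE *f, Elf64_Ehdr *out)
{
    if (fread(out, sizeof *out, 1, f) != 1)
        return -1;
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Print a handful of header fields as raw numbers. */
void print_ehdr64(const Elf64_Ehdr *h)
{
    printf("Type:                      %u\n", (unsigned)h->e_type);
    printf("Machine:                   %u\n", (unsigned)h->e_machine);
    printf("Entry point address:       0x%llx\n",
           (unsigned long long)h->e_entry);
    printf("Start of section headers:  %llu\n",
           (unsigned long long)h->e_shoff);
    printf("Number of section headers: %u\n", (unsigned)h->e_shnum);
}
```

A caller would open the file with fopen(path, "rb"), pass the stream to read_ehdr64, and hand the filled structure to print_ehdr64. Unlike readelf, this prints numbers rather than names: an executable for x86-64 would show Type 2 (ET_EXEC) and Machine 62 (EM_X86_64).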
Useful Commands for ELF Processing
readelf
readelf displays information about one or more ELF files and can show the ELF file header, section headers, and program headers, among other things. Below are just a few of the flags readelf accepts:
Example output using the '-h' flag on a sample 64-bit ELF file:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400600
Start of program headers: 64 (bytes into file)
Start of section headers: 6936 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 28
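The last line of that output, the section header string table index, is what lets readelf print section names instead of numbers. Each section header stores only an offset (sh_name) into a string table, and that table is itself the section whose index is e_shstrndx. Here is a minimal sketch of the lookup, assuming a 64-bit file already loaded into memory in the host's byte order. The function name elf64_section_name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the name of section idx in a 64-bit ELF image held in memory,
 * or NULL if any offset is out of range. Extended section numbering
 * (SHN_XINDEX), used by files with very many sections, is not handled.
 * memcpy is used instead of pointer casts to avoid unaligned access. */
const char *elf64_section_name(const unsigned char *img, size_t len, size_t idx)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh, strsh;

    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, img, sizeof eh);
    if (idx >= eh.e_shnum || eh.e_shstrndx >= eh.e_shnum)
        return NULL;

    /* The section headers form an array of e_shnum entries at e_shoff. */
    size_t tbl = (size_t)eh.e_shoff;
    size_t ent = eh.e_shentsize;
    if (ent < sizeof sh || tbl > len || eh.e_shnum > (len - tbl) / ent)
        return NULL;
    memcpy(&sh, img + tbl + idx * ent, sizeof sh);
    memcpy(&strsh, img + tbl + (size_t)eh.e_shstrndx * ent, sizeof strsh);

    /* The name must start, and be NUL-terminated, inside the string table. */
    if (strsh.sh_offset > len || strsh.sh_size > len - strsh.sh_offset
        || sh.sh_name >= strsh.sh_size)
        return NULL;
    const char *tab = (const char *)img + strsh.sh_offset;
    if (memchr(tab + sh.sh_name, '\0', strsh.sh_size - sh.sh_name) == NULL)
        return NULL;
    return tab + sh.sh_name;
}
```

Looping idx from 0 to e_shnum - 1 would give a bare-bones version of the section list that readelf -S prints.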
nm
nm displays the symbols in an ELF file's symbol table. For each symbol it usually shows the symbol's value (typically its address), a one-letter type code, and its name. As stated previously, these symbols encompass the various function, variable, and attribute names present in an executable or object file.
Example output using the '-p' (no sort) flag on a sample 32-bit file, so the symbols appear in symbol-table order rather than sorted:
0804858c t _strrchr
080487c4 t gcc2_compiled.
U __syscall
080486cc W dlerror
08049988 B __mainprog_obj
08048700 T _dladdr
08049900 A _DYNAMIC
08048660 T _dlclose
080487c4 A _etext
08048698 T _dlsym
08048700 W dladdr
08048660 W dlclose
08048440 T _init
0804862c T _dlopen
0804998c B environ
080484c0 T __start
080486cc T _dlerror
080498a8 D __progname
080484c0 T _start
0804862c W dlopen
08048698 W dlsym
08049988 A __bss_start
08049990 B main
080487c4 T _fini
U atexit
080484d8 T ___start
08049988 A _edata
080498cc A _GLOBAL_OFFSET_TABLE_
08049994 A _end
080498ac D __ps_strings
U exit
080485b8 T _rtld_setup
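The letters in the second column are nm's symbol types. In this output, T marks code, D initialized data, B uninitialized data (.bss), A absolute values, W weak symbols, and U symbols that are used here but defined elsewhere, with lowercase marking local symbols such as _strrchr. The sketch below approximates how such a letter could be derived from an Elf32_Sym entry plus the type and flags of the section it belongs to. It is a simplification written for illustration (nm_letter is a hypothetical name), not GNU nm's actual logic.

```c
#include <elf.h>

/* Approximate nm's one-letter type for a 32-bit symbol. sec_type and
 * sec_flags describe the section the symbol is defined in; they are
 * ignored for undefined and absolute symbols. GNU nm distinguishes more
 * cases than this, for example 'C' for common symbols and 'V' for weak
 * objects. */
char nm_letter(const Elf32_Sym *sym, Elf32_Word sec_type, Elf32_Word sec_flags)
{
    unsigned bind = ELF32_ST_BIND(sym->st_info);
    char c;

    if (sym->st_shndx == SHN_UNDEF)
        return bind == STB_WEAK ? 'w' : 'U';   /* defined elsewhere */
    if (bind == STB_WEAK)
        return 'W';                            /* weak, defined here */

    if (sym->st_shndx == SHN_ABS)
        c = 'A';                               /* absolute value */
    else if (sec_flags & SHF_EXECINSTR)
        c = 'T';                               /* code, e.g. .text */
    else if (sec_type == SHT_NOBITS)
        c = 'B';                               /* uninitialized data, .bss */
    else if (sec_flags & SHF_WRITE)
        c = 'D';                               /* initialized data */
    else
        c = 'R';                               /* read-only data */

    /* Lowercase marks a local (file-scope) symbol. */
    return bind == STB_LOCAL ? (char)(c - 'A' + 'a') : c;
}
```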
objdump
objdump is a handy command for displaying information from the sections of executable and object files, and it is often used as a disassembler to view a program in assembly form. With the '-d' flag it disassembles the sections expected to contain code, while '-D' disassembles all sections. Either is useful for inspecting the internals of a program when its source code is not readily available.
Example output portion using the '-d' flag:
[...]
0000000000003cd4 <separator>:
3cd4: 55 push %rbp
3cd5: 48 89 e5 mov %rsp,%rbp
3cd8: 48 83 ec 20 sub $0x20,%rsp
3cdc: 48 89 7d e8 mov %rdi,-0x18(%rbp)
3ce0: 48 89 75 e0 mov %rsi,-0x20(%rbp)
3ce4: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
3ceb: 00
3cec: 90 nop
3ced: 48 8b 55 e0 mov -0x20(%rbp),%rdx
3cf1: 48 8b 45 e8 mov -0x18(%rbp),%rax
3cf5: 48 89 d6 mov %rdx,%rsi
3cf8: 48 89 c7 mov %rax,%rdi
3cfb: e8 1c 00 00 00 call 3d1c <_strsep>
3d00: 48 89 45 f8 mov %rax,-0x8(%rbp)
3d04: 48 83 7d f8 00 cmpq $0x0,-0x8(%rbp)
3d09: 74 0b je 3d16 <separator+0x42>
3d0b: 48 8b 45 f8 mov -0x8(%rbp),%rax
3d0f: 0f b6 00 movzbl (%rax),%eax
3d12: 84 c0 test %al,%al
3d14: 74 d7 je 3ced <separator+0x19>
3d16: 48 8b 45 f8 mov -0x8(%rbp),%rax
3d1a: c9 leave
3d1b: c3 ret
0000000000003d1c <_strsep>:
3d1c: 55 push %rbp
3d1d: 48 89 e5 mov %rsp,%rbp
3d20: 48 83 ec 20 sub $0x20,%rsp
3d24: 48 89 7d e8 mov %rdi,-0x18(%rbp)
3d28: 48 89 75 e0 mov %rsi,-0x20(%rbp)
[...]
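The call and jump targets in this listing are not stored as absolute addresses. x86-64 relative branches encode a signed displacement counted from the end of the instruction, and objdump adds it to the address of the next instruction to print the target. The two helper functions below are hypothetical and cover only the two encodings that appear above: e8 (call with a 32-bit displacement) and 74 (je with an 8-bit displacement).

```c
#include <stdint.h>

/* call rel32: opcode e8 followed by a little-endian 32-bit signed
 * displacement, measured from the end of the 5-byte instruction.
 * (The int32_t conversion assumes two's complement, as on all
 * mainstream compilers.) */
uint64_t call_rel32_target(uint64_t insn_addr, const unsigned char insn[5])
{
    int32_t disp = (int32_t)((uint32_t)insn[1]
                           | ((uint32_t)insn[2] << 8)
                           | ((uint32_t)insn[3] << 16)
                           | ((uint32_t)insn[4] << 24));
    return insn_addr + 5 + (uint64_t)(int64_t)disp;
}

/* Short conditional jump (e.g. je = 74 xx): a one-byte signed
 * displacement, measured from the end of the 2-byte instruction. */
uint64_t jcc_rel8_target(uint64_t insn_addr, const unsigned char insn[2])
{
    int8_t disp = (int8_t)insn[1];
    return insn_addr + 2 + (uint64_t)(int64_t)disp;
}
```

With the bytes from the listing, call_rel32_target(0x3cfb, ...) on e8 1c 00 00 00 gives 0x3d1c, the start of _strsep, and jcc_rel8_target(0x3d14, ...) on 74 d7 gives 0x3ced, a backward jump to the top of the loop in separator.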
Conclusion
Understanding the ELF format and how to parse it is essential knowledge for systems programming and reverse engineering on Linux and other Unix-like operating systems, because it is the key to collecting and analyzing the data and attributes of executable and object files. With that knowledge and the tools built around it, breaking down and analyzing this data becomes much simpler.
Thank you for reading!