ELF Files: the Basics
Legolas, an actual elf from The Lord of the Rings film adaptations (© New Line Cinema / J.R.R. Tolkien)


For anyone beginning to learn systems programming and reverse engineering, especially on Linux and other Unix-like systems, deconstructing and studying files in the format known as ELF offers a valuable glimpse into how programs are built and structured. With a grasp of the basics of ELF and a handful of very useful tools designed for working with the format, perusing the contents of an ELF file can become quite simple.

What is ELF?

ELF, an acronym for Executable and Linkable Format, is the standard file format for executables, object code, shared libraries, and core dumps on Linux and other Unix-like operating systems. The format is not tied to any single processor architecture, which is why it is used across many processor types. An ELF file is identified by its first sixteen bytes, the identification array (e_ident). The first four of these bytes form the magic number, '7f 45 4c 46' in hexadecimal, which a hex-edit utility typically displays as '.ELF': 0x7f is a non-printable control character, and the remaining three bytes are the ASCII codes for 'E', 'L', and 'F'.

Uses for ELF

ELF offers several advantages that make it the preferred format for executables and object files: it provides a uniform format for various types of binary files, which simplifies loading and linking them across different platforms. Additionally, it supports dynamic linking, which allows multiple programs to share common libraries efficiently.

Information Stored in ELF

ELF files typically contain a broad spectrum of information within their structure. Some key components include:

  • ELF file header: Contains essential information about the file, such as the architecture, entry point, endianness, program header offset, and section header offset.
  • Section header table: Describes the sections within the file, such as code sections, data sections, symbol tables, and string tables.
  • Program header table: Describes the segments the loader maps into memory when the program runs, such as executable code, initialized data, and read-only data.
  • Symbol tables: Contain information about the symbols defined or referenced in the file, such as function and variable names, their addresses, sizes, and attributes.
  • Debugging information: Present when a program is compiled with a flag that adds debugging information (e.g. '-g' in gcc/g++). It is stored in .debug_* sections, usually in the DWARF format, and lets debugging tools map machine code back to the program's source code, variables, and state.
  • Dynamic linking information: If present, holds the details needed to link the executable with shared libraries at load time.

How Information is Stored in ELF

The ELF format uses a flexible structure: a fixed-size file header at the start of the file, followed by tables (the program header table and the section header table) that locate variable-size segments and sections elsewhere in the file. Depending on the target architecture, the data can be 32-bit or 64-bit and either little endian or big endian. Each section is described by its own entry in the section header table and is arranged according to its specific purpose, which makes the contained data easy to locate and interpret.

Parsing ELF File Information

Parsing the information in an ELF file requires a deeper understanding of the file's structure, along with the appropriate tools and techniques to extract the desired information. On Linux, the elf.h header supplied by the system C library (glibc installs it as /usr/include/elf.h; it is not part of the ISO C standard library) contains predefined types, structs such as Elf64_Ehdr and Elf64_Shdr, macros, and helpful comments that aid immensely in collecting the relevant data and putting it to effective use.

Useful Commands for ELF Processing

readelf

readelf displays information about one or more ELF files, including the ELF file header, section headers, and program headers, among other things. Below are just a few of its flags:

  • -h : display the ELF file header.
  • -S : display section headers.
  • -l : display program headers (segments).

Example output using the '-h' flag on a sample 64-bit ELF file:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400600
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6936 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 28        
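The offsets and sizes in this header determine where the two header tables sit in the file. Using only the values in the sample output above (all figures in bytes into the file):

```latex
\begin{aligned}
\text{program header table:}\quad & 64 \;\text{to}\; 64 + 9 \times 56 = 568 \\
\text{section header table:}\quad & 6936 \;\text{to}\; 6936 + 31 \times 64 = 8920 \\
\text{entry for section 28 (section names):}\quad & 6936 + 28 \times 64 = 8728
\end{aligned}
```

The program header table starts right after the 64-byte ELF header. Linkers usually place the section header table at the end of the file, so this sample is probably about 8920 bytes long, although the format itself does not require that placement.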

nm

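nm displays the symbols from an ELF file's symbol table. Each line usually shows the symbol's value (typically its memory address), its type, and its name; as stated previously, these symbols cover the various function and variable names present in an executable or object file. The type is a single letter: for example, 'T' for code in the text section, 'D' for initialized data, 'B' for uninitialized (BSS) data, 'U' for an undefined symbol that must be resolved elsewhere (such as in a shared library), 'W' for a weak symbol, and 'A' for an absolute value, with lowercase letters generally marking local symbols.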

Example output using the '-p' flag (which lists symbols in symbol-table order instead of sorting them) on a sample 32-bit file:

0804858c t _strrchr
080487c4 t gcc2_compiled.
         U __syscall
080486cc W dlerror
08049988 B __mainprog_obj
08048700 T _dladdr
08049900 A _DYNAMIC
08048660 T _dlclose
080487c4 A _etext
08048698 T _dlsym
08048700 W dladdr
08048660 W dlclose
08048440 T _init
0804862c T _dlopen
0804998c B environ
080484c0 T __start
080486cc T _dlerror
080498a8 D __progname
080484c0 T _start
0804862c W dlopen
08048698 W dlsym
08049988 A __bss_start
08049990 B main
080487c4 T _fini
         U atexit
080484d8 T ___start
08049988 A _edata
080498cc A _GLOBAL_OFFSET_TABLE_
08049994 A _end
080498ac D __ps_strings
         U exit
080485b8 T _rtld_setup        

objdump

objdump is a handy command for displaying information from the sections of executable and object files, and it is often used as a disassembler to view an executable in assembly form. One very interesting use is dumping a disassembled version of a file's machine code with the '-d' flag (which disassembles the sections expected to contain instructions) or the '-D' flag (which disassembles all sections). This is useful for inspecting the 'internals' of a program when its source is not readily available.

Example output portion using the '-d' flag:

[...]
0000000000003cd4 <separator>:
    3cd4:	55                   	push   %rbp
    3cd5:	48 89 e5             	mov    %rsp,%rbp
    3cd8:	48 83 ec 20          	sub    $0x20,%rsp
    3cdc:	48 89 7d e8          	mov    %rdi,-0x18(%rbp)
    3ce0:	48 89 75 e0          	mov    %rsi,-0x20(%rbp)
    3ce4:	48 c7 45 f8 00 00 00 	movq   $0x0,-0x8(%rbp)
    3ceb:	00 
    3cec:	90                   	nop
    3ced:	48 8b 55 e0          	mov    -0x20(%rbp),%rdx
    3cf1:	48 8b 45 e8          	mov    -0x18(%rbp),%rax
    3cf5:	48 89 d6             	mov    %rdx,%rsi
    3cf8:	48 89 c7             	mov    %rax,%rdi
    3cfb:	e8 1c 00 00 00       	call   3d1c <_strsep>
    3d00:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    3d04:	48 83 7d f8 00       	cmpq   $0x0,-0x8(%rbp)
    3d09:	74 0b                	je     3d16 <separator+0x42>
    3d0b:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
    3d0f:	0f b6 00             	movzbl (%rax),%eax
    3d12:	84 c0                	test   %al,%al
    3d14:	74 d7                	je     3ced <separator+0x19>
    3d16:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
    3d1a:	c9                   	leave
    3d1b:	c3                   	ret

0000000000003d1c <_strsep>:
    3d1c:	55                   	push   %rbp
    3d1d:	48 89 e5             	mov    %rsp,%rbp
    3d20:	48 83 ec 20          	sub    $0x20,%rsp
    3d24:	48 89 7d e8          	mov    %rdi,-0x18(%rbp)
    3d28:	48 89 75 e0          	mov    %rsi,-0x20(%rbp)
[...]        

Conclusion

Understanding the ELF format and how to parse it is essential knowledge for systems programming and reverse engineering on Linux and other Unix-like operating systems, since the format defines how executables and object files are laid out and makes their data and attributes available for collection and analysis. With this knowledge and the tools built around the format, breaking down and analyzing that data becomes much simpler.

Thank you for reading!


More articles by Sam Ansari

  • Brief Introduction to Templates in C++

    C++ is what is known as a strongly-typed programming language since all variables, whether declared in the written code…

  • Process Communication: Understanding Signals

    In the sphere of programs, signals play a key role in defining their behavior and facilitating communication between…

  • Healthy Ergonomics - Healthy Workplace

    Ergonomics, in essence, is the practice of designing environments, tools, and systems to optimize human performance and…

  • Virtual Memory: the /proc Filesystem

    Anyone who has worked with or studied UNIX-like operating systems in depth has likely heard the saying that "everything…

  • Python: Objects of Obsession

    Programming languages are classified based on their characteristics and the manner in which they collect and process…
