Understanding Object Files
In my previous article, I demonstrated how programs are built in the C programming language. In this article, I will walk you through object files.
What is an Object File
An object file is an intermediate output of the compilation process, especially in languages such as C or C++. It contains machine code that a compiler such as GCC has translated from source code, but it is not yet a complete, standalone executable program. In the literature, you will find three types of object files: relocatable, executable, and shared.
Object files are generally not meant to be run directly. Instead, they are used as building blocks by the linker, which combines them with other object files and libraries to produce a final executable or library.
Consider the hello.c program below.
#include <stdio.h>

// my first C program
#define PI 3.142

int main(void) {
    printf("Hello World\n");
    printf("The value of pi is: %f\n", PI);
    return 0;
}
When you compile hello.c with -save-temps, GCC keeps the output of every intermediate stage: preprocessing (hello.i), compilation (hello.s), and assembly (hello.o), alongside the linked executable (hello).
$ gcc hello.c -o hello -save-temps
The -save-temps option in GCC (GNU Compiler Collection) instructs the compiler to preserve intermediate files generated during the compilation process, which are normally deleted after compilation. These files can be helpful for debugging or inspecting the steps involved in the compilation process.
$ ls -l
total 48
-rwxr-xr-x 1 root root 16000 Sep 24 11:01 hello
-rw-r--r-- 1 root root 153 Sep 24 10:58 hello.c
-rw-r--r-- 1 root root 18004 Sep 24 11:01 hello.i
-rw-r--r-- 1 root root 1672 Sep 24 11:01 hello.o
-rw-r--r-- 1 root root 887 Sep 24 11:01 hello.s
In the ls output you will notice the intermediate files generated during compilation. This article focuses on two of them: hello.o, which is a relocatable object file, and hello, which is an executable object file.
Now, we will use the Linux file command to analyze the executable object file 'hello'.
$ file hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=dc3f647f6a02e2af3424db87aa2bd79aa277226b, for GNU/Linux 3.2.0, not stripped
Let's break down each part of the output:
1. ELF 64-bit
ELF: The file is in the Executable and Linkable Format (ELF), which is the standard format for executables, object code, shared libraries, and core dumps on Unix-like systems (including Linux). Other object file formats exist as well, for example COFF (Common Object File Format) and PE (Portable Executable), the latter used on Windows.
64-bit: This indicates that the executable is compiled for a 64-bit architecture, meaning it can use 64-bit memory addresses and registers.
2. LSB (Little-Endian)
LSB: LSB stands for least significant byte first, indicating that the file uses little-endian byte order. In little-endian systems, the least significant byte of a number is stored at the smallest memory address. This is typical of x86 and x86-64 architectures.
3. PIE (Position Independent Executable)
PIE: A Position Independent Executable is an executable file that can be loaded at any memory address. This is an important security feature, as it allows Address Space Layout Randomization (ASLR), making it harder for attackers to predict where the program's code will be loaded in memory.
4. x86-64
This specifies the architecture for which the executable is built. In this case, it's built for the x86-64 architecture, which is the 64-bit version of the x86 instruction set used in modern computers.
5. Version 1 (SYSV)
This part combines two fields from the ELF header. "version 1" is the version of the ELF format itself, which is currently always 1. "SYSV" is the OS/ABI field, and it indicates that the file follows the System V ABI (Application Binary Interface), the baseline ABI for Unix-like operating systems, without OS-specific extensions.
6. Dynamically Linked
The executable is dynamically linked, meaning it relies on external shared libraries (such as the C standard library) at runtime. These shared libraries are not embedded within the executable but are loaded into memory when the program is executed.
7. Interpreter: /lib64/ld-linux-x86-64.so.2
This indicates the interpreter or loader that will be used to load and run the executable. In this case, it points to the standard Linux 64-bit loader, /lib64/ld-linux-x86-64.so.2, which is responsible for loading shared libraries and executing the program.
8. BuildID[sha1]=dc3f647f6a02e2af3424db87aa2bd79aa277226b
This is a Build ID for the executable, which is a unique identifier (in this case, a SHA-1 hash). It helps track specific builds of the executable, useful for debugging, tracing, and managing software packages.
9. For GNU/Linux 3.2.0
The executable is designed to run on GNU/Linux, and it is compatible with version 3.2.0 of the Linux kernel or later. This indicates the minimum kernel version required for the executable to run properly.
10. Not Stripped
This means the symbol table has not been removed from the executable. The symbol table contains metadata such as function names and variable names, which is useful for debugging. Stripped executables are smaller but harder to debug, as they lack this extra information.
Now, let's examine the relocatable object file 'hello.o'.
$ file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
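Most fields carry the same meaning as in the executable. The key difference is the type: relocatable means that the code in hello.o has not yet been assigned final addresses, so the linker is free to place it and to patch references such as the call to printf. The output has no "dynamically linked", interpreter, or BuildID fields because the linker adds that information when it produces the executable.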
Summary
To sum up, understanding object files is essential for understanding the C and C++ compilation process. Relocatable, executable, and shared object files are all vital to turning human-readable source code into machine-executable programs. Examining an intermediate file like hello.o and an executable format like ELF gives a sense of how much work the compilation system does behind the scenes. Programmers and system developers rely on this core knowledge to debug, optimize, and secure applications.