Understanding Object Files

In my previous article, I demonstrated how programs are built in the C programming language. In this article, I will walk you through object files.

What is an Object File

An object file is a file that contains machine code, data, and metadata (such as a symbol table) in a format that linkers and loaders understand. When you build a program in a language such as C or C++, the compiler (for example, GCC) first translates each source file into an intermediate object file. That file contains machine code, but it is not yet a complete, standalone executable program. In the literature, you will find three types of object files:

  1. Relocatable Object Files (.o)
  2. Executable Object Files (.out, .elf)
  3. Shared Object Files (.so)

Relocatable object files are not meant to be run directly. Instead, the linker uses them as building blocks, combining them with other object files and libraries to produce a final executable or shared library.

Consider below hello.c program.

#include <stdio.h>

// my first C-program

#define PI 3.142

int main () {

        printf("Hello World\n");
        printf("The value of pi is: %f",PI);
        return 0;
}
        

When you compile hello.c with the -save-temps option, GCC (the GNU Compiler Collection) keeps the files produced by each intermediate stage: the preprocessed source (hello.i), the assembly code (hello.s), and the relocatable object file (hello.o). The final linking step then produces the executable hello from hello.o.

$ gcc hello.c -o hello -save-temps

Normally GCC deletes these intermediate files after compilation; -save-temps preserves them, which is helpful for debugging or for inspecting each step of the compilation process.
$ ls -l
total 48
-rwxr-xr-x 1 root root 16000 Sep 24 11:01 hello
-rw-r--r-- 1 root root   153 Sep 24 10:58 hello.c
-rw-r--r-- 1 root root 18004 Sep 24 11:01 hello.i
-rw-r--r-- 1 root root  1672 Sep 24 11:01 hello.o
-rw-r--r-- 1 root root   887 Sep 24 11:01 hello.s
        

In the ls output, you will notice the intermediate files generated during the compilation process. This article focuses on hello.o and hello: hello.o is a relocatable object file, whereas hello is an executable object file.


Now, let's use the Linux file command to analyze the executable object file hello.

$ file hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=dc3f647f6a02e2af3424db87aa2bd79aa277226b, for GNU/Linux 3.2.0, not stripped
        

Let's break down each part of the output:

1. ELF 64-bit

ELF: The file is in the Executable and Linkable Format (ELF), the standard format for executables, object code, shared libraries, and core dumps on Unix-like systems, including Linux. Other object file formats exist as well, for example COFF (Common Object File Format) and PE (Portable Executable), the format used on Windows.

64-bit: This indicates that the executable is compiled for a 64-bit architecture, meaning it can use 64-bit memory addresses and registers.

2. LSB (Little-Endian)

LSB: LSB stands for "least significant byte first," meaning the file uses little-endian byte order: the least significant byte of a multi-byte value is stored at the lowest memory address. This is the byte order of the x86 and x86-64 architectures.

3. PIE (Position Independent Executable)

PIE: A Position Independent Executable is an executable file that can be loaded at any memory address. This is an important security feature, as it allows Address Space Layout Randomization (ASLR), making it harder for attackers to predict where the program's code will be loaded in memory.

4. x86-64

This specifies the architecture for which the executable is built. In this case, it's built for the x86-64 architecture, which is the 64-bit version of the x86 instruction set used in modern computers.

5. Version 1 (SYSV)

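This part of the output comes from two fields in the ELF identification header. "version 1" is the version of the ELF format itself (EV_CURRENT, the only version defined). "(SYSV)" is the OS/ABI field: its value 0 means the file follows the generic System V ABI (Application Binary Interface) without operating-system-specific extensions. The System V ABI is the standard binary interface used by Linux and most other Unix-like systems; it defines conventions such as function calling conventions and the object file layout.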

6. Dynamically Linked

The executable is dynamically linked, meaning it relies on external shared libraries (such as the C standard library) at runtime. These shared libraries are not embedded within the executable but are loaded into memory when the program is executed.

7. Interpreter: /lib64/ld-linux-x86-64.so.2

This names the program interpreter, the dynamic linker/loader used to run the executable. Here it is the standard 64-bit Linux loader, /lib64/ld-linux-x86-64.so.2. The kernel starts this loader first; it loads the required shared libraries, resolves references to them, and then transfers control to the program.

8. BuildID[sha1]=dc3f647f6a02e2af3424db87aa2bd79aa277226b

This is a Build ID for the executable, which is a unique identifier (in this case, a SHA-1 hash). It helps track specific builds of the executable, useful for debugging, tracing, and managing software packages.

9. For GNU/Linux 3.2.0

The executable is intended to run on GNU/Linux with kernel version 3.2.0 or later. This information comes from an ABI note (the .note.ABI-tag section) that glibc's startup files add to the program; it records the minimum kernel version the C library was built to support.

10. Not Stripped

This means the executable still contains its symbol table (the .symtab section), which maps function and variable names to addresses and is useful for debugging. The strip command removes the symbol table along with any debugging information (which is only present if the program was compiled with -g). Stripped executables are smaller but harder to debug, because this information is gone.


Now, let's examine the relocatable object file hello.o.

$ file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
        

Let's break down each part of the output:

  1. hello.o: This is the name of the file being analyzed, in this case, an object file (.o extension) created during the compilation process.
  2. ELF: The file is in the Executable and Linkable Format (ELF). Same as discussed above.
  3. 64-bit: The object file is compiled for a 64-bit architecture. This means it will use 64-bit memory addresses and registers.
  4. LSB (Little Endian): LSB stands for "least significant byte first." Same as discussed above.
  5. Relocatable: This is an important part of the description. The file is a relocatable object file, meaning it contains code and data that are not yet fully linked. Certain addresses and references (like function calls or memory accesses) are still unresolved because they depend on other object files or libraries that will be linked later. The linker processes this file, combines it with other object files and libraries, resolves the addresses, and creates the final executable or library.
  6. x86-64: Same as discussed above.
  7. Version 1 (SYSV): Same as discussed above.
  8. Not Stripped: Same as discussed above.


Summary

To sum up, understanding object files is essential for understanding the C and C++ compilation process. Object files, whether relocatable, executable, or shared, are vital to turning human-readable source code into machine-executable programs. Examining files like hello.o and hello, and the ELF format they use, gives a glimpse of the complexity of the compilation system. Programmers and system developers need this core knowledge to debug, optimize, and secure applications.

More articles by ANJUM NAZIR
