From Code to Kernel: Why is my "Hello World" so Big?
About This Series
This is the first chapter in a thirteen-chapter series on what happens after you run a program.
You can get more details on this mini-book here:
What am I doing?
Introduction
When beginning their journey with C programming on Linux, developers often start with the quintessential "Hello, World!" program. It's a rite of passage, a first step into the world of programming. However, this simple program holds a fascinating mystery that we'll unravel in this post: Why does such a tiny program compile into a surprisingly large executable?
Our Starting Point: The Simplest C Program
Let's begin with the classic "Hello, World!" program:
#include <stdio.h>
int main() {
    printf("Hello, World!\n");
    return 0;
}
Save this as hello.c and compile it with GCC:
gcc -o hello hello.c
Now, let's examine its size:
-rwxrwxr-x 1 chessman chessman 15960 Nov 7 13:16 hello
-rw-rw-r-- 1 chessman chessman 79 Nov 7 13:15 hello.c
15,960 bytes! That's shocking when you consider that our source code is merely 79 bytes. Let's put this in perspective: the executable is roughly 200 times the size of its source.
Introduction to the ELF Format
Before we dive into the specifics, it's important to understand that our executable uses ELF (Executable and Linkable Format), the standard binary format for executables on Linux. We'll explore ELF in great detail in Chapter 2, but for now, let's understand its basic structure.
An ELF file consists of several key components:
Let's use readelf to peek at the ELF header:
This header alone is 64 bytes! We'll explore these fields in detail in Chapter 2, "ELF: Demystifying the Executable Format."
Executable Files: Not Just Your Code
An executable file on Linux is not merely a raw dump of your compiled C code. Instead, it's a meticulously organized structure containing various segments of information crucial for the operating system to load and execute your program.
These segments serve diverse purposes:
Examining the Sections
Let's use objdump to look at the sections in our executable:
That's a lot of sections! Let's break down the most important ones and understand why they're necessary:
Essential Code Sections
.text Section (The Code)
The .text section contains the actual machine code. Notice several interesting points:
We'll explore the details of code sections more thoroughly in Chapter 3, "Where Your C Code Lives: Understanding ELF Sections."
.rodata Section (Read-only Data)
This section contains our string constant "Hello, World!" along with other read-only data. The string is null-terminated and aligned according to the system's requirements.
Dynamic Linking Infrastructure
Our executable needs several sections to support dynamic linking:
.interp Section
$ readelf -p .interp hello
String dump of section '.interp':
[ 0] /lib64/ld-linux-x86-64.so.2
This section specifies the dynamic linker that will load our program. We'll explore dynamic linking in detail in Chapter 9, "Dynamic Linking in C: Shrinking Executables and Sharing Code."
Dynamic Symbol Sections
These sections (.dynsym, .dynstr) contain information about functions we use from shared libraries. The symbol table's role will be covered extensively in Chapter 7, "Symbols: The Linker's Address Book."
Runtime Support Sections
Initialization and Finalization
$ readelf -d hello | grep INIT
0x000000000000000c (INIT) 0x1000
0x0000000000000019 (INIT_ARRAY) 0x3db8
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
These sections (.init, .init_array, .fini, .fini_array) handle program initialization and cleanup. We'll explore how these sections work before main() is called in Chapter 4, "Before main(): The Secret Life of Global Variables in C."
Exception Handling Support
$ readelf -S hello | grep -A1 "\.eh_frame_hdr"
[17] .eh_frame_hdr PROGBITS 0000000000002014 00002014
0000000000000044 0000000000000000 A 0 0 4
[Containing entries for all functions]
The .eh_frame and .eh_frame_hdr sections support C++ exceptions and stack unwinding. While our simple C program doesn't use exceptions, these sections are included to support interoperability with C++ code and for proper stack traces during crashes.
Understanding the Size Contributors
Let's break down where all those bytes go:
$ size --format=GNU hello
   text    data     bss   total filename
    367    1609       8    1984 hello
But this only tells part of the story. Let's get a more detailed view:
Can We Make It Smaller?
Yes! Let's try some optimization techniques:
Basic Size Optimization
$ gcc -Os -o hello_small hello.c
$ strip hello_small
$ ls -l hello_small
-rwxrwxr-x 1 chessman chessman 14472 Nov 7 13:35 hello_small
The -Os flag optimizes for size, and strip removes the symbol table along with any debugging information.
Static Linking (for comparison)
$ gcc -static -o hello_static hello.c
$ ls -l hello_static
-rwxrwxr-x 1 chessman chessman 900344 Nov 7 13:37 hello_static
Static linking makes our executable much larger because the library code it depends on is copied directly into the file! We'll explore the trade-offs between static and dynamic linking in Chapter 9.
Advanced Optimization (preview)
$ gcc -Os -fdata-sections -ffunction-sections -Wl,--gc-sections -o hello_opt hello.c
$ strip hello_opt
$ ls -l hello_opt
-rwxrwxr-x 1 chessman chessman 14464 Nov 7 13:38 hello_opt
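This places each function and data item in its own section (-ffunction-sections, -fdata-sections) and lets the linker discard sections that nothing references (--gc-sections). Note that this is linker garbage collection, not link-time optimization (-flto). For a program as small as ours the saving is only a few bytes. We'll explore these techniques in Chapter 8, "Customizing the Layout: Introduction to Linker Scripts."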
Why Keep All This "Overhead"?
While our executable might seem bloated, each component serves crucial purposes:
Conclusion
Our journey through the "Hello, World!" program has revealed that modern executables are sophisticated containers that package not just our code, but also the infrastructure needed to:
In the upcoming chapters, we'll dive deeper into each of these aspects:
Understanding these concepts empowers us to:
Further Reading