How to Compile a C Program
So you've written your first C program, which is undoubtedly "Hello, World". Now you want to run it so that you can show off your hard work to all your friends. How do you go about running a C program? There are a few steps to take before your computer can print "Hello, World" to your terminal.
What is gcc?
You're sitting there with your source file named hello.c and wondering how to get to the desired result of "Hello, World". The way to do this is to run your code through a compiler, which in this instance is gcc. GCC stands for GNU Compiler Collection (the name originally stood for GNU C Compiler), and it was built as the compiler for GNU, a free operating system. It now compiles a wide variety of languages, including C.
How does gcc work?
Gcc runs your source code (hello.c) through a series of steps to turn it into an executable file: preprocessing, compiling, assembling, and linking. In short, preprocessing pulls in the header files you include and replaces your macros with their values. Compiling then turns your code into assembly language, which is the language the assembler understands. The assembler converts that assembly language into machine code in binary form, the only language computers actually understand. The machine code is placed into object files, which are then sent to the final step of linking. Linking bundles together all the object files you've told it to use, along with any library code they need, into one cohesive executable program.
Step 1 - Preprocessing
Preprocessing can be likened to a translation. Any line that begins with # is a preprocessing directive. The preprocessor looks at the header files you've included and inserts their contents into your source file, essentially grabbing the code you ask for with #include <stdio.h> and pasting it in place of that line. All your macros (defined with #define) are expanded into their respective values wherever they appear in your code. It also strips away all your comments, as these are meant only for other humans perusing your code and have no meaning to the machine itself.
Our hello.c includes the header file <stdio.h> and defines the macro AGE with the value 28. This is what the source file looks like before we've begun the compilation process: it has 2 preprocessor directives, 1 include and 1 define. Now let's take a look at what happens when we run hello.c through the preprocessor. We can have the gcc compiler stop after the preprocessing stage by adding the -E option, for example gcc -E hello.c -o hello.i, which writes the result to hello.i instead of printing it to the terminal.
By stopping the compilation process after the preprocessing stage, we can view the output of this step and see what gets sent to the compilation stage.
The preprocessed output was stored in a file called "hello.i". The .i extension is gcc's convention for preprocessed C source. The file begins with lines that start with #, called linemarkers, which record the file and line each piece of output came from as the preprocessor works through stdio.h and the headers it includes in turn.
The final lines of the preprocessed output show the last of the declarations copied in from stdio.h, many of them marked with the extern keyword, which tells the compiler that these functions are defined elsewhere (in the C library) and will be found at link time. After them comes our own code: our comments have been stripped away, leaving whitespace in their place, and the macro AGE has been replaced with its value of 28. Now that we've seen what happens at preprocessing, let's move on to compilation.
Step 2 - Compiling
The next step is to take our preprocessed file and move it into the compiling stage. Here, the compiler takes the "hello.i" file and transforms it into a file called "hello.s". The .s extension signifies that the file is in assembly language. Assembly language is the language the assembler understands, and it looks slightly more arcane than preprocessed C. We can have gcc stop after the compiling stage by giving it the -S argument, as in gcc -S hello.c. This stops gcc after compilation and writes the output into the file "hello.s".
Peering into our "hello.s" file gives us an example of assembly language. There are words you recognize and know the meaning of. There are also some weird combinations of letters, numbers, and symbols that look completely out of place among the comprehensible English words. We won't go into what it all means, but think of assembly language as a bridge between human-readable C code and the machine code that the computer understands.
We can even see parts of our program we recognize, like string and main. We've translated our code into assembly language for a specific reason: the assembler. It's no coincidence that the assembler understands assembly language. We're still not down to binary yet, so let's take the next step in our evolution to get there, which takes us to assembling.
Step 3 - Assembling
This is where our assembly code (in the form of hello.s) makes a complete transformation into something that is unreadable to humans (in the form of the object file hello.o). This is called machine code or binary, and it's the only language computers understand. The object file has one important caveat: only the code we wrote ourselves is transformed into binary. Calls to functions defined elsewhere (in our example, printf) are still unresolved and will be resolved in the linking process, which is the final step of compilation. We can force gcc to stop after the assembling stage by giving it the -c argument, as in gcc -c hello.c, which produces hello.o.
If you open hello.o in a text editor, nearly the entirety of our code is now unreadable to us. We do still see a few recognizable parts, however: the string "Age is: %d" that we fed to printf, as well as the names "hello.c", "main", and "printf". Everything else is, well, incomprehensible. This is by design, as we're not running our program, the computer is. The final step of linking will resolve the remaining references, such as the call to printf, and combine everything into an executable file, thus completing the compilation process. Let's check out linking now.
Step 4 - Linking
We've now arrived at the final step of the compilation process. Linking is the process whereby the linker resolves all function calls to external libraries. It also links together all the object code in a project (think of a large project with multiple source code files) with the object code from the external libraries into one executable file, called "a.out" by default. There is no flag for gcc to stop at the linking stage because it's the final step in the compilation process. However, if "a.out" doesn't suit our needs, we can specify the name of the executable file by passing gcc the -o argument followed by the name we want, as in gcc -o hello hello.c.
When we ls to list the contents of the directory, we see that the compilation process has finished steps 1-4 and, because we specified with the -o flag that the executable should be named hello, a "hello" file has appeared. Many terminals are configured to show it in green, which is the computer's way of saying "Hey, this is an executable!"
And finally, the moment you've all been waiting for: it's time for our program to serve its intended purpose. We programmed it to print my age of 28. We've followed it through the entire compilation process step by step. Now that we have a better understanding of what's going on underneath the hood, let's see the result of all the computer's hard work. We can accomplish this by running ./hello. The "./" tells the shell to look in the folder we're currently in for an executable file named "hello". This was the file name we specified to gcc with gcc -o hello.
In Summary
Just to recap, the compilation undertaken by the gcc compiler consists of 4 steps. The first step is Preprocessing. This is where gcc takes your header files and inserts them into your source code. It also strips comments and expands macros into their respective values in the source code. The second step of the process is Compiling. In Compiling, gcc takes the preprocessed file (.i extension) and turns it into assembly language (.s extension). Assembly language is essentially a bridge between human-readable code and machine language. This assembly language code is sent to the third step, which is Assembling. In Assembling, the assembler takes assembly language and translates it into machine code or binary (.o extension), leaving a few parts of our program (mainly calls to external functions) unresolved. This machine code is finally sent to the linker. The linker bundles together all the object code from the assembler with the object code acquired by resolving dependencies on external libraries, producing one single executable file, which is named "a.out" by default. That is, unless we tell gcc otherwise. If you're interested in getting really deep into how gcc works, I suggest reading the gcc man page.
Thanks for taking the time to read this article! I hope you were able to learn a little more about the compilation process and what's going on underneath the hood of your computer. Until next time! Happy Coding!