How Programs Are Built in the C Programming Language

On Linux and similar environments, we use many commands, utilities, and tools every day, yet we often overlook how these programs are created. In this article, I walk through the build process on Linux and similar systems, focusing on the compilation system for the C language.

Programs Are Translated by Other Programs into Different Forms

The GNU Compiler Collection (GCC), originally named the GNU C Compiler, is one of the most commonly used compilers on Linux-based systems. On Ubuntu, it is installed as part of the build-essential package. GCC reads the source file 'hello.c' and translates it into an executable object file named hello. An object file is one that contains object code: a sequence of statements or instructions in a machine language (ML) or an intermediate language (IL), such as Register Transfer Language (RTL), Java bytecode, or Microsoft Intermediate Language (MSIL).

The translation process

A C program is translated in four phases: (i) preprocessing, (ii) compilation, (iii) assembling, and (iv) linking. Together they are known as the compilation process, which the diagram below illustrates.



Figure: Compilation process of a C program.


1. Preprocessing Phase

The source code first passes through the preprocessor (cpp), which modifies the original C program according to directives that begin with the # character. The preprocessing phase performs the following tasks.

  • Comments are removed.
  • Macros are expanded.
  • Included files are expanded.

A preprocessor directive is an instruction to the preprocessor to perform a certain task before the actual compilation starts.

On Linux / UNIX based systems

$ gcc -E hello.c -o hello.i

On Windows based systems

> gcc.exe -E hello.c -o hello.i

2. Compilation Phase

The compiler (cc1) translates the text file 'hello.i' into the text file 'hello.s', which contains an assembly-language program. Each statement in an assembly-language program describes exactly one low-level machine-language instruction in a standard text form. Assembly language is useful because it provides a common output language for different compilers for different high-level languages.

On Linux / UNIX based systems

$ gcc -S hello.i -o hello.s

On Windows based systems

> gcc.exe -S hello.i -o hello.s

3. Assembly Phase

Next, the assembler (as) translates hello.s into machine language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o.

The 'hello.o' file is a binary file whose bytes encode machine language instructions rather than characters. If we were to view hello.o with a text editor, it would appear to be gibberish.

On Linux / UNIX based systems

$ gcc -c hello.s -o hello.o

On Windows based systems

> gcc.exe -c hello.s -o hello.o

4. Linking Phase

Notice that our hello program calls the printf function, which is part of the standard C library provided with every C compiler. The printf function resides in a separate precompiled object file, conceptually printf.o (in practice it is packaged inside the C library, libc), which must be merged, or linked, with our hello.o program. The linker (ld) handles this merging.

The result is the hello file, which is an executable object file (or simply executable) that is ready to be loaded into memory and executed by the system.

On Linux / UNIX based systems

$ gcc hello.o -o hello

On Windows based systems

> gcc.exe hello.o -o hello


