C++ compilation steps

Compiling a C++ program involves several steps that translate the code into an executable program that can be run on a computer. Here are the typical steps involved in compiling a C++ program:

Block Diagram:

         +-------------------------+
         | C++ Source Code (.cpp)  |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Preprocessor            |  # directives will be replaced
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Intermediate Code (.i)  |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Compiler                |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Assembly Code (.s)      |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Assembler               |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Object Code (.o)        |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Linker                  |
         +-------------------------+
                      |
                      V
         +-------------------------+
         | Executable Program      |
         +-------------------------+

The steps involved in the compilation process are:

  1. Preprocessing: The preprocessor reads the source code and handles the directives that start with #: file inclusion (#include), macro expansion (#define), and conditional compilation (#if, #ifdef). Each directive is replaced by the text it stands for. The output is a preprocessed source file, conventionally given a ".i" extension (GCC uses ".ii" for preprocessed C++).
  2. Compilation: The compiler takes the preprocessed file and generates assembly code. Assembly code is a human-readable, low-level representation of the program for the target processor. The output of the compiler is an assembly file with a ".s" extension.
  3. Assembly: The assembler takes the assembly file produced by the compiler and generates object code. Object code is machine code for the target processor, but it is not yet runnable on its own: references to functions and variables defined in other files or libraries are still unresolved. The output of the assembler is an object file with a ".o" extension.
  4. Linking: The linker takes one or more object files and combines them into an executable program. The linker also resolves external references between the object files and libraries. The output of the linker is an executable program that can be run on the target system.
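
To make the preprocessing step more concrete, here is a minimal sketch of a source file that uses macro expansion and conditional compilation. The names SQUARE, GREETING, square_of and greeting are made up for this illustration and are not part of the original article.

```cpp
#include <cassert>
#include <string>

// Macro expansion: the preprocessor rewrites SQUARE(n) textually
// before the compiler proper ever sees the code.
#define SQUARE(x) ((x) * (x))

// Conditional compilation: GREETING can be overridden on the command
// line with a -D option; otherwise this default is used.
#ifndef GREETING
#define GREETING "hello"
#endif

int square_of(int n) {
    return SQUARE(n);  // becomes ((n) * (n)) after preprocessing
}

std::string greeting() {
    return GREETING;   // becomes a string literal after preprocessing
}
```

Running only the preprocessor on such a file (g++ -E) should show the bodies of both functions with the macros already replaced, followed by nothing that starts with #define.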

The compilation process can be performed using various tools and commands depending on the platform and the compiler used. For example, on a Linux system with the GCC compiler, the following commands can be used:

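# g++ is used rather than gcc so that the files are treated as C++
# and the C++ standard library is linked in the last step.

# Preprocessing: generate a preprocessed file (.ii marks preprocessed C++)
g++ -E source.cpp -o source.ii

# Compilation: generate an assembly file
g++ -S source.ii -o source.s

# Assembly: generate an object file
g++ -c source.s -o source.o

# Linking: generate an executable program
g++ source.o -o program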

This is a simplified example. In practice, many other options and flags can be used with GCC to control the compilation process, optimize the generated code, enable debugging, and so on. Here are some commonly used options:

  • -Wall: This option enables a large set of commonly useful warnings. Despite the name it does not enable every warning; -Wextra adds more.
  • -O2: This option enables level 2 optimization, which performs a moderate amount of optimization without taking too much time.
  • -g: This option includes debugging information in the generated object file, which can be useful for debugging the program later.
  • -std=c++11: This option specifies the C++11 standard for the compiler to use.
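  • -Wl,-stack_size,size: Asks the linker to set the stack size of the program. This spelling is understood by Apple's linker on macOS, not by GNU ld on Linux.

If you want to increase the stack size limit, you can pass the -stack_size option to the linker through -Wl. With Apple's linker the size is read as a hexadecimal number and should be a multiple of the page size. For example, to set the stack size limit to 16 MiB (0x1000000 bytes), you can use the following command:

g++ -Wl,-stack_size,0x1000000 your_program.cpp -o your_program

On Linux, the stack limit of the main thread is usually raised before running the program with the shell command ulimit -s instead.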

Note that increasing the stack size limit is not always the best solution, as it may consume more memory than necessary and could be a temporary fix for a poorly-designed algorithm.
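
As an illustration of fixing the algorithm rather than raising the limit, the sketch below contrasts a deeply recursive function with an iterative one that computes the same result. The function names are hypothetical, and the example is deliberately trivial.

```cpp
#include <cassert>
#include <cstdint>

// Recursive version: uses one stack frame per step, so a large n can
// exhaust the stack unless the compiler optimizes the recursion away.
std::int64_t sum_to_recursive(std::int64_t n) {
    assert(n >= 0);
    return n == 0 ? 0 : n + sum_to_recursive(n - 1);
}

// Iterative version: constant stack usage regardless of n.
std::int64_t sum_to_iterative(std::int64_t n) {
    assert(n >= 0);
    std::int64_t total = 0;
    for (std::int64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Both functions agree for small inputs, but only the iterative one is safe to call with very large values of n on a default-sized stack.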


C++ compilers can be classified based on various factors such as their target platform, optimization level, and architecture. Below are some of the common ways to classify C++ compilers:

  1. Target Platform: C++ compilers can be classified based on the platform they target. For example, there are compilers that target Windows, Linux, macOS, and other operating systems. Examples of such compilers include Microsoft Visual C++, GCC, and Clang.
  2. Architecture: C++ compilers can be classified based on the architecture they support, such as x86, ARM, and PowerPC. For example, there are compilers that target x86 architecture, such as Microsoft Visual C++ and GCC, and there are compilers that target ARM architecture, such as ARM Compiler and Clang.
  3. Optimization Level: C++ compilers can be compared by the optimizations they offer. Most provide several levels, such as -O1, -O2, and -O3 in GCC and Clang, where higher levels generally apply more aggressive optimizations at the cost of longer compile times.
  4. Language Standards: C++ compilers can be classified based on the C++ language standard they support. For example, there are compilers that support C++98, C++11, C++14, C++17, and C++20. Examples of such compilers include GCC and Clang.
  5. License: C++ compilers can also be classified based on their license, such as open-source or proprietary. Examples of open-source C++ compilers include GCC and Clang, while examples of proprietary C++ compilers include Microsoft Visual C++ and Intel C++ Compiler.

Each of these classifications has its own advantages and disadvantages. For example, compilers that target a specific platform may provide better performance and integration with the platform's APIs, but may not be portable to other platforms. Similarly, compilers that provide higher levels of optimization may produce faster code but may also increase compilation time and use more memory. Choosing the right compiler for a project depends on the specific requirements of the project, such as performance, platform support, and license.
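
Code can find out which compiler, platform, and language standard it is being built with through predefined macros. The sketch below uses macros that GCC, Clang, and MSVC are known to define; the function names are hypothetical, and the list of cases is intentionally incomplete.

```cpp
#include <cassert>
#include <string>

// Clang also defines __GNUC__, so it has to be tested first.
std::string compiler_name() {
#if defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "MSVC";
#else
    return "unknown";
#endif
}

std::string target_platform() {
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "Apple";   // covers macOS and other Apple platforms
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}

// __cplusplus encodes the language standard, e.g. 201103L for C++11.
// MSVC reports 199711L unless it is given the /Zc:__cplusplus option.
long language_standard() {
    return __cplusplus;
}
```

Checks like these are typically used to select platform-specific code paths while keeping one source tree portable across compilers.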


Some of the commonly used compilers:

  1. GCC (GNU Compiler Collection): This is a free and open-source compiler that is widely used for C++ development. It is available for various platforms like Linux, Windows, and macOS. GCC supports various versions of the C++ standard and provides extensive optimization options. It is commonly used together with GDB (the GNU Debugger), which is a separate GNU project.
  2. Clang: This is another popular open-source compiler that is known for its fast compilation speed and low memory usage. It is available for various platforms like Linux, Windows, and macOS. Clang supports various versions of the C++ standard and provides extensive error messages.
  3. Microsoft Visual C++ Compiler: This is a proprietary compiler provided by Microsoft as part of their Visual Studio IDE. It is available for Windows and supports various versions of the C++ standard. It provides extensive debugging and profiling tools along with advanced optimization options.
  4. Intel C++ Compiler: This is a proprietary compiler provided by Intel for various platforms like Linux and Windows. It is known for its advanced optimization capabilities and supports various versions of the C++ standard. It also provides extensive debugging and profiling tools.
  5. LLVM: This is a collection of modular and reusable compiler and toolchain technologies. It is used by Clang as its backend for code generation. LLVM supports various platforms and provides a flexible framework for building compilers and toolchains.
  6. Digital Mars C++ Compiler: This is a compiler provided by Digital Mars, targeting mainly Windows and DOS. Its support for recent versions of the C++ standard is more limited than that of GCC, Clang, or Microsoft Visual C++.

Creating and linking static and shared libraries with C++ code involves several steps. Here is a general overview of the process:

References:

  • More examples (tldp.org)
  • Using shared libraries with dlopen and dlsym (cprogramming.com)

Static libraries, also known as archives, are collections of object files that are linked with a program at link time. When a static library is linked with a program, a copy of the library code that the program uses is included in the program's executable. This means that the resulting binary does not need the library to be present at runtime, although it may still depend on other shared libraries.

Shared libraries, also known as dynamic libraries or DLLs (Dynamic Link Libraries) in Windows, are loaded into memory at runtime and can be shared between multiple programs. Shared libraries allow for more efficient use of system resources because multiple programs can share a single copy of the library in memory. However, this also means that the library must be present on the system at runtime for the program to run correctly.


Creation of static libraries in C++:

1. Create source files: First, create the source files for the library that you want to build. The source files should contain the function definitions that will be included in the library.

2. Compile the source files: Compile the source files using the C++ compiler to create object files. The object files contain machine code that can be linked together to create the final library.

For example, to compile a single source file named mylib.cpp, you can use the following command:

g++ -c mylib.cpp -o mylib.o         

This will generate an object file named mylib.o.

3. Create the static library: Use the archiving tool to create the library. The archiving tool takes the object files and combines them into a single file that can be linked with your C++ program.

To create a static library named libmylib.a, you can use the following command:

ar rcs libmylib.a mylib.o         

This will generate a static library named libmylib.a.

The letters rcs are options for the ar command:

  • r: replace or add files to the archive
  • c: create the archive if it does not exist
  • s: write an index into the archive, which is used for fast symbol lookup

4. Run ranlib on the library to create an index (only needed if ar was run without the s option):

After creating or modifying a static library, it needs an index of its symbols. The linker uses this index to speed up symbol lookup inside the library during linking. The s option of ar shown above already writes the index, so with "ar rcs" this step is redundant. If the archive was created without s, the index can be added with:

ranlib libmylib.a

The index is stored within the library file itself.

Note: on some systems, when an archive file's index generation date (stored inside the archive file) is older than the file's last modification date (stored in the file system), a linker trying to use this library will complain that its index is out of date, and abort. There are two ways to overcome the problem:

  1. Use 'ranlib' to re-generate the index.
  2. When copying the archive file to another location, use 'cp -p' instead of only 'cp'. The '-p' flag tells 'cp' to keep all attributes of the file, including its access permissions, owner (if 'cp' is invoked by a superuser) and its last modification date. This will cause the linker to think the index inside the file is still up to date. This method is useful for makefiles that need to copy the library to another directory for some reason.

5. Compile your main program and link it with the static library:

g++ main.cpp -L/path/to/library -lmylib -o myprogram         

This will compile main.cpp and link it with the static library libmylib.a to create an executable named myprogram. The -L flag tells the linker where to find the library, and the -l flag specifies the name of the library to link with (without the "lib" prefix and the ".a" suffix).
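
For reference, here is a minimal sketch of what the files in the steps above might contain. The function add and the header mylib.h are hypothetical; the three parts are shown in one listing for brevity, but in a real project each would live in its own file.

```cpp
#include <cassert>

// ---- mylib.h (hypothetical): the interface that main.cpp includes ----
int add(int a, int b);

// ---- mylib.cpp (hypothetical): compiled with -c, then archived with ar ----
int add(int a, int b) {
    return a + b;
}

// ---- main.cpp (hypothetical): linked against the archive with -lmylib ----
// In a real program this would be the body of main().
int use_library() {
    int result = add(2, 3);
    assert(result == 5);
    return result;
}
```

The program only needs the declaration from the header at compile time; the definition of add is pulled out of libmylib.a by the linker.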

Shared library creation:

  1. Create source files: First, create the source files for the library that you want to build. The source files should contain the function definitions that will be included in the library.
  2. Create the object files of the source code that you want to include in the shared library. You should use the -fPIC (Position Independent Code) option to compile each source file:

g++ -c -fPIC library.cpp         

We need to compile the code with the '-fPIC' (Position Independent Code) flag to create a shared library. This flag tells the compiler to generate position-independent code, which means that the library can be loaded at any memory address and still function correctly. After creating the shared library, we do not need to run the 'ranlib' command as it is not applicable for shared libraries.

3. Create the shared library using the g++ compiler with the -shared flag, and link the object files:

g++ -shared -o liblibrary.so library.o         

This creates a shared library file named liblibrary.so.

4. Add the path of the shared library to the LD_LIBRARY_PATH environment variable:

export LD_LIBRARY_PATH=/path/to/shared/library:$LD_LIBRARY_PATH         

5. Compile the executable using the -L flag to specify the path to the shared library, and the -l flag to link the library:

g++ -o program program.cpp -L/path/to/shared/library -llibrary         

This creates an executable named program that is linked to the shared library.

6. Run the executable:

./program         

The executable should now be able to access the functions and variables defined in the shared library.

Note1: None of these steps needs sudo when you work in your own directories. It is only required if you install the library into a system directory such as /usr/lib.

Note2: The shared library is needed at link time (step 5), not only at run time. The linker checks that every symbol the program uses is defined somewhere, and records the library's name in the executable as a dependency. It does not copy the library's code into the executable.

When the application is run, the dynamic linker reads this list of dependencies and the executable's dynamic symbol table, loads the libraries, and binds each undefined symbol to its definition.

Without access to the shared library at link time, the linker would report undefined references and fail to produce the executable.

How does loading of a shared library happen?

When a program that uses a shared library is executed, the dynamic linker is responsible for loading the shared library into memory and resolving the symbols in the library.

Here are the general steps involved in loading a shared library:

  1. The program is executed by the operating system.
  2. The dynamic linker reads the program's executable file and identifies which shared libraries are required by the program.
  3. The dynamic linker searches for the required shared libraries in any user-specified locations (such as those listed in the LD_LIBRARY_PATH environment variable) and in a set of standard locations (such as /usr/lib).
  4. The dynamic linker maps the shared libraries into memory, resolves their symbols, and fills in the lookup tables (the GOT and PLT on ELF systems) through which the program reaches the libraries' functions and data.
  5. The program starts executing, and any functions from the shared libraries can be called as needed.

Note that shared libraries are not read into memory in full. They are mapped into the process's address space, and the operating system uses demand paging to bring in pages only when they are first accessed. This allows programs to use large shared libraries without requiring large amounts of memory to be allocated upfront.

The dynamic loader in Linux is called ld.so or the "run-time linker/loader". It is responsible for loading shared libraries and linking them to the executable during run-time.

What is ldd?

ldd is a command-line tool used in Linux and Unix-like operating systems to print shared library dependencies of executable files or shared libraries. It stands for "List Dynamic Dependencies". When you run ldd on an executable or shared library file, it will list all the shared libraries that the file depends on.

ldd is not a dynamic loader itself, but rather a utility for analyzing dependencies of shared libraries and executables. The dynamic loader in Linux is not part of the kernel either: it is the user-space program ld.so (installed as ld-linux.so on most distributions), which the kernel starts when it executes a dynamically linked program. It is responsible for loading shared libraries into memory and resolving their symbols at runtime.
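
While ldd inspects a file from the outside, a running process can ask a similar question about itself. The sketch below assumes Linux and uses dl_iterate_phdr from <link.h> to list the objects the dynamic loader has mapped into the current process; the helper names collect_name and loaded_objects are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info (Linux-specific)
#include <string>
#include <vector>

// Callback invoked once per loaded object; stores the object's path name.
// The main program is usually reported with an empty name.
static int collect_name(struct dl_phdr_info* info, std::size_t, void* data) {
    assert(data != nullptr);
    auto* names = static_cast<std::vector<std::string>*>(data);
    names->push_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // returning 0 tells dl_iterate_phdr to continue
}

// Returns the names of all objects currently loaded in this process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}
```

Printing the returned names from main() should give a list comparable to the output of ldd for the same executable, plus any libraries loaded later at run time.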


How can shared libraries be linked at runtime using the dynamic linker?

Shared libraries can be linked at runtime using the dynamic linker by using the dlopen(), dlsym(), and dlclose() functions. These functions are part of the POSIX standard and are available on most Unix-like systems.

Here is a brief overview of the steps involved in linking a shared library at runtime using the dynamic linker. I will try to cover this topic in a separate article, as it needs to be demonstrated with detailed examples.

  1. Create the shared library as described above.
  2. Include <dlfcn.h> in the program that will load the library. It declares the routines for dynamically loading libraries.
  3. Call dlopen() to load the shared library into memory. This function takes a filename or path to the shared library as an argument, and returns a handle to the loaded library. For example: void *module; module = dlopen("mapping.so", RTLD_LAZY);
  4. Call dlsym() to retrieve a function or symbol from the loaded library. This function takes the handle returned by dlopen() and a symbol name as arguments, and returns a pointer to the corresponding function or symbol. For example: demo_function = dlsym(module, "my_cnt"); Here demo_function is a function pointer and "my_cnt" is the actual name of the function in the library that will be called. In C++ the returned void pointer has to be cast to the function pointer type.
  5. Call the function or use the symbol pointer returned by dlsym() as needed in your program. For example, to call the function in the library: (*demo_function)();
  6. Call dlclose() to unload the shared library from memory when it is no longer needed. This function takes the handle returned by dlopen() as an argument.
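
The steps above can be sketched as a single helper function. This is a minimal illustration, not a complete treatment: the name call_unary is hypothetical, the symbol is assumed to be a function taking and returning a double, and it must be exported under an unmangled (C linkage) name. On older glibc versions the program also has to be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>   // dlopen, dlsym, dlclose

// Loads `library`, looks up `symbol` as a double(double) function,
// calls it with x, and stores the result in *out.
// Returns false if the library or the symbol cannot be found.
bool call_unary(const char* library, const char* symbol, double x, double* out) {
    assert(out != nullptr);

    void* handle = dlopen(library, RTLD_LAZY);   // step 3: load the library
    if (handle == nullptr) {
        return false;
    }

    using unary_fn = double (*)(double);
    // step 4: look up the symbol and cast it to the expected function type
    auto fn = reinterpret_cast<unary_fn>(dlsym(handle, symbol));
    bool found = (fn != nullptr);
    if (found) {
        *out = fn(x);                            // step 5: call the function
    }

    dlclose(handle);                             // step 6: unload the library
    return found;
}
```

On a typical glibc-based Linux system, call_unary("libm.so.6", "cos", 0.0, &result) would be one way to try it; the library file name is an assumption about the system and differs on other platforms.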

It's important to note that loading shared libraries at runtime with dlopen() can introduce some complexity and potential issues, such as versioning problems, symbol conflicts, and errors that only show up while the program is running. Therefore, it is generally preferable to link libraries in the ordinary way at build time (statically, or dynamically with -l) whenever possible. However, runtime linking can be useful in certain cases, such as when loading plugins or when the library dependencies cannot be resolved at build time.

What is a dynamic linker (or dynamic loader)?

A dynamic linker (or dynamic loader) is a component of an operating system that loads shared libraries into a program at runtime, instead of at build time. It also allows the program to use shared libraries that were not present or available at build time, enabling dynamic loading of modules or plugins.

The dynamic linker locates the necessary shared libraries, loads them into memory, resolves any symbols they provide, and performs any necessary relocations. This allows the program to use functions and resources from the shared libraries as if they were part of the program itself.

The dynamic linker is typically invoked automatically by the operating system when a program is started. It is responsible for loading any required shared libraries and resolving their symbols before transferring control to the program's entry point.

Weak linking, or weak referencing

Weak linking, also known as weak referencing, is a method of dynamic linking in which a symbol reference is not required to be resolved at link-time. Instead, the linker is allowed to leave the symbol reference unresolved and defer the resolution to run-time.

This is typically used when a library has optional functionality that may or may not be present at run-time. By using weak linking, the library can gracefully handle the situation where the optional functionality is not available, rather than crashing or throwing an error.

In C/C++ with GCC or Clang, weak symbols are declared with the "weak" attribute. On a declaration it tells the linker that the reference may be left unresolved; on a definition it allows a strong definition elsewhere to override it. Strong symbols are declared without the "weak" attribute and must be resolved at link-time.

In the context of shared libraries, weak linking can be used to provide optional features, default values, or fallbacks for missing symbols. It is often used in combination with dynamic loading and function pointers to provide a flexible and extensible library interface.

A related run-time technique is the use of the "dlopen" and "dlsym" functions to load a library and look up symbols while the program is running. This is not weak linking in the strict sense, but it serves a similar purpose: "dlsym" returns a null pointer when a symbol is not present in the loaded library, so the program can fall back to a default.

Example scenarios:

Suppose we have two shared libraries: "libfoo.so" and "libbar.so". "libfoo.so" depends on symbols defined in "libbar.so". We can link "libfoo.so" with "libbar.so" in two ways:

  1. Strong linking: If we link "libfoo.so" with "libbar.so" using strong linking, then all symbols defined in "libbar.so" that are required by "libfoo.so" will be resolved at the time of linking. This means that if any symbol is missing or unresolved at the time of linking, the linker will produce an error.
  2. Weak linking: If we link "libfoo.so" with "libbar.so" using weak linking, then symbols that are not found in "libbar.so" will be treated as weak symbols. This means that if any symbol is missing or unresolved at the time of linking, the linker will not produce an error. Instead, it will create a reference to the symbol as a weak symbol.


In C++, a symbol can be marked as weak using the __attribute__((weak)) attribute, which is a GCC and Clang extension. For example:

// Declaration of a weak reference: extern makes this a declaration only,
// so the symbol may be left undefined at link time
extern int my_weak_symbol __attribute__((weak));


int main() {
   // Checking if the weak symbol is defined or not
   if (&my_weak_symbol == nullptr) {
      // Symbol is not defined
   } else {
      // Symbol is defined
   }
   return 0;
}

In the above example, my_weak_symbol is declared as a weak reference using extern together with __attribute__((weak)). Without extern, the line would define the variable itself, and its address could never be null. main() then checks whether any object file or library actually defined the symbol.

Note that the behavior of weak symbols may differ between different operating systems and linkers. Therefore, it is recommended to consult the documentation of your particular system and linker to understand their behavior.
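
The optional-functionality pattern described earlier can be sketched for a function as well. The example assumes GCC or Clang on an ELF system such as Linux; the names optional_feature and call_feature_or_default are hypothetical. Nothing in this file defines optional_feature, so the fallback branch is the one taken unless another object file or library supplies it.

```cpp
#include <cassert>

// Weak reference to a function that some other object file or library
// may or may not provide. If nothing defines it, its address is null.
extern "C" int optional_feature(int value) __attribute__((weak));

// Calls the optional function when it exists, otherwise returns fallback.
int call_feature_or_default(int value, int fallback) {
    if (optional_feature != nullptr) {
        return optional_feature(value);
    }
    return fallback;
}
```

Linking in an object file that defines optional_feature changes the behavior without any change to this code, which is what makes the pattern useful for optional features.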


Thanks for reading till the end. Please leave a comment if you have any feedback!
