The Structure of Compiler (Part 1)
Abhinav Ashok Kumar
Curating Insights & Innovating in GPU Compiler | Performance Analyst at Qualcomm | LLVM Contributor | Maintain News Letter | AI/ML in Compiler
In the previous post we discussed that, for most people, a compiler is a black box that maps a source program into a semantically equivalent target program.
Let's try to open this black box in this post and understand the structure of a compiler.
If you are new to compilers and want to know what a compiler is, you can define a compiler as a
program that reads a program in one language and translates it into another language, while reporting the errors and warnings in the source program that it detects during the translation process.
If we open this box, at the abstract level it consists of two parts:
1. Analysis Part
2. Synthesis Part
The diagram above represents the abstract level of the compiler. If we examine compilation in more detail, we see that it operates as a sequence of phases, each of which transforms one representation of the source program into another.
In real-life scenarios, the translation of source code to an executable occurs in various phases, and several phases may be grouped together; the intermediate representations between the grouped phases need not be constructed explicitly.
The symbol table, which stores information about the entire source program, is used by all the phases of the compiler.
Some compilers have a machine-independent optimization phase between the analysis part and the synthesis part, which are known as the front end and the back end respectively.
The main reason for optimizing the intermediate representation is to allow the back end to produce a better target program.
Before looking at the structure of the compiler, we should be aware of the toolchain and the language processing system.
As the name suggests, a toolchain can be defined as a set of tools used for software development, combined with one another to deliver software as a product. This set of tools can include a preprocessor, compiler, assembler, linker, loader, debugger, testing scripts, etc.
In compiler design, language processing is very important.
A language processing system is the set of steps that converts your source code into an executable that can run on the desired hardware. It is also described as the process of execution of a program.
Language Processing System
Let's discuss them in brief:
Let's take an example of C code and understand it.
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return 0;
}
First, this source file is taken as input by the preprocessor. You can see the output generated by the preprocessor by running the command below.
linux>gcc -E -o hello_world.i hello_world.c
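In hello_world.i you will see that the #include <stdio.h> line is gone: the preprocessor has replaced it with the contents of the header (its declarations), along with line markers that record which file each part came from.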
The file hello_world.i is then given to the compiler, which generates the assembly code required by the assembler. You can see the output generated by the compiler by running the command below.
linux>gcc -S -o hello_world.s hello_world.i
After running this command you will find that the output of the preprocessor became the input of the compiler, and the compiler produced assembly language as its output. You can see the assembler output by using the command below.
linux>gcc -c -o hello_world.o hello_world.s
The assembler takes hello_world.s as input and produces object code (relocatable machine code) in hello_world.o. The linker then combines this object code into an executable with the command below.
linux>gcc hello_world.o -o hello_world.elf
This links the object code with the required libraries (for example, the C library that provides printf) and generates the executable hello_world.elf.
linux>./hello_world.elf
With this command the loader is invoked; it loads the executable into memory, and you get the desired result if the executable is error free.
When you build in a single step, the files with the extensions .i, .s and .o are generated by the toolchain only temporarily; they are deleted once the executable has been generated or an error is reported while generating it.
To keep these temporary files, you can run the following command.
linux>gcc -save-temps hello_world.c
Here, the file hello_world.i is generated by the preprocessor.
The file hello_world.s is generated by the compiler.
The file hello_world.o is generated by the assembler.
The file a.out is generated by the linker.
So we can say that the whole toolchain, whose heart is the compiler, is required to do the complete translation of the source code to the target language.
We will discuss the structure of the compiler in more detail and try to open this black box in upcoming posts; in this post I am sharing the diagram.