The Structure of a Compiler (Part 1)

In the previous post we discussed that, for a lot of people, a compiler is a black box that maps a source program into a semantically equivalent target program.

Let's try to open this black box in this post and understand the structure of a compiler.

If you are new to compilers and want to know what a compiler is, you can define it as:

A program that reads a program in one language (the source language) and translates it into an equivalent program in another language (the target language), reporting any errors and warnings it detects in the source program during the translation process.

Basic working of a compiler


If we open this box, at an abstract level it consists of two parts: analysis and synthesis.

  1. Analysis Part

  • This part breaks your source code into its constituent pieces and imposes a grammatical structure on them. The compiler uses this grammatical structure to create an intermediate representation of the source code. The analysis part is the one that detects whether our code is syntactically correct or not. This part also collects information about the source code and stores it in a data structure known as the symbol table. The symbol table is passed along with the intermediate representation to the synthesis part. The analysis part is also known as the front end.

  2. Synthesis Part

  • This part takes the intermediate representation as its input. The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table. The synthesis part is also known as the back end.

Analysis and synthesis phase diagram

The diagram above represents the compiler at an abstract level. If we examine compilation in more detail, we see that it operates as a sequence of phases, each of which transforms one representation of the source program into another.

In real-life scenarios, the translation of source code into an executable occurs in several phases. Several phases may be grouped together, and the intermediate representations between the grouped phases need not be constructed explicitly.

The symbol table, which stores information about the entire source program, is used by all the phases of the compiler.

Some compilers have a machine-independent optimization phase between the analysis part (front end) and the synthesis part (back end).

The main reason for performing this optimization on the intermediate representation is to allow the back end to produce a better target program.

Before getting to the structure of a compiler, we should be aware of the toolchain and the language processing system.

As the name suggests, a toolchain can be defined as a set of tools used together in software development to deliver software as a product. This set of tools can include a preprocessor, compiler, assembler, linker, loader, debugger, testing scripts, etc.

In compiler design, language processing is very important.


The process of converting your source code into an executable that can run on the desired hardware is known as the language processing system. It is also described as the process of executing a program.

Language Processing System

  • The language processing system can be called a cousin of the compiler. The compiler's job is to translate the source program and generate the assembly file that the assembler needs to produce object code. But to run code on a target machine, several other programs are also required, such as a preprocessor, assembler, linker (link editor), and loader.
  • Without a language processing system, we cannot imagine a compiler.
  • Let's understand the language processing system. We all know that a computer is an intelligent combination of software and hardware. We write our code in a high-level language, but the machine understands only binary, 0 and 1. We also know that hardware represents instructions as electronic charges, which are equivalent to binary values in the software world. Writing programs in 0s and 1s is a hectic task for the developer. That is why developers choose to write in a high-level language (HLL) and use the toolchain to get an executable in the binary form that the machine can understand. The code is fed through a series of tools and operating system components to get it into the form that the machine can understand. This whole process is known as the language processing system.


Let's discuss them in brief:

  • Preprocessor: The preprocessor handles header files and macros. It takes the source code as input and generates a .i file, which the compiler uses as its input. Some of the tasks carried out by the preprocessor are macro substitution, evaluating conditional compilation directives, and file inclusion.
  • Compiler: We already discussed it above.
  • Assembler: It takes the assembly language generated by the compiler and produces relocatable machine code as output.
  • Linker: A linker or link editor is a program that takes a collection of object files (created by assemblers and compilers) and combines them into an executable program.
  • Loader: It loads a linked program into main memory (RAM) so that the executable generated by the linker can be executed.

Let's take an example of C code and understand it.

#include <stdio.h>
int main(void)
{
    printf("Hello World\n");
    return 0;
}

First, this source file is taken as input by the preprocessor. You can see the output generated by the preprocessor by running the command below.

linux>gcc -E -o hello_world.i hello_world.c        

You will see that the #include <stdio.h> directive is no longer present. In its place are the contents of the header file (and of the headers it includes), such as the declaration of printf.

This file, hello_world.i, is given to the compiler, and the compiler generates the assembly code required by the assembler. You can see the output generated by the compiler by running the command below.

linux>gcc -S -o hello_world.s hello_world.i        

After running this command you will find that the output of the preprocessor became the input to the compiler, and the compiler produced assembly language as its output in hello_world.s. You can see the assembler output by using the command below.

linux>gcc -c -o hello_world.o hello_world.s        

The file hello_world.s is taken as input by the assembler, and the assembler produces object code (relocatable machine code) in hello_world.o as its output. The command below then links this object file.

linux>gcc  hello_world.o  -o hello_world.elf        

This links the object code with the required libraries (such as the C standard library, which provides printf) and generates hello_world.elf as the executable.

linux>./hello_world.elf        

With this command the loader gets invoked and loads the executable into memory, and you get the desired result if the executable is error free.

The files with the extensions .i, .s, and .o are generated by the toolchain as temporary files when you build an executable in a single step (for example, gcc hello_world.c). They are deleted once generation of the executable is complete, or when an error is reported while generating it.

To keep these temporary files, you can run the following command.

linux>gcc -save-temps hello_world.c        

Here, hello_world.i is generated by the preprocessor.

hello_world.s is generated by the compiler.

hello_world.o is generated by the assembler.

a.out is generated by the linker.

Depending on the GCC version, the saved files may carry a prefix taken from the output name (for example, a-hello_world.i).


So we can say that the whole toolchain, whose heart is the compiler, is required to completely translate source code into the target language.


We will discuss the structure of a compiler in more detail and try to open this black box in an upcoming post. In this post I am sharing the diagram.

Structure of a compiler diagram
