The Structure of Compiler (Part 2)

In the previous newsletter, The Structure of Compiler (Part 1), we discussed the language processing system in detail and how source code gets translated to machine language by a toolchain such as GCC or Clang. In this article we will discuss the heart of that toolchain, which we call the compiler. So far we know that the compiler is a tool that translates one language into another. But what is its role in the toolchain? How do we define a compiler in a more technical manner?

A compiler can be defined as a tool that takes its input from the preprocessor and produces assembly language as output, which in turn is the input to the assembler that creates the object code.


Let's now look at the diagram of the compiler shown below.

[Diagram: the phases of the compiler]

In the diagram above we can see that the compiler's first stage is the lexical analyzer, which takes its input from the preprocessor and produces a stream of tokens as output.

Before we can design a compiler, we first have to understand how a compiler works and what its phases are.

The phases of a compiler are listed below:

  1. Lexical Analyzer
  2. Syntax Analyzer
  3. Semantic Analyzer
  4. Intermediate Code Generator
  5. Machine-Independent Code Optimizer
  6. Code Generator
  7. Machine-Dependent Code Optimizer

Let's discuss each of these phases.

Lexical Analyzer

Let's discuss this phase in a little detail. The first phase of a compiler is known as lexical analysis, or scanning.

In this phase the lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences known as lexemes. For each lexeme, the lexical analyzer produces a token as output.

This token is then passed on to the syntax analysis phase. A token has the format <token-name, attribute-value>.

For example, take the source statement

profit = Selling_price - Cost_price

The characters in this assignment statement are grouped into the following lexemes:

profit is a lexeme that maps to the token <id,1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for profit.

Let's understand it with a real-life example:

Assume you visit a juice shop where each juice item has a token id. When you visit the shop, you decide which juice you want and tell its name to the shopkeeper. The shopkeeper gives you a token for that particular juice; you then pay the bill and collect the juice, against that token id, from the person in the shop who serves it.

Let's map this example onto the lexical analyzer in a bit more detail.

Here the juice name is the lexeme, which maps to a token <id,1>, where id is the abstract symbol (the identifier) and 1 points to the entry for that particular juice. The image below explains this in more detail.

[Diagram: the juice-shop analogy for lexemes and tokens]

Similarly, in the example profit = Selling_price - Cost_price above, the assignment operator = maps to the token <=>. Since this token needs no attribute value, we have omitted the second component.

Similarly, Selling_price and Cost_price are lexemes that map to identifier tokens pointing to their own symbol-table entries, so the whole statement becomes the token stream <id,1> <=> <id,2> <-> <id,3>.


How the Lexical Analyzer Functions

  1. Divides the program into valid tokens (tokenization).
  2. Removes white space from the source.
  3. Removes comments.
  4. Helps in reporting errors by providing the row and column number at which a problem occurs.

For example, the statement printf("Inside Lexical Analyzer"); is divided into the following tokens:

printf {1}   ( {2}   "Inside Lexical Analyzer" {3}   ) {4}   ; {5}

These are the 5 valid tokens generated for the statement above.
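
To make this concrete, below is a minimal scanner sketch in C for the running example. It is only an illustration: real lexers are usually generated by tools such as Lex/Flex or written as state machines, and the tiny fixed-size symbol table and hard-coded input string here are assumptions made for this sketch.

/* Minimal, illustrative scanner for statements like
   "profit = Selling_price - Cost_price".
   The symbol-table layout and the hard-coded input are assumptions
   made for this sketch only. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static char symtab[16][32];   /* very small symbol table, no bounds checks */
static int  nsymbols = 0;

/* Return the 1-based symbol-table index of name, inserting it if new. */
static int lookup(const char *name) {
    for (int i = 0; i < nsymbols; i++)
        if (strcmp(symtab[i], name) == 0)
            return i + 1;
    strcpy(symtab[nsymbols], name);
    return ++nsymbols;
}

int main(void) {
    const char *p = "profit = Selling_price - Cost_price";

    while (*p) {
        if (isspace((unsigned char)*p)) {              /* skip white space */
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            char name[32];
            int n = 0;
            while (isalnum((unsigned char)*p) || *p == '_')
                name[n++] = *p++;                      /* collect the lexeme */
            name[n] = '\0';
            printf("<id,%d> ", lookup(name));          /* identifier token   */
        } else {
            printf("<%c> ", *p++);                     /* operator token     */
        }
    }
    printf("\n");
    return 0;
}

Running it prints the token stream <id,1> <=> <id,2> <-> <id,3>, matching the discussion above.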

Syntax Analysis

  1. The second phase of the compiler is syntax analysis, or parsing.
  2. Here the parser uses the first component of the tokens produced by the lexical analyzer to create a tree-like structure known as an abstract syntax tree (AST), which depicts the grammatical structure of the token stream.
  3. We know that a tree consists of parent and child nodes. In an AST, each interior node represents an operation and its children represent the arguments of that operation. The syntax analyzer takes the token stream from the lexical analyzer and converts it into an AST.

[Diagram: syntax tree for profit = Selling_price - Cost_price]

The diagram above shows the order in which the operations in the assignment

profit = Selling_price - Cost_price

are to be performed: the subtraction is evaluated first, and its result is then assigned to profit.

The subsequent phases of the compiler use this grammatical structure to help analyze the source program and to generate the target program. Context-free grammars are used to specify the grammatical structure of programming languages, and syntax analyzers can be constructed automatically for certain classes of grammars.
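
As a rough sketch in C (the node layout and the prefix-style printing are assumptions made for this illustration, not a real compiler's data structures), the AST for the assignment could be built and printed like this:

/* Illustrative AST for: profit = Selling_price - Cost_price */
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    const char  *label;        /* "=", "-", or an identifier name */
    struct Node *left, *right; /* NULL for leaf nodes             */
} Node;

static Node *mknode(const char *label, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    n->label = label;
    n->left  = left;
    n->right = right;
    return n;
}

/* Print the tree in prefix form: the operation first, then its arguments. */
static void show(const Node *n) {
    if (!n) return;
    if (n->left || n->right) {
        printf("(%s ", n->label);
        show(n->left);
        printf(" ");
        show(n->right);
        printf(")");
    } else {
        printf("%s", n->label);
    }
}

int main(void) {
    /* interior nodes are operations, leaves are their arguments */
    Node *tree = mknode("=",
                        mknode("profit", NULL, NULL),
                        mknode("-",
                               mknode("Selling_price", NULL, NULL),
                               mknode("Cost_price", NULL, NULL)));
    show(tree);   /* prints: (= profit (- Selling_price Cost_price)) */
    printf("\n");
    return 0;
}

The interior nodes = and - are the operations, and the leaves profit, Selling_price and Cost_price are their arguments, exactly as described above.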

Semantic Analysis


  1. The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
  2. It also gathers type information and saves it either in the symbol table or in the syntax tree, for subsequent use during intermediate code generation.
  3. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands, as sketched below.
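
A minimal sketch of such a check, assuming a made-up Type enum and that the symbol table reports both operands of the subtraction as int:

/* Illustrative type check for a binary arithmetic operator.
   The Type enum and the widening rule are assumptions for this sketch. */
#include <stdio.h>

typedef enum { TYPE_INT, TYPE_FLOAT, TYPE_ERROR } Type;

/* Result type of "left op right" for an arithmetic operator. */
static Type check_binary(Type left, Type right) {
    if (left == right)
        return left;                        /* same type: nothing to do       */
    if ((left == TYPE_INT && right == TYPE_FLOAT) ||
        (left == TYPE_FLOAT && right == TYPE_INT))
        return TYPE_FLOAT;                  /* implicit int-to-float widening */
    return TYPE_ERROR;                      /* incompatible operand types     */
}

int main(void) {
    /* suppose the symbol table says Selling_price and Cost_price are ints */
    Type result = check_binary(TYPE_INT, TYPE_INT);
    printf("type of Selling_price - Cost_price: %s\n",
           result == TYPE_INT   ? "int"   :
           result == TYPE_FLOAT ? "float" : "error");
    return 0;
}

A real semantic analyzer would also record the resulting type in the syntax tree and insert any needed conversion nodes, as point 2 above describes.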

Intermediate Code Generation

  1. In the process of translating a source program into target code, a compiler constructs an intermediate representation, which may take a variety of forms.
  2. After syntax and semantic analysis of the source code, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine.
  3. This intermediate representation should have two important properties: it should be easy to produce, and it should be easy to translate into machine code.
  4. A common intermediate form is three-address code, which consists of a sequence of assembly-like instructions with at most three operands per instruction, as illustrated below.
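
For the running example profit = Selling_price - Cost_price, one possible three-address form (writing id1, id2 and id3 for the symbol-table entries of profit, Selling_price and Cost_price, and t1 for a compiler-generated temporary) is:

t1 = id2 - id3
id1 = t1

Each instruction has at most one operator on its right-hand side, which is what makes this form easy to translate into machine code.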

Code Optimization

  1. The machine-independent code optimization phase attempts to improve the intermediate code so that better target code will result. Usually "better" means faster, but other objectives may be desired, such as shorter code or target code that consumes less power.
  2. A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code.
  3. There is great variation in the amount of code optimization different compilers perform. In those that do the most, the so-called optimizing compilers, a significant amount of time is spent on this phase.
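
Continuing the three-address illustration from the previous phase, the optimizer can notice that the temporary t1 is used only to copy a value into id1, so the two instructions can be collapsed into one:

id1 = id2 - id3

The resulting code is both shorter and faster, which is exactly the kind of improvement this phase aims for.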

Code Generation

  1. The code generator takes as input an intermediate representation of the source program and maps it into the target language.
  2. If the target language is machine code, then registers or memory locations are selected for each variable used by the program, and the intermediate instructions are translated into a sequence of machine instructions that perform the same task.
  3. A crucial aspect of code generation is the judicious assignment of registers to hold values, as illustrated below.
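
As an illustration only (assuming an x86-64 target where the three variables are 32-bit globals; the exact instructions and register choices depend on the machine and on the register allocator), the optimized statement could be emitted as:

movl    Selling_price(%rip), %eax    # load Selling_price into register eax
subl    Cost_price(%rip), %eax       # subtract Cost_price from it
movl    %eax, profit(%rip)           # store the result into profit

Keeping the intermediate result in the register %eax rather than writing it back to memory between instructions is a small example of the judicious register assignment mentioned above.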


Symbol Table Generation

An essential function of a compiler is to record the variable names used in the source program and to collect information about the various attributes of each name. These attributes may provide information about the storage allocated for a name, its type, its scope, and, in the case of procedure names, such things as the number and types of its arguments.

The symbol table is a data structure containing a record for each variable name, with fields for the attributes of that name.
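
A minimal sketch of such a record in C is shown below. The field names, sizes and the linear lookup are assumptions made for illustration; production compilers typically use hash tables and store many more attributes.

/* Illustrative symbol-table record and linear lookup. */
#include <stdio.h>
#include <string.h>

typedef struct {
    char name[64];      /* the identifier as written in the source         */
    char type[16];      /* e.g. "int", "float"                             */
    int  scope_level;   /* 0 = global, 1 = function body, 2 = nested block */
    int  offset;        /* storage location assigned to this name          */
} Symbol;

static Symbol table[256];
static int    count = 0;

/* Record a name with its attributes and return its table index. */
static int insert(const char *name, const char *type, int scope, int offset) {
    strcpy(table[count].name, name);
    strcpy(table[count].type, type);
    table[count].scope_level = scope;
    table[count].offset = offset;
    return count++;
}

/* Find a previously recorded name, or return -1 if it is unknown. */
static int find(const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    return -1;
}

int main(void) {
    insert("profit", "int", 0, 0);
    insert("Selling_price", "int", 0, 4);
    insert("Cost_price", "int", 0, 8);

    int i = find("Selling_price");
    printf("%s: type=%s scope=%d offset=%d\n",
           table[i].name, table[i].type,
           table[i].scope_level, table[i].offset);
    return 0;
}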

