The Structure of Compiler (Part 2)
abhinav tiwari
Curating Insights & Innovating in GPU Compiler | Performance Analyst at Qualcomm | LLVM Contributor | Maintain News Letter | AI/ML in Compiler
In the previous newsletter, The Structure of Compiler (Part 1), we discussed the language processing system in detail and how source code gets translated into machine language by a toolchain such as GCC or Clang. In this article we will discuss the heart of that toolchain: the compiler. So far we know that a compiler is a tool that translates one language into another. But what is its role in the toolchain, and how can we define a compiler more precisely?
A compiler can be defined as a tool that takes the preprocessor's output as input and produces assembly language, which is then passed to the assembler to create object code.
Let's now look at the diagram of the compiler shown below.
In the above diagram we can see that the first phase of the compiler is the lexical analyzer, which takes input from the preprocessor and produces a stream of tokens as output.
To design a compiler, we first have to understand how a compiler works and what its phases are.
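The phases of a compiler are:
a. Lexical analysis
b. Syntax analysis
c. Semantic analysis
d. Intermediate code generation
e. Code optimization
f. Code generation
Alongside these phases, the compiler maintains a symbol table.
Let's discuss each phase.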
Lexical Analysis
Let's discuss this phase in a little more detail. The first phase of a compiler is known as lexical analysis or scanning.
In this phase, the lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces a token as output.
These tokens are then passed on to syntax analysis. A token has the form <token-name, attribute-value>, where token-name is an abstract symbol used during syntax analysis and attribute-value points to an entry in the symbol table for this token.
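As a minimal sketch of this token format, the Python snippet below models a token as a small immutable record. The class name `Token` and its printed form are illustrative choices, not taken from any particular compiler; the attribute is optional because some tokens, such as operators, need none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """A token as a <token-name, attribute-value> pair (hypothetical class)."""
    name: str                        # abstract symbol, e.g. "id" or "="
    attribute: Optional[int] = None  # e.g. a symbol-table index; None if unused

    def __str__(self) -> str:
        # Tokens that need no attribute-value print with one component only.
        if self.attribute is None:
            return f"<{self.name}>"
        return f"<{self.name},{self.attribute}>"


print(Token("id", 1))  # <id,1>
print(Token("="))      # <=>
```

Making the record frozen means two tokens with the same name and attribute compare equal, which is convenient when testing a scanner's output.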
For example, consider the source code
profit = Selling_price - Cost_price
The characters in this assignment statement are grouped into the following lexemes:
a. profit is a lexeme that maps to the token <id,1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for profit.
Let's understand this with a real-life example:
Suppose you visit a juice shop where each juice item has a token ID. When you visit the shop, you look for the name of the juice you want and tell the shopkeeper. The shopkeeper gives you the token for that juice; you then pay the bill and collect the juice from the person serving it, using the token ID.
Let's map this example onto the lexical analyzer in more detail.
Here the juice name is a lexeme that maps to the token <id,1>, where id is the abstract symbol (the juice name, i.e. the identifier) and 1 identifies that particular juice, much as a symbol-table index identifies a variable. The image below explains this in more detail.
Similarly, in the example profit = Selling_price - Cost_price, the assignment operator (=) maps to the token <=>. Since this token needs no attribute-value, we have omitted the second component.
Likewise, Selling_price maps to <id,2>, the minus operator (-) maps to <->, and Cost_price maps to <id,3>.
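The grouping of characters into lexemes and tokens for this assignment can be sketched with Python's `re` module. This is a minimal illustration, assuming a toy language with only identifiers, `=`, and `-`; the names `TOKEN_SPEC`, `MASTER`, and `tokenize` and the dict-based symbol table are hypothetical, and symbol-table indices are assigned in order of first appearance to match the <id,1> convention.

```python
import re

# Hypothetical token classes for this one example only.
TOKEN_SPEC = [
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers such as profit
    ("op", r"[=\-]"),                   # assignment and minus operators
    ("ws", r"\s+"),                     # whitespace separates lexemes
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokenize(source):
    """Group characters into lexemes and emit one token per lexeme."""
    symbol_table = {}  # identifier -> 1-based entry number
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "id":
            # Reuse the existing entry, or create the next one.
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(f"<id,{index}>")
        elif kind == "op":
            tokens.append(f"<{lexeme}>")  # operators carry no attribute-value
        pos = m.end()  # whitespace is consumed but produces no token
    return tokens, symbol_table


tokens, table = tokenize("profit = Selling_price - Cost_price")
print(" ".join(tokens))  # <id,1> <=> <id,2> <-> <id,3>
print(table)             # {'profit': 1, 'Selling_price': 2, 'Cost_price': 3}
```

Because identifiers are looked up before a new entry is created, a name that appears twice (as in `a = a - b`) maps to the same <id,n> token both times.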
How the Lexical Analyzer Works
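For the C statement
printf("Inside Lexical Analyzer\n");
the lexical analyzer generates these tokens:
printf {1}, ( {2}, "Inside Lexical Analyzer\n" {3}, ) {4}, ; {5}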
These are the five valid tokens generated for the above example.
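To see how those five tokens fall out, here is a hedged Python sketch of a scanner for just this statement. The pattern set `PATTERNS` and the function name `scan` are hypothetical; a real C lexer recognizes many more token classes (keywords, numeric literals, comments, multi-character operators). Note that the whole string literal, spaces included, becomes a single token, while whitespace outside it is discarded.

```python
import re

# Hypothetical, deliberately small token set: just enough for this statement.
PATTERNS = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),       # string literal, allowing escapes like \n
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers such as printf
    ("PUNCT",  r"[();]"),                   # parentheses and semicolon
    ("WS",     r"\s+"),                     # whitespace separates tokens only
]
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in PATTERNS))


def scan(source):
    """Return (kind, lexeme) pairs, skipping whitespace."""
    result = []
    pos = 0
    while pos < len(source):
        m = SCANNER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if m.lastgroup != "WS":
            result.append((m.lastgroup, m.group()))
        pos = m.end()
    return result


code = r'printf("Inside Lexical Analyzer\n");'  # raw string keeps \n as in C source
for number, (kind, lexeme) in enumerate(scan(code), start=1):
    print(number, kind, lexeme)
```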
Syntax Analysis
The above diagram shows the order in which the operations in the assignment are to be performed:
profit = Selling_price - Cost_price
The subsequent phases of the compiler use this grammatical structure to help analyze the source program and generate the target program. We shall use context-free grammars to specify the grammatical structure of programming languages, and discuss algorithms for constructing efficient syntax analyzers automatically from certain classes of grammars.
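As a rough illustration of how a grammar drives syntax analysis, the sketch below hand-writes a recursive-descent parser for a toy context-free grammar that covers only statements like the one above. The grammar and the function `parse_assignment` are assumptions for illustration, not a parser generator's output; for simplicity, lexemes are assumed to be separated by spaces so that `str.split()` stands in for the lexical analyzer. The nested tuple it returns mirrors the syntax tree: the subtraction sits below the assignment, so it is performed first.

```python
# Hypothetical toy grammar (context-free), in EBNF-like notation:
#   assign -> ID "=" expr
#   expr   -> ID { ("+" | "-") ID }
# Assumes space-separated lexemes, so str.split() stands in for the scanner.


def parse_assignment(source):
    tokens = source.split()
    pos = 0

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"identifier expected at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def parse_expr():
        nonlocal pos
        node = expect_id()
        # The loop builds a left-associative tree: a - b - c => ((a - b) - c)
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = (op, node, expect_id())
        return node

    target = expect_id()
    if pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("'=' expected")
    pos += 1
    tree = ("=", target, parse_expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_assignment("profit = Selling_price - Cost_price"))
# ('=', 'profit', ('-', 'Selling_price', 'Cost_price'))
```

Because the loop in `parse_expr` folds operators from left to right, `a - b + c` parses as `(a - b) + c`, matching the usual left associativity of these operators.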
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table Generation
An essential function of a compiler is to record the variable names used in the source program and collect information about the various attributes of each name. These attributes may provide information about the storage allocated for a name, its type, its scope, and, in the case of procedure names, such things as the number and types of its arguments.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
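A minimal symbol-table sketch in Python, assuming a single flat scope and a dict keyed by name for fast lookup. The class names and the fields (`type_name`, `scope`, `storage`, `arg_types`) are hypothetical stand-ins for the attributes listed above; real compilers typically keep nested tables, one per scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SymbolRecord:
    """One record per name, with fields for its attributes (illustrative)."""
    name: str
    type_name: str
    scope: str
    storage: Optional[int] = None  # e.g. a stack offset in bytes, once allocated
    arg_types: List[str] = field(default_factory=list)  # procedures only


class SymbolTable:
    """Flat, single-scope table; a dict gives average O(1) lookup by name."""

    def __init__(self) -> None:
        self._records: Dict[str, SymbolRecord] = {}

    def insert(self, record: SymbolRecord) -> None:
        if record.name in self._records:
            raise KeyError(f"{record.name!r} is already declared")
        self._records[record.name] = record

    def lookup(self, name: str) -> Optional[SymbolRecord]:
        return self._records.get(name)  # None if the name is unknown


table = SymbolTable()
table.insert(SymbolRecord("profit", "float", "global", storage=0))
table.insert(SymbolRecord("compute_profit", "float", "global",
                          arg_types=["float", "float"]))
print(table.lookup("compute_profit").arg_types)  # ['float', 'float']
print(table.lookup("undeclared"))                # None
```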