Case Study What is Compiler? Explain different phases of Compiler in details.
Compiler: Where it all started …
Software for early computers was primarily written in assembly language for many years. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler.
Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. In many application domains the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex. Early compilers were written in assembly language.
Compiler: The Breakthrough (Self-hosting compiler) …
The first self-hosting compiler capable of compiling its own source code in a high-level language was created for Lisp by Tim Hart and Mike Levin at MIT in 1962. Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping problem the first such compiler for a language must be compiled either by hand or by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter.
Compiler: Analysis of Compiler Phases …
There are basically have two phases of compilers, namely Analysis phase and Synthesis phase. Analysis phase creates an intermediate representation from the given source code. Synthesis phase creates an equivalent target program from the intermediate representation.
Symbol Table: It is a data structure being used and maintained by the compiler, consists all the identifier’s name along with their types. It helps the compiler to function smoothly by finding the identifiers quickly.
The analysis of a source program is divided into mainly three phases. They are:
- Linear Analysis
- This involves a scanning phase where the stream of characters is read from left to right. It is then then grouped into various tokens having a collective meaning.
- Hierarchical Analysis-
- In this analysis phase, based on a collective meaning, the tokens are categorized hierarchically into nested groups.
- Semantic Analysis-
- This phase is used to check whether the components of the source program are meaningful or not.
The compiler has two modules namely front end and back end. Front-end constitutes of the Lexical analyzer, semantic analyzer, syntax analyzer and intermediate code generator. And the rest are assembled to form the back end.
- Lexical Analyzer –
- It is also called scanner. It takes the output of preprocessor (which performs file inclusion and macro expansion) as the input which is in pure high-level language. It reads the characters from source program and groups them into lexemes (sequence of characters that “go together”). Each lexeme corresponds to a token. Tokens are defined by regular expressions which are understood by the lexical analyzer. It also removes lexical errors (for e.g., erroneous characters), comments and white space.
- Syntax Analyzer – It is sometimes called as parser. It constructs the parse tree. It takes all the tokens one by one and uses Context Free Grammar to construct the parse tree.
The rules of programming can be entirely represented in some few productions. Using these productions, we can represent what the program actually is. The input has to be checked whether it is in the desired format or not.
The parse tree is also called the derivation tree. Parse trees are generally constructed to check for ambiguity in the given grammar. There are certain rules associated with the derivation tree.
- Any identifier is an expression
- Any number can be called an expression
- Performing any operations in the given expression will always result in an expression. For example, the sum of two expressions is also an expression.
- The parse tree can be compressed to form a syntax tree
- Syntax error can be detected at this level if the input is not in accordance with the grammar.
- Semantic Analyzer – It verifies the parse tree, whether it’s meaningful or not. It furthermore produces a verified parse tree. It also does type checking, Label checking and Flow control checking.
- Intermediate Code Generator – It generates intermediate code, that is a form which can be readily executed by machine We have many popular intermediate codes. Example – Three address code etc. Intermediate code is converted to machine language using the last two phases which are platform dependent.
Till intermediate code, it is same for every compiler out there, but after that, it depends on the platform. To build a new compiler we don’t need to build it from scratch. We can take the intermediate code from the already existing compiler and build the last two parts.
- Code Optimizer – It transforms the code so that it consumes fewer resources and produces more speed. The meaning of the code being transformed is not altered. Optimization can be categorized into two types: machine dependent and machine independent.
- Target Code Generator – The main purpose of Target Code generator is to write a code that the machine can understand and also register allocation, instruction selection etc. The output is dependent on the type of assembler. This is the final stage of compilation. The optimized code is converted into relocatable machine code which then forms the input to the linker and loader.
All these six phases are associated with the symbol table manager and error handler as shown in the above block diagram.
Compiler: Costs and Benefits …
Advantage:
- Self-Contained and Efficient
One major advantage of programs that are compiled is that they are self-contained units that are ready to be executed. Because they are already compiled into machine language binaries, there is no second application or package that the user has to keep up-to-date.
- Hardware Optimization
While being locked into a specific hardware package has its downsides, compiling a program can also increase its performance. Users can send specific options to compilers regarding the details of the hardware the program will be running on. This allows the compiler to create machine language code that makes the most efficient use of the specified hardware, as opposed to more generic code. This also allows advanced users to optimize a program's performance on their computers.
Disadvantage:
- Hardware Specific
Because a compiler translates source code into a specific machine language, programs have to be specifically compiled for OS X, Windows or Linux, as well as specifically for 32-bit or 64-bit architectures. For a programmer or software company trying to get a product out to the widest possible audience, this means maintaining multiple versions of the source code for the same application. This results in more time spent on source code maintenance and extra trouble when updates are released.
- Compile Times
One of the drawbacks of having a compiler is that it must actually compile source code. While the small programs that many novice's programmers code take trivial amounts of time to compile, larger application suites can take significant amounts of time to compile. When programmers have nothing to do but wait for the compiler to finish, this time can add up—especially during the development stage, when the code has to be compiled in order to test functionality and troubleshoot glitches.
Reference:
- Computerphile: https://youtu.be/lJf2i87jgFA
- The Cherno: https://youtu.be/3tIqpEmWMLI
- https://www.tutorialspoint.com/compiler_design/compiler_design_phases_of_compiler.html
- Gate Smashers:https://youtu.be/PT9iWM80PDU
THANK YOU
-Onasvee Banarse