
"Hello World": From the Source Code in an IDE all the way to Machine Code

Kenneth Fukizi

Software engineer, author and tech speaker

Published: March 24, 2019

The seemingly accepted norm for introducing someone to a programming language is to write a 'hello world' in its most rudimentary form. In this article we will go a bit further, in fact much further, and dissect what really happens under the hood of a simple C# hello world program. I hope this gives better context even to some experienced developers.

First things first, we create a basic .NET Core Console App project and solution from the provided Visual Studio template, which I have conveniently named ConsoleApp.

The figure above shows a basic C# program that writes "Hello World AfrikanCoder!" to the Windows console when run. The line numbers are there for reference purposes only.

The only line of code I have added that was not part of the default template is line 10, Console.ReadLine(), just to make sure that when I run the program it displays the "Hello World AfrikanCoder!" text and keeps the console open until we press the Enter key.
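For readers following along without the figure, here is a sketch of what the program described above most likely looks like. It is reconstructed from the default .NET Core console template and from the line references in the text (line 1 for the using directive, line 5 for the class, lines 9 and 10 for the two Console calls), so the exact layout is an assumption rather than a copy of the original listing.

```csharp
using System;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World AfrikanCoder!");
            Console.ReadLine(); // keeps the console open until Enter is pressed
        }
    }
}
```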

The Basics (...experienced developers can skip this part)

In line 1, we make use of a using directive, which lets us use the types in the System namespace without having to qualify them. If we did not import System into our file with the using directive, we would have had to call the Console.WriteLine method in line 9 by its full name, System.Console.WriteLine.

We would also have to call the Console.ReadLine method as System.Console.ReadLine. The repetition is trivial in the context of this small program, but it quickly gets out of hand in bigger programs that use lots of the functionality that ships with .NET to help us with common tasks.
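To make the difference concrete, here is a minimal sketch of the same program written without the using directive. The program structure is assumed from the default template; only the two calls change, each now spelled out with its namespace.

```csharp
// No using directive: every type is written with its namespace-qualified name.
namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Hello World AfrikanCoder!");
            System.Console.ReadLine();
        }
    }
}
```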

An example of what other types come with the System namespace can be seen in one of the pictures below in this same article.

Namespaces

By default our Visual Studio template defined a namespace with exactly the same name as our project (namespace ConsoleApp). A namespace is simply a way to neatly organize types (classes, interfaces, etc.). It lets us name our types without fear of conflicts with types in other namespaces, because they can always be referred to by their fully qualified names should the need arise.

It is normal for large programs to have many classes, and keeping them in a flat hierarchy would not be convenient. Imagine if your computer only allowed you to save items in one folder called Documents, without being able to create sub-folders. This is why C# has the concept of namespaces, which allow us to organize the classes and other types we create in a hierarchical way.

Namespaces, just like classes, can be nested in C#, which simply means you can have one inside another.
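A small sketch of nesting, and of how namespaces keep identically named types apart, might look like this. The AfrikanCoder, Greetings and Farewells namespaces and the Greeter classes are hypothetical names made up for the illustration.

```csharp
using System;

namespace AfrikanCoder            // hypothetical outer namespace
{
    namespace Greetings           // nested namespace: AfrikanCoder.Greetings
    {
        class Greeter
        {
            public static string Hello() => "Hello from Greetings";
        }
    }

    namespace Farewells           // sibling namespace with its own Greeter
    {
        class Greeter
        {
            public static string Hello() => "Goodbye from Farewells";
        }
    }

    class Program
    {
        static void Main()
        {
            // Same simple name, no conflict: the qualified name tells them apart.
            Console.WriteLine(Greetings.Greeter.Hello());
            Console.WriteLine(AfrikanCoder.Farewells.Greeter.Hello());
        }
    }
}
```

The first call uses a name relative to the enclosing namespace and the second uses the fully qualified name; both forms resolve to exactly one type.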

Classes

We then see in line 5 the definition of a class called Program, whose scope is determined by the braces on lines 6 and 12. Anything within those braces is part of the Program class and is usually called a member of the class.

We will not discuss access modifiers, what a class consists of, or anything else along those lines in this article, as the aim is just to give a high-level overview of a C# program.

Plunging a Bit Deeper

It is worth mentioning that the System.Console class has other methods besides the WriteLine() method in our demonstration, including Beep(), Clear(), MoveBufferArea(), OpenStandardError(), OpenStandardInput(), OpenStandardOutput(), Read(), ReadKey(), ResetColor(), SetBufferSize(), SetCursorPosition(), SetError(), SetIn(), SetOut(), SetWindowPosition(), SetWindowSize() and Write().

Some of the methods above have several overloads, just like the WriteLine() method, as shown below in the System.Console class.

When we call the WriteLine() method, the C# compiler expects us either to call it with no arguments or to pass in one of the supported parameter types such as bool, char[], etc., as shown above. In our example we have passed in the string literal "Hello World AfrikanCoder!", so the compiler's overload resolution picks the overload that takes a string, simply because our argument is a string and not a bool, an int, etc.
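Overload resolution is easier to see with overloads we write ourselves. The sketch below uses a hypothetical OverloadDemo class whose Describe method is overloaded in the same spirit as Console.WriteLine (it is not the real Console implementation) and reports which overload the compiler chose for each call.

```csharp
using System;

class OverloadDemo   // hypothetical class, overloaded in the spirit of Console.WriteLine
{
    public static string Describe() => "no arguments";
    public static string Describe(bool value) => "bool overload";
    public static string Describe(int value) => "int overload";
    public static string Describe(string value) => "string overload";
    public static string Describe(object value) => "object overload";

    static void Main()
    {
        Console.WriteLine(Describe());                            // no arguments
        Console.WriteLine(Describe(true));                        // bool overload
        Console.WriteLine(Describe(42));                          // int overload
        Console.WriteLine(Describe("Hello World AfrikanCoder!")); // string overload
        Console.WriteLine(Describe(3.14));                        // object overload: no double overload here, so the value is boxed
    }
}
```

The choice is made at compile time from the static type of the argument, which is why a string literal always lands on the string overload.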

The System.Console reference assembly can be found in the following folder, along with the other reference assemblies used to build programs. In other words, this is the assembly that is passed to the compiler as a reference.

 C:\Users\[user]\.nuget\packages\microsoft.netcore.app\...
...\2.1.0\ref\netcoreapp2.1\System.Console.dll

Please note that the path may be different for different versions of .NET Core.

.NET library assemblies are stored with a .dll file extension and are normally referred to as DLL files. DLL (dynamic link library) files are linked with the program that uses them at run time rather than being compiled into the main program. One advantage is saving random access memory (commonly just called memory), because the files are not all loaded into memory together with the main program: a DLL is loaded only when it is first needed.

Next to a reference assembly there is usually an associated XML file that contains its documentation comments, which is what the IDE shows in IntelliSense. The metadata proper, meaning the description of the types and members, is stored inside the assembly itself.

If we look at the contents of System.Console.xml, it contains many members, but the one of particular concern here is the WriteLine() method with the string overload (because we are passing in the string literal "Hello World AfrikanCoder!"), which is represented as follows in the XML file:

Compiling the ConsoleApp

We could build our console application by changing directory (cd) into the folder holding our source files and invoking the C# compiler, csc.exe, directly, which in turn requires knowing the path where that executable is located. To make matters simpler, we will build our application from the Visual Studio IDE using the Build menu, which offers several options at the click of a button.

When we cd into the project folder, we find the following files and folders, and at this point we are more concerned with the bin and obj folders:

The bin folder holds the results of the build: the binaries to distribute or archive, which are the actual executable code for our application or library, the debug symbols, other necessary dependencies, and supporting files. For .NET Core these include JSON files such as ConsoleApp.deps.json and ConsoleApp.runtimeconfig.json; for older .NET Framework projects you would typically find an XML .config file instead.

The obj folder, in comparison, contains temporary files created during the build. They are kept to support incremental builds: if the inputs to a step haven't changed, the build can reuse the files from the previous run instead of redoing the work, and obviously that is faster.

In other words, the obj folder holds intermediate files, the pieces the build tooling produces on the way to the final output before that output is copied into bin.

The bin and obj folders are further subdivided into Debug and Release folders, which correspond to the project's build configurations. The executable and supporting files are placed into the appropriate folder, depending on which type of build you perform (Debug or Release).

The Release output is comparatively lightweight and more optimized, so it naturally makes sense to use it for any production environment.

The figure below shows a summary of the compilation process:

Why Source Code to Intermediate Language instead of Native/Machine Code

Machine code depends on the target machine, chiefly its processor architecture and its operating system.

These will often differ between the developer's computer and the user's computer, so if we wanted our C# code converted directly into machine code, we would have to compile our application for each user's kind of machine as well.

If we have users on Windows and Linux, we would have to compile our code for both platforms independently, which is not very convenient.

So when we compile code on our computer as developers, the C# compiler converts it into CIL (Common Intermediate Language) code. The CLR (Common Language Runtime) then takes care of converting the compiled CIL into machine code with the help of its JIT (just-in-time) compiler on whatever system the program runs, saving us from having to compile over and over.

Plunging Even Deeper

To understand and appreciate the processes involved in compiling, linking, loading and running C# programs, let us use the Intermediate Language Disassembler tool (ILDASM), which we can start by typing ildasm.exe in the Visual Studio Developer Command Prompt and then opening our compiled assembly as follows:

Assembly Language

Before we look at what happens under the hood, let's have a brief introduction to an assembly language.

An assembly language is the language of a central processing unit (CPU), but with the numbers that make up the machine code replaced by easy-to-remember mnemonics.

Instead of programming in pure hexadecimal that looks like a3 0c 10 00 06, programmers can thankfully use something easier to read and remember, such as ADD ESP, 4, which adds 4 to the ESP register.

The human-readable version is read by a program called an assembler and translated into machine code in a process called assembling, which is analogous to compiling in high-level languages as we have seen above.

A modern assembly language is the result of both the physical CPU and the assembler. Modern assembly languages also have high-level features such as macros and user-defined data types, but they are not the focus of this article, so let's go and take a peek at the Intermediate Language that our ConsoleApp produced.

If we take a peek at our manifest, it looks like the following figure:

The most important thing to note is that the .assembly declaration is used to declare the name of the assembly for our programs, rather obviously so...

We will skip the .class private auto declaration at this point, as it is not relevant for our quite rudimentary ConsoleApp. Note only that auto here means the runtime is free to lay out the fields of the class automatically; it has nothing to do with C# auto-implemented properties.

And then the next important aspect is the constructor, which looks like the following figure when disassembled:

The constructor introduces a few attributes that we will also see on the Main method, including hidebysig, which means that the member hides base-class members with the same name and signature, rather than every base-class member that merely shares its name. This is vital to avoid conflicts.

The special name .ctor denotes an instance-level constructor.

In both the constructor and the Main method, we see a list of instructions that operate on an evaluation stack, defined further down. Each line starts with a label of the form IL_0000:, followed by a base or object model instruction, e.g. pop, and optionally an operand that the instruction acts upon.

You may notice that the labels are not unique to a particular instruction: comparing the constructor and the Main method, IL_0001 refers to call and ldstr respectively. This is because a label is just the byte offset of the instruction within its own method body, written out so that we as programmers can read the listing more easily, and the offsets start again from zero in every method. The instructions themselves are stored as CIL (Common Intermediate Language) opcodes, which are conventionally written in the base-16 hexadecimal numbering system.

The base-16 numbering system, hexadecimal (often abbreviated to hex), is regularly used in computing because it conveniently represents a byte of data as two digits.
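The constructor can also be observed from C# itself through reflection. The sketch below uses a hypothetical CtorDemo class that declares no constructor in source and lists what the compiler generated for it; ConstructorInfo.Name and IsHideBySig are standard reflection members.

```csharp
using System;
using System.Reflection;

class CtorDemo   // hypothetical class: no constructor is written in source
{
    static void Main()
    {
        // The compiler still emits a public parameterless instance constructor.
        foreach (ConstructorInfo ctor in typeof(CtorDemo).GetConstructors())
        {
            Console.WriteLine($"{ctor.Name} IsPublic={ctor.IsPublic} IsHideBySig={ctor.IsHideBySig}");
        }
    }
}
```

On a standard runtime this should report a single constructor named .ctor, marked both public and hidebysig, matching what the disassembler shows.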

Common Intermediate Language (CIL) Opcodes

CIL opcodes are one or more bytes long (a byte being 8 bits), and they can be followed by zero or more operand bytes. All opcodes whose first byte lies in the ranges 0x00 through 0xEF, or 0xFC through 0xFF, are reserved for standardization, and this is the lot that interests us. The remaining range, 0xF0 through 0xFB, is available for experimental purposes should you want to play around, but the audience for this article is not low-level CPU programmers (I actually wish I was one, though... lol).

In the comment section below, you will find a link to the ECMA-335 standard, which gives more opcode descriptions, but the table below suffices to explain the instructions we see in the CIL produced by our ConsoleApp program.

The 0x prefix in the opcodes is just there to indicate that a number is in hexadecimal rather than in some other base. The C# programming language uses the same prefix to inform the compiler. If the number were binary we would, for example, have a 0b prefix, as in 0b1000.

The ldarg opcode loads an argument passed to the method onto the stack. In the constructor IL above, ldarg.0 means load argument 0 onto the stack, which for an instance member such as a constructor is the this reference.

With the few introductions above, we are at least a bit more able to read the CIL for our Main method below, whose source code essentially had just 2 lines: Console.WriteLine("Hello World AfrikanCoder!") and Console.ReadLine().
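The opcode values for the instructions in our listings can also be read straight from the runtime. This is a minimal sketch using the System.Reflection.Emit.OpCodes class; the OpCodeTable class name is made up for the example.

```csharp
using System;
using System.Reflection.Emit;

class OpCodeTable
{
    static void Main()
    {
        // The instructions that appear in the ConsoleApp constructor and Main method.
        OpCode[] opcodes =
        {
            OpCodes.Nop, OpCodes.Ldarg_0, OpCodes.Ldstr,
            OpCodes.Call, OpCodes.Pop, OpCodes.Ret
        };

        foreach (OpCode op in opcodes)
        {
            // Value is the numeric opcode; "X2" formats it as two hex digits.
            Console.WriteLine($"{op.Name,-8} 0x{op.Value:X2}");
        }
    }
}
```

This should list nop as 0x00, ldarg.0 as 0x02, ldstr as 0x72, call as 0x28, pop as 0x26 and ret as 0x2A, all of which fall in the standardized 0x00 through 0xEF range.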
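As a quick illustration of the two prefixes, the sketch below writes the same number three ways. It assumes C# 7.0 or later, which is where binary literals and the underscore digit separator were introduced; the LiteralDemo name is only for the example.

```csharp
using System;

class LiteralDemo
{
    static void Main()
    {
        int fromHex = 0xAB;            // hexadecimal literal
        int fromBinary = 0b1010_1011;  // binary literal with a digit separator (C# 7.0+)
        int fromDecimal = 171;

        // All three literals denote the same value.
        Console.WriteLine(fromHex == fromBinary && fromBinary == fromDecimal); // True
        Console.WriteLine(0b1000);     // 8
    }
}
```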

An evaluation stack holds the values, such as local variables or method arguments, that instructions are about to operate on, along with the results they produce. At the start of every method the evaluation stack is empty; during execution, the CIL instructions push items onto it and pop items off it, and for a method that returns nothing, like our Main, it is empty again when the method returns.

The nop (no operation) instruction is simply a debug build artifact; it gives the debugger a place to put a breakpoint on lines such as the curly braces.

Next, the ldstr opcode instruction loads our string "Hello World AfrikanCoder" onto the evaluation stack, and then the call opcode instruction is used to call the System.Console.WriteLine method with the string already loaded passed in as an argument.

We see both the nop and call instructions again, this time for the Console.ReadLine method. The string that ReadLine returns is not used, so pop removes it from the stack, and ret returns from the method.
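To see these instructions at work without a disassembler, we can emit them ourselves. The sketch below builds a method at run time with System.Reflection.Emit.DynamicMethod, using the same ldstr, call, pop and ret sequence described above (the debug-only nop instructions are left out). The EmitDemo class and BuildHello method are hypothetical names, and this is an illustration of the instruction sequence, not what the C# compiler itself does internally.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

class EmitDemo
{
    // Builds a method from the same instructions that appear in Main's IL.
    public static Action BuildHello()
    {
        var method = new DynamicMethod("Hello", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        MethodInfo writeLine = typeof(Console).GetMethod("WriteLine", new[] { typeof(string) });
        MethodInfo readLine = typeof(Console).GetMethod("ReadLine", Type.EmptyTypes);

        il.Emit(OpCodes.Ldstr, "Hello World AfrikanCoder!"); // push the string onto the stack
        il.Emit(OpCodes.Call, writeLine);                    // pops it and calls WriteLine(string)
        il.Emit(OpCodes.Call, readLine);                     // pushes the string that ReadLine returns
        il.Emit(OpCodes.Pop);                                // discard the unused return value
        il.Emit(OpCodes.Ret);                                // the stack is empty again, so return

        return (Action)method.CreateDelegate(typeof(Action));
    }

    static void Main()
    {
        BuildHello()();   // the IL is compiled to native code when the delegate is first used
    }
}
```

Leaving out the pop would make the method invalid, because a value would still be sitting on the evaluation stack when ret executes in a method that returns nothing.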

How Then do the IL Statements get to CPU instructions?

At this point the CPU, or we may as well call it the computer, still cannot execute the opcodes, because CIL is not the CPU's own instruction set (in fact the CPU doesn't understand anything at all).

When the program runs, the CLR's JIT compiler takes this CIL code and translates it into native machine code for the processor it is running on, for example x86, in binary (1s and 0s, or high and low electric voltage), which is then executed by the CPU.

The representation is still hexadecimal in assembly listings for the sake of human programmers like you and me, because it is much easier for a human being to distinguish 0xAB from 0xAC than it is to distinguish 10101011 from 10101100, especially when the screen is full of numbers. It is also easier to convert a number from hex to binary or binary to hex in one's head than from decimal to binary and back.

Hexadecimal code is essentially just a convenient way for a human programmer to read and write binary code.

The CPU works using binary. Electronically this is done with switches that are either on or off, which we represent on paper by ones and zeros. A single bit, or binary digit, requires one wire or switch within the CPU. Usually data is handled in bytes or multiples of bytes. A byte is a group of eight bits and looks like this:

01001011

This is inconvenient to read, say and write down, so programmers use hexadecimal to represent bytes. Converting between binary and hexadecimal is not difficult. First we split the byte into two four-bit halves as follows:

0100 1011

Then we use the following table:
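The mapping is small enough to generate. The sketch below prints each four-bit pattern next to its hex digit and then converts the byte from the example: looking up 0100 gives 4 and 1011 gives B, so 01001011 is 0x4B. The HexDemo class and BinaryToHex method are illustrative names.

```csharp
using System;

class HexDemo
{
    // Converts an 8-character binary string to two hex digits, one four-bit half at a time.
    public static string BinaryToHex(string bits)
    {
        string high = bits.Substring(0, 4);   // e.g. "0100"
        string low = bits.Substring(4, 4);    // e.g. "1011"
        int highValue = Convert.ToInt32(high, 2);
        int lowValue = Convert.ToInt32(low, 2);
        return highValue.ToString("X") + lowValue.ToString("X");
    }

    static void Main()
    {
        // The lookup table: every four-bit pattern and its hex digit.
        for (int i = 0; i < 16; i++)
        {
            string nibble = Convert.ToString(i, 2).PadLeft(4, '0');
            Console.WriteLine($"{nibble} = {i:X}");
        }

        Console.WriteLine(BinaryToHex("01001011")); // 4B
    }
}
```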

As we have demonstrated above, it is easy for us to convert between hexadecimal and binary, and hexadecimal is only our notation for the same bits anyway. The heavier work, translating the Common Intermediate Language into the binary machine code that gets executed by the CPU, is done on our behalf by the JIT compiler.

Summary

We have just seen the whole process, from typing human-readable source code in the Visual Studio IDE through everything that goes on in translating that code into something executable by a computer, albeit from a very high-level view.

This is quite important for a developer to understand. Most of us are not expected to work at the lower levels of abstraction, but having a mental map surely helps put things in perspective and makes one a better-informed programmer when making decisions at the higher abstraction levels we normally work with.
