
The Future of Software: Code Generation Step 1 - Assembly Language

Todd Lauinger

Published: June 29, 2016

I started my path to software engineering by entering machine code into a toy my mom bought me as a child. The toy had a very simple user interface: a 16-key hexadecimal keypad and a set of LED display lights. You picked a program from a book, entered all the digits exactly as printed, and the toy would display different patterns on the LEDs. The more interesting the program, the longer the machine code sequence, and the more opportunity for error when keying it in. One mistake in the long sequence (forgetting a digit, entering the wrong digit, or double-entering a digit, which happened a lot on the cheap toy keyboard) and the program wouldn't work. There was no way to see what you had entered and no way to backspace or correct a digit. You just cleared the memory and started all over from the first machine code digit. Frustrating, yes, and still I was hooked! The book that came with the toy didn't teach you how to write the machine code, but that is what I wanted to do. I wanted to create my own programs to make that display dance to my own tune. And basically that's still what I do today when you think about it. I just have a lot better technology to work with.

So if you will bear with me, I'll walk you through the technology, as quickly as I can, to get you from that machine code toy all the way to what the vast majority of software engineers in the world write today: object-oriented code. Step one is the massive productivity leap in going from machine code to assembly language.

I haven't programmed machine code in quite a few years, so I'll have to rely upon someone else to illustrate my point. The material below comes from Prof. Brad Richards, so I can't verify the accuracy of the example. However, it is the clearest example I could find to show you the productivity leap from machine code to assembly language. Here is his programming example for a "hello world" (or in this case, "abcd") application:

a0 14 01 04 05 a2 14 01 ba 14 01 b4 09 cd 21 b8 00 4c cd 21 61 62 63 64 0d 0a 24

To most people, this won't mean much, except perhaps for one pattern near the end: 61 62 63 64. That is the string "abcd" in ASCII hexadecimal. Enter "abcd" into https://www.branah.com/ascii-converter to give it a try; it will indeed give you 61 62 63 64 in hexadecimal. To make more sense of the rest, we will need to "disassemble" the program. Disassembly is the process of converting machine code into assembly language instructions, a much more human-readable format for a machine code program. It looks like this disassembled and with some code comments:
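The listing can be re-derived mechanically from the bytes themselves. Here is a minimal sketch in Python, assuming a 16-bit x86 DOS program loaded at address 0100 hex. The decode table is hand-written and covers only the opcodes this one program uses, and the names `OPCODES` and `disassemble` are mine for illustration, not from Prof. Richards' material.

```python
# Minimal sketch: decode only the opcodes that appear in this one program.
# Assumes 16-bit x86 and a DOS program image loaded at address 0100 hex.
PROGRAM = bytes.fromhex(
    "a0 14 01 04 05 a2 14 01 ba 14 01 b4 09 cd 21 "
    "b8 00 4c cd 21 61 62 63 64 0d 0a 24"
)
LOAD_ADDRESS = 0x0100
CODE_LENGTH = 20  # the last 7 bytes are data ("abcd", CR, LF, "$"), not code

# opcode -> (total instruction length in bytes, assembly template)
OPCODES = {
    0xA0: (3, "mov al, [{:04x}]"),  # load the byte at a 16-bit address into AL
    0x04: (2, "add al, {:02x}"),    # add an 8-bit constant to AL
    0xA2: (3, "mov [{:04x}], al"),  # store AL at a 16-bit address
    0xBA: (3, "mov dx, {:04x}"),    # load a 16-bit constant into DX
    0xB4: (2, "mov ah, {:02x}"),    # load an 8-bit constant into AH
    0xCD: (2, "int {:02x}"),        # software interrupt (21 hex = DOS services)
    0xB8: (3, "mov ax, {:04x}"),    # load a 16-bit constant into AX
}


def disassemble(code, code_length, base=LOAD_ADDRESS):
    """Return (address, hex bytes, assembly text) for each instruction."""
    rows, offset = [], 0
    while offset < code_length:
        length, template = OPCODES[code[offset]]
        raw = code[offset:offset + length]
        # Operands are stored low byte first (little-endian).
        operand = int.from_bytes(raw[1:], "little")
        rows.append((base + offset, raw.hex(" "), template.format(operand)))
        offset += length
    return rows


listing = disassemble(PROGRAM, CODE_LENGTH)
for address, raw, text in listing:
    print(f"{address:04x}  {raw:<9} {text}")

data = PROGRAM[CODE_LENGTH:]
print(f"{LOAD_ADDRESS + CODE_LENGTH:04x}  {data.hex(' ')}  ; {data!r}")
```

The left column of each printed row is the address, the middle column is the raw machine code, and the right column is the assembly language equivalent.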

The program starts at machine code address 0100 in hexadecimal. The first instruction loads the contents of address 0114 hex into the al CPU register. A CPU register is a powerful storage area; it can be used to do all sorts of manipulation on the data stored within. The more powerful the CPU, the more registers it has and the more complex the instructions you can perform on data in those registers. For example, modern CPUs can perform a full round of AES encryption on a block of data in a single instruction. By the way, for our analysis below, pay special attention to how many times the address 0114 is repeated in the machine code of this trivial program.

If you look carefully at the rest of the program, it changes the first character of the string "abcd" to an "f" by adding 5 to it. Then it stores it back into memory and prints the string to the standard output stream. The ultimate output of the program is "fbcd".
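To check that claim by hand, here is a rough Python sketch of what the instructions do to the data. It is not an emulator; it assumes the DOS convention that function 09 hex of interrupt 21 hex prints characters up to a "$" terminator, and the variable names are mine.

```python
# Sketch of what the program does to its data, one line per instruction.
memory = bytearray(b"abcd\r\n$")  # the 7 data bytes that live at address 0114

al = memory[0]        # mov al, [0114]  -> 61 hex, ASCII "a"
al = (al + 5) & 0xFF  # add al, 5       -> 66 hex, ASCII "f" (AL is 8 bits wide)
memory[0] = al        # mov [0114], al

# int 21 with ah = 09: print from the address in DX up to, not including, "$"
output = memory[:memory.index(b"$")].decode("ascii")
print(repr(output))
```

The printed string is "fbcd" followed by a carriage return and line feed, which is why the output appears on its own line.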

Next, let's look at some of the improvements in going from machine code to assembly language by comparing the two columns above. There are too many improvements to list here, but I'd like to give you some respect for the people who created this incredible technology (and created it the only way they could, by machine code programming hundreds of thousands of bytes). Let's start with the obvious improvements the assembler makes for you:

Replace hexadecimal instructions like "a0", "04", and "cd" with human-readable instructions like mov, add, and int. Instructions like mov have many different variants depending on whether you are loading from memory or storing into memory, and on which register you specify. So the assembler needs to work out which machine instruction matches the specified register, operands, and order.
Know the length and byte order of each machine code instruction and operand. Instructions have different lengths because back then every byte counted; nobody wasted bytes by padding instructions to a consistent length.
Translate ASCII data into machine code.
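A few lines of Python can illustrate the bookkeeping in that list. This is only a sketch using the bytes from the example program; the variable names are mine.

```python
# Illustrative only: the bookkeeping an assembler does for each operand.
address = 0x0114

# 16-bit x86 stores multi-byte values low byte first (little-endian),
# so address 0114 appears in the machine code as 14 01.
operand_bytes = address.to_bytes(2, "little")
print(operand_bytes.hex(" "))  # 14 01

# Instructions differ in length: an opcode plus a 16-bit address is 3 bytes,
# an opcode plus an 8-bit constant is 2 bytes.
mov_al_from_memory = bytes([0xA0]) + operand_bytes  # mov al, [0114]
add_al_constant = bytes([0x04, 0x05])               # add al, 5
print(len(mov_al_from_memory), len(add_al_constant))  # 3 2

# Text is translated to ASCII codes; "$" terminates the string for DOS.
message = "abcd\r\n$".encode("ascii")
print(message.hex(" "))  # 61 62 63 64 0d 0a 24
```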

Now let's turn our attention to the machine code address 0114. That address is very important in this tiny little program; it is used 3 times. In the machine code, it is what we call "hard coded." What happens if we need to delete an instruction in the program? To do it the right way, the cleanest way possible, we remove that instruction from the program. But doing so has the unfortunate side effect that "abcd" moves too. It will no longer be at address 0114. Hmm, that's not good: I need to recalculate the address and change all the values in the program to the new address. Sounds like too much work to me, so I'll just use the "nop" (no operation) instruction. I'd replace the bytes of the instruction and operands I want to remove with exactly the same number of nop instructions, and then the data won't move. Problem solved! Well, not quite.
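Here is a small Python sketch of the two options, using the example bytes. It assumes the x86 nop opcode 90 hex and, purely for illustration, picks the 2-byte "add al, 5" as the instruction to remove; the helper `data_address` is mine.

```python
PROGRAM = bytes.fromhex(
    "a0 14 01 04 05 a2 14 01 ba 14 01 b4 09 cd 21 "
    "b8 00 4c cd 21 61 62 63 64 0d 0a 24"
)
NOP = 0x90  # x86 "no operation": one byte that does nothing


def data_address(image, base=0x0100):
    """Address where the string "abcd" ends up once the image is loaded."""
    return base + image.index(b"abcd")


# Option 1: truly delete the 2-byte "add al, 5" at offsets 3..4.
deleted = PROGRAM[:3] + PROGRAM[5:]
# Option 2: overwrite the same 2 bytes with nops so nothing moves.
padded = PROGRAM[:3] + bytes([NOP, NOP]) + PROGRAM[5:]

print(f"{data_address(PROGRAM):04x}")  # 0114
print(f"{data_address(deleted):04x}")  # 0112, but the code still says 0114
print(f"{data_address(padded):04x}")   # 0114, the hard-coded addresses stay valid
```

The padded version keeps working, at the cost of two bytes that do nothing.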

What if instead of removing an instruction, we add an instruction? Well, you could try my favorite response to any product manager who wants to make a change to the product: no, the program isn't designed to do that! Unfortunately that never goes over well, so I guess we have to move on to figuring out how to add the instruction. The data gets pushed down; it will no longer be at address 0114. It will be at an address of 0114 plus the length of the new instruction. If there were no assembly language, you would be forced to update all 3 of the references to 0114 when you add that new instruction. You would need to manually calculate the new address, counting all the bytes from the beginning of the program to the beginning of the data value. And if you blow the calculation, the program doesn't work. You may modify code instead of data. You may modify the wrong character of the string. Or you may modify a different data value entirely. Along with our very first request to modify the program comes an almost inevitable series of testing and debugging exercises, not only to add the new feature but to make sure nothing else is broken in the process.
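To make the manual work concrete, here is a hedged Python sketch of patching the machine code by hand. The added instruction (a second "add al, 1", bytes 04 01) is a hypothetical change of mine, and the sketch assumes every inserted instruction sits before the data, so the data always shifts by the full length of the insertion.

```python
PROGRAM = bytes.fromhex(
    "a0 14 01 04 05 a2 14 01 ba 14 01 b4 09 cd 21 "
    "b8 00 4c cd 21 61 62 63 64 0d 0a 24"
)
BASE = 0x0100
OLD_DATA = 0x0114
# Offsets of the three hard-coded copies of 0114 (each follows a 1-byte opcode).
REFERENCE_OFFSETS = [1, 6, 9]


def insert_and_patch(image, at, new_instruction):
    """Insert bytes at offset `at`, then fix up every reference by hand."""
    grown = bytearray(image[:at] + new_instruction + image[at:])
    shift = len(new_instruction)
    new_data = OLD_DATA + shift  # the data moved down by the inserted length
    for offset in REFERENCE_OFFSETS:
        # References located at or after the insertion point have moved too.
        where = offset + shift if offset >= at else offset
        grown[where:where + 2] = new_data.to_bytes(2, "little")
    return bytes(grown)


# Hypothetical change: add "add al, 1" (04 01) right after "add al, 5".
patched = insert_and_patch(PROGRAM, 5, bytes([0x04, 0x01]))
print(patched.hex(" "))
print(f"{BASE + patched.index(b'abcd'):04x}")  # 0116
```

Miss one entry in `REFERENCE_OFFSETS`, or get one offset wrong, and the program silently reads or writes the wrong byte.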

Fortunately, assembly language introduced the concept of labels for data cells. You refer to a label instead of a machine code address, and the assembler replaces the label with the perfectly calculated machine code location of that data cell, all automatically. It also separates code spaces from data spaces, and teaches the CPU the difference between the two, so you won't corrupt your program by manipulating code instead of data. And there are many other improvements that I won't discuss here. So hopefully you are impressed with this incredible advance in technology that we all pretty much take for granted these days. If not, I have one more thought for you to ponder.
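The label idea can be sketched as a toy two-pass assembler in Python. This is an illustration under my own assumptions, not how any real assembler is written: the source format, the mnemonic spellings, the label name `msg`, and the `ENCODINGS` table are all hypothetical and cover only the instructions in the example program.

```python
# Each entry: (label or None, mnemonic, operand). Operands are ints or label names.
SOURCE = [
    (None, "mov al, [mem]", "msg"),
    (None, "add al, imm8", 5),
    (None, "mov [mem], al", "msg"),
    (None, "mov dx, imm16", "msg"),
    (None, "mov ah, imm8", 0x09),
    (None, "int imm8", 0x21),
    (None, "mov ax, imm16", 0x4C00),
    (None, "int imm8", 0x21),
    ("msg", "db", b"abcd\r\n$"),  # "db" = raw data bytes
]

# mnemonic -> (opcode byte, operand size in bytes)
ENCODINGS = {
    "mov al, [mem]": (0xA0, 2),
    "add al, imm8": (0x04, 1),
    "mov [mem], al": (0xA2, 2),
    "mov dx, imm16": (0xBA, 2),
    "mov ah, imm8": (0xB4, 1),
    "int imm8": (0xCD, 1),
    "mov ax, imm16": (0xB8, 2),
}


def size_of(mnemonic, operand):
    """Number of bytes a source line occupies in the final image."""
    return len(operand) if mnemonic == "db" else 1 + ENCODINGS[mnemonic][1]


def assemble(source, base=0x0100):
    # Pass 1: walk the program once to learn the address of every label.
    labels, address = {}, base
    for label, mnemonic, operand in source:
        if label:
            labels[label] = address
        address += size_of(mnemonic, operand)
    # Pass 2: emit bytes, replacing label names with the computed addresses.
    image = bytearray()
    for _, mnemonic, operand in source:
        if mnemonic == "db":
            image += operand
            continue
        opcode, width = ENCODINGS[mnemonic]
        value = labels[operand] if isinstance(operand, str) else operand
        image += bytes([opcode]) + value.to_bytes(width, "little")
    return bytes(image), labels


image, labels = assemble(SOURCE)
print(image.hex(" "))  # the same 27 bytes as the hand-entered machine code

# Adding an instruction needs no manual address arithmetic:
longer = SOURCE[:2] + [(None, "add al, imm8", 1)] + SOURCE[2:]
image2, labels2 = assemble(longer)
print(f"{labels['msg']:04x} -> {labels2['msg']:04x}")  # 0114 -> 0116
```

All three references to `msg` pick up the new address on the next run, with no hand counting of bytes.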

Assembly language introduced virtually no meaningful side effects other than pure productivity improvements. That is the true measure of a new technology in my opinion. Every time I look at a technology advancement, that is the first thing I look for: along with the proposed improvements, what side effects does it introduce? I've been told quite a few times in my career that I am very good at evaluating technology and determining its merits. This is my secret sauce, and I hope it helps you as well.

Assembly language doesn't have side effects. It generates machine code instructions one-to-one with its source code. If you can write great machine code, you can write great assembly language, and do it a heck of a lot faster. The code executes at the same speed. The code is far easier to understand, easier to debug, easier to change. And it supports a concept called round trip engineering, which we will discuss in more detail later.

So, assembly language looks like the perfect technology. Why move on? We will get to that in Code Generation Part 2! For now, bask in the glorious achievement of those who worked so hard to get us assembly language.
