The Future of Software: Code Generation Step 3 - Object-Oriented Virtual Machines

To me, technology comes in waves of excitement.  A wave just like the ones in the ocean a mile from my last house in San Diego.  You can see the wave on the horizon and anticipate it will be a big one that promises a great ride in the surf.  Not every wave delivers, unfortunately, but those that do give you a ride you won't forget.  I've ridden several technology waves in my career.  I can still remember compiling my first web browser, reading online HTML research papers that others wrote, and authoring my first HTML document.  I'm still riding that wave today writing HTML5 and JavaScript.  But of all the waves I've encountered, none beat getting introduced to the Smalltalk language in graduate school.

I'm guessing hardly anyone reading this post has written significant amounts of Smalltalk code.  If you have, let me know!  For those of you who have not had the privilege, I can assure you that you missed out on something very special.  Give Squeak a try; you won't be sorry.  I still miss the elegance and power of the language; I don't think there is anything like it.  When Smalltalk started dying and I had to move on to Java, that was the first time in my career I saw us moving backwards as an industry.  I was accustomed to advancing from machine code to Assembly Language, Assembly Language to C, and C to Smalltalk.  At that pace, I was convinced the next step, coming in 5 years or so, was code that wrote itself.  I invested thousands of hours in code generation research because I was sure that is exactly what would happen.  But unfortunately, we got Java, C#, and Swift instead.  That is not the evolutionary step I imagined.  All these languages are variants of the same 3 core technology improvements made popular by Smalltalk:

  1. Add constructs in the programming language to support object-oriented programming
  2. Combine the advantages of compiled languages and interpreted languages into a new technique called JIT (Just In Time) compilation
  3. Run the resulting program inside a virtual machine runtime which provides safety features for running code, dynamic memory allocation, and automated object garbage collection

Let's briefly discuss each of the 3 substantial improvements above.

Object-Oriented Programming Paradigm
There is a substantial history behind the object-oriented programming paradigm that I won't bore you with.  Let's just focus on the most fundamental mechanism: the ability of an object to encapsulate data as a powerful form of abstraction.  The principle behind encapsulation is that the user of an object should not need to know anything about the data hidden inside of it.  In previous programming paradigms, in order to reuse code someone else wrote, you needed to understand all the data structures that code relied on.  While that is a bit painful, a much bigger learning curve than it should be, there is a devastating side effect.  When you need to make enhancements to the code you are reusing, which is almost certainly the case, you must also enhance its data structures.  Doing so breaks the first set of code that has dependencies on the older version of the data structure.  The old code, the code you are reusing, and your new code all become tightly coupled.  Any change anywhere can ripple throughout the entire code base.
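To make that concrete, here is a minimal Java sketch of encapsulation.  The class and its fields are purely my own illustration, not from any particular library; callers can only reach the balance through the public methods, so the hidden representation can change without breaking them.

    // Hypothetical example: callers depend only on the public methods,
    // never on how the balance is stored internally.
    public class BankAccount {
        // Private data, invisible to users of the class.  It could later
        // become a BigDecimal or a database row without touching any caller.
        private long balanceInCents;

        public void deposit(long amountInCents) {
            if (amountInCents < 0) {
                throw new IllegalArgumentException("deposit must be non-negative");
            }
            balanceInCents += amountInCents;
        }

        public long getBalance() {
            return balanceInCents;
        }
    }

If the internal representation changes, every caller of deposit and getBalance keeps working unchanged, which is exactly the decoupling described above.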

Now I have heard this exact same issue raised about object-oriented programming, and it is listed as the major problem the microservices architecture is supposed to address.  I don't buy that; well-designed object-oriented systems just don't have this issue in my experience.  And I intend to have a lively debate about microservices in a future post.  But for now I have a very simple proof point:

The encapsulation and abstraction concepts in an object paradigm, when followed with some rigor, allow you to design highly reusable object libraries.

You can see the results by doing a simple web search for Java libraries.  There are thousands of Java libraries out there, most of them open source, and most of them pretty easy to use.  The object paradigm did deliver on its promise to provide a substantial productivity improvement over all the former programming paradigms.

JIT Compilation

I won't go into detail about how JIT compilation works; there are far better resources on the internet for that, and it is still a very active area of research and development.  But for our purposes, we just need to understand that compiled programs are not flexible enough to run the software we want to run.  To change a single line of code in a compiled program, you have to go through the build cycle, create a new binary, stop the old program, put the new binary in place, and restart the program.  The old program could have a significant amount of data cached and arranged nicely into data structures, all of which would need to be wiped out for even a single line of code change.

Interpreters of course don't have this issue.  You can change any line of code any time you want without taking the program down or destroying its cache.  But first-generation interpreters are far too slow, requiring each line of code to be parsed before it is executed, every single time.

At the simplest level, JIT compilers are interpreters that figure out how to run interpreted code at speeds much closer to purely compiled code.  They do this through a variety of innovations, each very impressive on its own.  When combined, they let programs on modern microprocessors with sufficient memory reach a very good level of performance.  And not only that, but they retain the ability to add entirely new applications to the runtime, remove old applications, and patch existing programs, all while the virtual machine runtime is still running.  The end result of all of this effort is the hope that you can do a rolling patch or upgrade on your code without taking the system down and without taking a performance hit, so you never have down time or degraded performance.  Truly the future of software!
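As a rough sketch of what this looks like in practice on the HotSpot JVM (the class and method below are purely illustrative), a method starts out interpreted and gets compiled to native code once the runtime decides it is hot:

    // Illustrative only: a "hot" method the JIT will typically compile to
    // native code after enough invocations, while the program keeps running.
    public class HotLoop {
        static long square(long x) {
            return x * x;   // starts out interpreted, later runs as native code
        }

        public static void main(String[] args) {
            long sum = 0;
            for (long i = 0; i < 10_000_000; i++) {
                sum += square(i);   // repeated calls make square() "hot"
            }
            System.out.println(sum);
        }
    }

Running this with the standard -XX:+PrintCompilation flag will typically show square being compiled part way through the run.  The source code never changes; only how the runtime chooses to execute it does.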

Virtual Machine Run-time

When most people think about virtual machines, they are thinking about VMware or Amazon AWS virtual machines.  While those types of virtual machines are incredibly useful, that's not what we are talking about here.  A virtual machine runtime is a program that runs inside of an operating system with the purpose of executing programming language instructions.  For those familiar with interpreters, it sounds just like an interpreter, and it certainly is.  A virtual machine runtime is a next-generation interpreter that performs JIT compilation per above (so repetitive code sequences are essentially compiled and optimized instead of interpreted, for much better performance).  But it doesn't stop there.

Virtual machine runtimes provide an almost seamless ability to allocate memory for object-oriented programming concepts (threads with call stacks, classes, and objects).  You don't need to write code to do this memory allocation; gone are the days of calling malloc in C Language to allocate a fixed-size block of bytes for you to use.  Even better, virtual machine runtimes keep track of the number of bytes in each automatically allocated chunk of memory and watch over your code as you use it.  If you exceed the bounds of the memory chunk, the virtual machine generates an exception, telling you the exact line of code that went wrong!  If you do that in C Language, you just start referencing memory used for some other purpose, which causes unpredictable results as your program executes and very late, frustrating nights debugging your code.
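Here is a tiny, deliberately broken Java sketch of that protection (the array and index are my own illustration):

    // Deliberately broken example: the runtime catches the mistake for you.
    public class Bounds {
        public static void main(String[] args) {
            int[] values = new int[3];   // valid indexes are 0, 1, and 2
            values[5] = 42;              // out of bounds: the JVM throws
                                         // ArrayIndexOutOfBoundsException with a
                                         // stack trace naming this exact line
        }
    }

The equivalent mistake in C quietly writes over whatever happens to live past the end of the array, and the failure, if it shows up at all, shows up somewhere else entirely.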

Now if that isn't enough, virtual machine runtimes almost magically figure out when a chunk of memory you used is no longer needed and automatically free it for you.  That feature is called garbage collection, and it is a game changer: no more C Language statements calling the free function at exactly the right time with precisely the correct block of memory.  When you mix up the memory blocks and free the wrong one, you have another long and frustrating debugging session on your hands.
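A minimal sketch of what that feels like from the programmer's side (the loop below is illustrative):

    // Illustrative: a million short-lived objects and zero calls to free().
    // Each StringBuilder becomes unreachable at the end of the loop body,
    // and the garbage collector reclaims its memory automatically.
    public class NoFree {
        public static void main(String[] args) {
            for (int i = 0; i < 1_000_000; i++) {
                StringBuilder sb = new StringBuilder("item ");
                sb.append(i);
                // sb goes out of scope here; we never release it ourselves
            }
            System.out.println("done");
        }
    }

In C, every one of those allocations would be a malloc you have to pair with exactly one free, at exactly the right time.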

Most virtual machine runtimes are written in C/C++, so the people who do all this almost magical work for you are writing the malloc and free calls so you don't have to.  Take a moment to appreciate all their hard work!

Technology Evaluation

As we did in previous posts, let's evaluate the leap in going from C Language to Object-Oriented Virtual Machine languages.  I think the benefits are pretty clearly outlined above.  But what about the side effects?  Just like the move from Assembly Language to C Language, the move from C Language to object languages has considerable side effects as well.

This move carries with it the same side effects we discussed in the last post.  Virtual machine runtimes are slower than compiled programs, and they use a considerably larger set of CPU and memory resources.  By definition, they must be slower and consume more resources: they provide a set of features that make it much easier to create software, and those features require CPU and memory to execute.  Garbage collection typically runs in a background thread and moves objects around in generational heaps for efficiency.  That background work is not required in a compiled language.

The runtime also inserts extra instructions into the middle of your code to verify it is not going out of memory bounds for the object it is accessing.  If you do go out of bounds, you get an exception with a stack trace telling you the exact line of code that went wrong.  In a compiled language, a program with that type of error can easily become unreliable and unpredictable with no indication at all where the problem is.  Younger coders take stack traces for granted, but I can tell you, seeing this for the first time after debugging memory issues the hard way is something else.  It is a wave of excitement I still feel when my Java runtime tells me the precise line of code I messed up.  But nonetheless, this very powerful protection is a side effect.  I didn't code those extra instructions protecting me from my mistakes, I can't control them, and they take additional CPU and memory to execute.

Of all the side effects, the one that tends to bother me the most is the lack of control I have over the runtime.  These runtimes are incredibly complex, and the slightest mistake in the runtime can degrade the performance of your software or, worse, make it unreliable.  You just don't see that in Assembly Language, where you have absolute control over every machine instruction.  In C Language, you have a similar side effect, as we discussed last post: you don't have control over the generated code and have to trust that those who wrote the compiler did a good job generating your code for you.  As object-oriented runtimes are typically built from compiled languages, you have that side effect compounded by all the technology we discussed above layered on top of a compiled language program.  These runtimes are two compounded layers of side effects removed from Assembly Language.

Object-Oriented Virtual Machines in Practice

Now that you are sufficiently warned of the side effects, let's wrap up by discussing my practical experience working with object-oriented virtual machine runtimes over quite a few years.  Smalltalk had some quirks for sure, but the projects I was involved in typically used it for thick clients with minimal concurrency and multi-threading inside the virtual machine, so the quirks were not a significant problem for what we used it for.  Java didn't have it so easy though.  While some small percentage of projects used Java for thick clients, the user interface technology was just plain dreadful compared to the other technologies out there.  Java didn't catch on because of user interfaces, at least not until Android came into the picture.  Java caught on because it became the easiest and most powerful way to create back end applications inside of big and powerful runtimes called application servers.  These application servers became heavily threaded to compete with enterprise applications running on mainframes.  And the Java runtime responded with quality that I didn't see matched anywhere else.

I've done a fair bit of .NET programming, and even today, in my experience, it is very far behind Java.  My .NET code just stops working for no apparent reason: no exception, no stack trace.  It is like going back in time to compiled code, where I needed to search every line of code for my defect.  In almost all cases it is a defect in my code, but that's not the point.  When I make this type of mistake in Java, Java generates an exception and stack trace, making it almost trivial to find my mistake.
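As a hedged sketch of the difference I mean (the class and variable names are purely illustrative), here is the kind of defect Java not only reports but even lets you recover from:

    // Illustrative: the defect still happens, but the runtime hands you an
    // exception you can log, report, and recover from instead of dying silently.
    public class GracefulFailure {
        public static void main(String[] args) {
            try {
                String nothing = null;
                System.out.println(nothing.length());   // defect: null reference
            } catch (NullPointerException e) {
                e.printStackTrace();   // full stack trace pointing at the bad line
                System.out.println("recovered, continuing with defaults");
            }
        }
    }

That catch block is also exactly what the next paragraph is about: it is the kind of graceful handling I miss when a Swift app simply dies.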

I've also been doing some Swift iOS coding, and again, in my experience, this is very far behind the curve, much worse than even .NET.  Now I'm not about to go learn Objective-C, so I'm very thankful Swift exists.  But if I could code Java for my iPhone, I would be very happy indeed.  When you make this type of mistake in Swift, the app crashes with no ability to catch the error.  You cannot handle it gracefully, save data, or provide a message to your user.  Your app just dies immediately.  I feel for you iOS developers out there!

Overall, I'd highly recommend Java for mission-critical high-throughput applications.  You do need quite a bit of memory and CPU, but in my experience the reliability and stability of the technology is unmatched and you get all the tremendous productivity gains we discussed above.  Burn the memory and CPU and go with Java.

I look forward to you posting your experiences!

   Todd Lauinger

Alan Dewald

Business Owner at Express Carpet Cleaners llc

8 years ago

Thank you code writers! Brain power like yours, Mr Lauinger, gives me the ability to text/pic my customers in real time. I am juuust smart enough to see great tech and adapt it to my business!
