Just what you need to know about Garbage Collection for OOD/OOP

Gone are the days of { Manual | Automatic | Hybrid } Garbage Collection!!! Long live GC!!!

Introduction

There was a time when we end users had to run defragmentation programs on our hard disks; then came disk encryption, replication, and so on. Now most of those tasks are handled automatically.

In a somewhat analogous way, early software designers had to manage memory pieces and aggregates by hand, carefully tracking their memory allocations, releases, defragmentation and so on. A lot of this memory management was handled within the application programs themselves, which complicated resource, performance and deadlock/livelock management. However, these habits made designers and architects more aware of the “ground memory” situation. In fact, efficiency-driven systems, such as embedded and C/C++-based systems, still work this way today.


With the evolution and refinement of OOD/OOP, more and more of these tasks were gradually automated by the runtime and its GC algorithms. After the advent of Java, and with ever larger physical, virtual and heap memories available, the focus of concern shifted dramatically away from heaps and collectors, as memory management became an automatic system service on all modern platforms: Java, .NET, Golang and Python.

Most OOD/OOP runtimes create object instances on the heap, both for the runtime system itself and for the applications it hosts.


What is the Need?

So why should you, as a designer or architect, still need to know how garbage collectors work? Because relying on automatic GC alone may leave some shocks in store. Here is how.

Unmanaged resources

For the majority of objects in an application, the garbage collector (GC) performs the needed memory cleanup automatically. However, certain usages, and unmanaged resources in particular, demand explicit cleanup. A common case is an object that wraps an OS resource, such as a file handle, window handle or network connection. Although the GC can track the lifetime of a managed object that encapsulates an unmanaged resource, it has no specific knowledge of how to clean up the resource itself.

The solution is described later in this article.

Customer Shock

After months of good performance, customer applications may one day suddenly stall and eventually grind to a halt. In an always-running server, objects that are never released can build up over time in the older generations, and the over-used heap then needs ever longer pauses to be cleaned up. Often this happens because the tuning and configuration options of the GC subsystem were never explored and used.
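One common way such objects pile up is a long-lived collection held in a static field, which stays reachable for as long as its class is loaded. The minimal Java sketch below contrasts an unbounded map with a size-bounded one built on `LinkedHashMap.removeEldestEntry`; the class name `SessionCaches`, the 1 KB payload and the 1,000-entry limit are illustrative assumptions, not recommended settings.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class SessionCaches {
    // Leaky pattern: a static map lives as long as its class is loaded, so every
    // entry put here stays reachable and can never be garbage collected.
    static final Map<String, byte[]> UNBOUNDED = new HashMap<>();

    // Bounded alternative: an access-ordered LinkedHashMap that evicts its
    // least-recently-used entry once it grows past MAX_ENTRIES.
    static final int MAX_ENTRIES = 1_000; // hypothetical capacity, tune per workload
    static final Map<String, byte[]> BOUNDED =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    // Simulates a long-running server keeping 1 KB of state per request.
    static void handleRequest(int requestId) {
        String key = "session-" + requestId;
        UNBOUNDED.put(key, new byte[1024]);
        BOUNDED.put(key, new byte[1024]);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest(i);
        }
        System.out.println("unbounded entries: " + UNBOUNDED.size()); // unbounded entries: 10000
        System.out.println("bounded entries: " + BOUNDED.size());     // bounded entries: 1000
    }
}
```

A bounded (or expiring) cache caps how much live data can migrate into the old generation, whereas the unbounded map keeps growing for as long as the server runs.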

Unrealized Huge Savings Potential

Modern applications create millions of object instances, all of which the garbage collector's graph-traversal algorithms must check, and more objects mean proportionally more fragmentation, compaction and object-relocation work for the GC.

Experts and research scientists have estimated that a large share of these compute cycles could be avoided, which could translate into millions of dollars of savings for businesses.

Performance and Capacity Metrics

By analyzing garbage collection behavior, you can derive a few targeted micro-metrics, such as the average object creation rate (for example, 160 MB/sec) and the average object reclamation rate.

These can be very useful for projecting usage and planning capacity for your application.
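The sketch below shows how a Java application might sample some of these metrics itself. `ManagementFactory.getGarbageCollectorMXBeans()` is standard `java.lang.management` API; the per-thread allocation counter comes from the `com.sun.management.ThreadMXBean` extension, which HotSpot/OpenJDK provides but other JVMs may not, so the code falls back to -1 there. The class name `GcMetrics` and the roughly 200 MB workload are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class GcMetrics {
    // Total number of collections across all collectors registered in this JVM
    // (typically one young-generation and one old-generation collector).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) total += millis;
        }
        return total;
    }

    // Bytes allocated so far by the current thread. Relies on the HotSpot/OpenJDK
    // extension com.sun.management.ThreadMXBean; returns -1 where unavailable.
    static long allocatedBytesSoFar() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (threads instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) threads;
            if (hs.isThreadAllocatedMemorySupported() && hs.isThreadAllocatedMemoryEnabled()) {
                return hs.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long gcCountBefore = totalCollections();
        long gcMillisBefore = totalCollectionMillis();
        long bytesBefore = allocatedBytesSoFar();
        long start = System.nanoTime();

        // Hypothetical workload: about 200 MB of short-lived 1 KB arrays.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[1024];
            chunk[i % 1024] = 1;
            checksum += chunk[i % 1024];
        }

        double seconds = (System.nanoTime() - start) / 1e9;
        long bytesAfter = allocatedBytesSoFar();
        System.out.println("checksum: " + checksum);
        System.out.println("collections during run: " + (totalCollections() - gcCountBefore));
        System.out.println("GC time during run (ms): " + (totalCollectionMillis() - gcMillisBefore));
        if (bytesBefore >= 0 && bytesAfter >= 0) {
            double megabytes = (bytesAfter - bytesBefore) / (1024.0 * 1024.0);
            System.out.printf("object creation rate: %.1f MB/sec%n", megabytes / seconds);
        }
    }
}
```

Collector names, and therefore any per-collector breakdown, differ between GC implementations, which is why the sketch only sums across them.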

So if these reasons matter for your business and applications, the following knowledge nuggets should cover the minimum basics for your design needs.


Heap-managing Process and the Process of Collection

In Java and C#, the JVM and the CLR are the runtimes that manage the heap. The runtime is aware of the various objects in the heap and of the thread stack frames that reference them, i.e. all the application objects under execution (mostly instances of loaded classes). All applications (and their creational patterns) create objects as instances of their classes by allocating memory on the heap.

Modern runtimes, both JVM and CLR, have many similarities in how they manage the heap and garbage. Application and system designers often learn more than enough about architecture and design patterns, but precious little about garbage collection and memory management. It is not their fault: most of the tough work with memory references and pointers from the days of C and C++ applications has been abstracted away from them by the automatic garbage-collecting runtimes of the JVM and CLR.

However, as we will see later in this article, it helps a lot to understand garbage collection, because every GC implementation has its limits, and this awareness can help future-proof our application design. For this we need to understand a few basic terms: GC Root, Mark, Sweep, Collection, Relocation and Graph algorithms.

What is a GC Root?

A GC root is a special live object that the garbage collector uses as the starting point for its mark, sweep and collection activities. All objects reachable (referenced or used) from a GC root are considered live and are NOT garbage collected.

Here are the main types of GC Roots:

  • Class: Classes loaded by the system class loader; these also hold references to their static variables
  • Stack Local: Local variables and method parameters stored on a thread's stack
  • Active Java Threads: All started, not yet terminated Java thread objects and their stack frames
  • JNI References: Java objects created by native code for JNI calls; includes local variables, parameters of JNI methods, and global JNI references

Additionally, there are a few more possible types of GC Roots, which we will not digress into.

Besides, there is scant per-JVM documentation about exactly which objects are GC roots. Some IDEs can analyze memory from the GC-roots perspective, which is beneficial when analyzing memory leaks in an application.
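The annotated Java sketch below points out where the root types listed above show up in ordinary code. It is only illustrative: the class `GcRootsTour` is hypothetical, and once the JIT compiles a method, a local variable may stop acting as a root after its last use.

```java
import java.util.ArrayList;
import java.util.List;

class GcRootsTour {
    // Class root: a static field of a loaded class. Whatever REGISTRY references
    // stays reachable for as long as the class itself stays loaded.
    static final List<Object> REGISTRY = new ArrayList<>();

    static int runTour() {
        // Stack-local root: while this frame is active, 'local' keeps its array
        // reachable (at least up to the variable's last use).
        byte[] local = new byte[256];

        // Reachable through the class root above, even after this method returns.
        REGISTRY.add(new StringBuilder("kept alive by a static field"));

        // Once 'temp' is cleared, nothing references this array: it is garbage.
        Object temp = new byte[256];
        temp = null;

        // Active-thread root: a started, not yet terminated thread, plus the locals
        // on its own stack (here 'buffer'), are roots until the thread finishes.
        Thread worker = new Thread(() -> {
            byte[] buffer = new byte[256];
            buffer[0] = 1;
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return local.length + REGISTRY.size();
    }

    public static void main(String[] args) {
        System.out.println("result: " + runTour()); // result: 257
    }
}
```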

Types of GC Collectors

There are two basic and most common functional types of GC. Both treat the network of reference-connected objects as a graph data structure that can be visited node by node. They are not always mutually exclusive; they are often combined in a few ways.

Reference counting collector

These are the simpler GC algorithms, used in many early (roughly pre-2000) garbage collectors because they need little runtime infrastructure. They avoid long tracing passes, but dropping the last reference to a large object structure can trigger a cascade of deallocations, causing unpredictable, user-impacting delays for the mutator programs.

This approach tracks, for every instance in memory, how many references point to it; instances with at least one reference are live objects. Each instance's reference counter is updated whenever another object adds or removes a reference to it.

This approach can release memory as soon as an object becomes unreachable, instead of waiting for batch collection cycles. It is more OOD-elegant, but it has higher overheads (every reference update must also adjust a counter) and does not scale well in either time or space complexity.

Some of these drawbacks can be addressed by what is called “deferred reference counting”, but let's not digress here.
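To make the counting concrete, here is a toy reference-counting heap in Java (the `RcHeap` and `RcObject` names are hypothetical; a real runtime does this bookkeeping inside the VM). It also shows the classic weakness of plain reference counting: two objects that reference each other keep each other's count above zero, so the cycle is never freed even after the program drops it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counting heap: each object counts the references pointing at it and
// is freed, together with whatever it alone kept alive, when that count drops to zero.
class RcHeap {
    static final class RcObject {
        final String name;
        final List<RcObject> fields = new ArrayList<>(); // outgoing references
        int refCount = 0;
        boolean freed = false;
        RcObject(String name) { this.name = name; }
    }

    final List<String> freedLog = new ArrayList<>(); // order in which objects were freed

    RcObject allocate(String name) { return new RcObject(name); }

    // A new reference to 'target' was created (from a root or another object's field).
    void retain(RcObject target) { target.refCount++; }

    // A reference to 'target' went away; free it as soon as nothing points at it.
    void release(RcObject target) {
        if (--target.refCount == 0) {
            target.freed = true;
            freedLog.add(target.name);
            for (RcObject child : target.fields) {
                release(child); // cascade: drop the references the freed object held
            }
            target.fields.clear();
        }
    }

    // Stores a reference from 'owner' to 'target' in one of owner's fields.
    void link(RcObject owner, RcObject target) {
        owner.fields.add(target);
        retain(target);
    }

    public static void main(String[] args) {
        RcHeap heap = new RcHeap();

        // Acyclic case: root -> a -> b. Dropping the root reference frees both at once.
        RcObject a = heap.allocate("a");
        RcObject b = heap.allocate("b");
        heap.retain(a);   // reference held by a root, e.g. a local variable
        heap.link(a, b);
        heap.release(a);  // the root reference goes away
        System.out.println("freed: " + heap.freedLog); // freed: [a, b]

        // Cyclic case: root -> c, c <-> d. After the root reference is dropped,
        // c and d still point at each other, so neither count reaches zero.
        RcObject c = heap.allocate("c");
        RcObject d = heap.allocate("d");
        heap.retain(c);
        heap.link(c, d);
        heap.link(d, c);
        heap.release(c);
        System.out.println("c freed? " + c.freed + ", d freed? " + d.freed); // c freed? false, d freed? false
    }
}
```

This is why runtimes that rely on reference counting often pair it with a separate cycle detector or a backup tracing collector.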


Tracing collector

Tracing algorithms, such as mark-and-sweep, mark-and-compact, and mark-and-copy, work by traversing the graph of reachable objects from a set of roots and marking them as alive, while freeing the unmarked objects.

A tracing collector starts at the GC roots and classifies every node as either Black (initially, the root nodes) or White (all the rest). It then traverses the object graph recursively.

As it visits nodes that are found reachable, a moving collector may relocate them from the FROM-location to a target TO-location, either during the traversal or in a later phase, depending on the algorithm. The purpose is to reduce fragmentation, while also compacting longer-lived (older “generation”) objects first, followed by the middle-aged generation and finally the short-lived objects.

It determines reachability by following references node to node from the initial Black nodes; any “island graphs” that are not connected to the roots remain White nodes.

These unreachable nodes can be recycled, i.e. collected.
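The mark and sweep phases described above can be sketched as a toy Java collector over an explicit object graph. This is a simplified two-color version (real collectors usually add a grey state for discovered-but-not-yet-scanned objects), and the class name `MarkSweepDemo` and the string object IDs are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy tracing collector over an explicit object graph.
class MarkSweepDemo {
    enum Color { WHITE, BLACK }

    // Adjacency list: object id -> ids it references (insertion order kept for stable output).
    final Map<String, List<String>> heap = new LinkedHashMap<>();

    void allocate(String id) { heap.putIfAbsent(id, new ArrayList<>()); }

    void reference(String from, String to) { heap.get(from).add(to); }

    // Mark: every object starts WHITE; objects reachable from the roots become BLACK.
    // An explicit stack replaces recursion so deep graphs cannot overflow the call stack.
    Map<String, Color> mark(List<String> roots) {
        Map<String, Color> color = new LinkedHashMap<>();
        for (String id : heap.keySet()) color.put(id, Color.WHITE);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (color.get(id) == Color.BLACK) continue; // already visited
            color.put(id, Color.BLACK);
            for (String child : heap.get(id)) {
                if (color.get(child) == Color.WHITE) pending.push(child);
            }
        }
        return color;
    }

    // Sweep: objects still WHITE belong to unreachable "island graphs" and are reclaimed.
    List<String> sweep(Map<String, Color> color) {
        List<String> collected = new ArrayList<>();
        for (Map.Entry<String, Color> e : color.entrySet()) {
            if (e.getValue() == Color.WHITE) collected.add(e.getKey());
        }
        for (String id : collected) heap.remove(id);
        return collected;
    }

    public static void main(String[] args) {
        MarkSweepDemo gc = new MarkSweepDemo();
        for (String id : List.of("root", "a", "b", "x", "y")) gc.allocate(id);
        gc.reference("root", "a");
        gc.reference("a", "b");
        gc.reference("x", "y"); // island: x and y reference each other,
        gc.reference("y", "x"); // but nothing reachable from the root points to them

        List<String> collected = gc.sweep(gc.mark(List.of("root")));
        System.out.println("collected: " + collected);        // collected: [x, y]
        System.out.println("survivors: " + gc.heap.keySet()); // survivors: [root, a, b]
    }
}
```

Note that the `x`/`y` island is a cycle; tracing reclaims it because reachability from the roots, not a per-object count, decides liveness.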

Modern tracing GC algorithms are much more flexible and can be tuned much more readily than reference-counting garbage collectors, but they come with the downside of requiring fairly complex infrastructure from a programming language runtime and a compiler.

HotSpot is a popular JVM from Oracle. All GC implementations in the HotSpot JVM are tracing collectors.

Others

Other popular algorithms are copying and generational collection.

Copying algorithms work by dividing the memory into two regions and copying the reachable objects from one region to the other, while discarding the unreachable objects.
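A minimal Java sketch of a copying collector using Cheney's breadth-first scan follows. Objects are modeled as list entries that reference each other by index (an assumption made to keep the example short); reachable objects are copied into a fresh, contiguous to-space with forwarding indexes, and everything left behind in the from-space is garbage. The names `SemiSpaceDemo` and `Obj` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy semispace copying collector using Cheney's breadth-first algorithm.
class SemiSpaceDemo {
    // A heap object: a label plus the indexes (in its current space) of the objects it references.
    static final class Obj {
        final String label;
        final int[] refs;
        int forward = -1; // index in to-space once copied (-1 = not copied yet)
        Obj(String label, int... refs) { this.label = label; this.refs = refs; }
    }

    // Copies everything reachable from the root indexes into a new, compact to-space.
    // Root indexes are updated in place to point into the to-space.
    static List<Obj> collect(List<Obj> from, int[] roots) {
        List<Obj> to = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(from, to, roots[i]);
        }
        // Cheney scan: objects at or after 'scan' still hold from-space indexes;
        // copying their children appends to the end of the to-space.
        for (int scan = 0; scan < to.size(); scan++) {
            Obj obj = to.get(scan);
            for (int r = 0; r < obj.refs.length; r++) {
                obj.refs[r] = copy(from, to, obj.refs[r]);
            }
        }
        return to; // objects never copied are simply left behind in 'from' as garbage
    }

    // Copies one object unless it was already forwarded; returns its to-space index.
    private static int copy(List<Obj> from, List<Obj> to, int fromIndex) {
        Obj original = from.get(fromIndex);
        if (original.forward < 0) {
            to.add(new Obj(original.label, original.refs.clone()));
            original.forward = to.size() - 1;
        }
        return original.forward;
    }

    public static void main(String[] args) {
        List<Obj> from = List.of(
                new Obj("garbage1"),
                new Obj("root", 3),
                new Obj("garbage2", 1), // points at a live object but is itself unreachable
                new Obj("child", 1, 4), // refers back to the root (a cycle) and to the leaf
                new Obj("leaf"));
        int[] roots = {1};
        List<Obj> to = collect(from, roots);

        List<String> labels = new ArrayList<>();
        for (Obj o : to) labels.add(o.label);
        System.out.println("to-space: " + labels);                            // to-space: [root, child, leaf]
        System.out.println("root now at index " + roots[0]);                  // root now at index 0
        System.out.println("child refs: " + Arrays.toString(to.get(1).refs)); // child refs: [0, 2]
    }
}
```

Because survivors are packed next to each other, the to-space has no holes; this is how copying collectors avoid fragmentation, at the cost of reserving a second region.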

Generational algorithms, such as those used in the Java HotSpot virtual machine, work by segregating objects into different generations based on their age and applying a different garbage collection strategy to each generation. This leverages the observation that the majority of (mostly smaller) objects have very short lifetimes and can be collected quickly in short, incremental sweeps.
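The following toy Java model sketches the generational idea: new objects go into a small young area, a minor collection runs when that area is full, and objects that survive enough minor collections are promoted (tenured) into the old generation. The capacity of four objects, the tenuring threshold of two, and the `live` set standing in for reachability are all assumptions; real JVMs also use separate survivor spaces and major collections, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GenerationalDemo {
    static final int YOUNG_CAPACITY = 4; // hypothetical young-area size, in objects
    static final int TENURE_AGE = 2;     // hypothetical tenuring threshold

    static final class Obj {
        final String name;
        int age = 0; // number of minor collections survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final Set<String> live = new HashSet<>(); // stands in for "reachable from a GC root"
    int minorCollections = 0;

    void allocate(String name, boolean staysLive) {
        if (young.size() >= YOUNG_CAPACITY) {
            minorCollection(); // young area full: trigger a minor collection
        }
        if (young.size() >= YOUNG_CAPACITY) {
            old.addAll(young); // still full of live objects: promote them early
            young.clear();
        }
        young.add(new Obj(name));
        if (staysLive) live.add(name);
    }

    // The program stops using an object, so it is no longer reachable.
    void drop(String name) { live.remove(name); }

    // Examines only the young area; the old generation is not scanned at all.
    void minorCollection() {
        minorCollections++;
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!live.contains(o.name)) continue; // dead young object: reclaimed
            o.age++;
            if (o.age >= TENURE_AGE) {
                old.add(o);       // survived long enough: promote (tenure)
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("config", true);       // long-lived, e.g. application settings
        for (int i = 0; i < 10; i++) {
            heap.allocate("tmp" + i, false); // short-lived temporaries
        }
        List<String> oldNames = new ArrayList<>();
        for (Obj o : heap.old) oldNames.add(o.name);
        System.out.println("minor collections: " + heap.minorCollections); // minor collections: 2
        System.out.println("old generation: " + oldNames);                 // old generation: [config]
        System.out.println("young objects: " + heap.young.size());         // young objects: 4
    }
}
```

Each minor collection only touches the small young area, and most of what it finds there is already dead, which is what makes these collections cheap.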


Implementation Styles

Another way to classify GCs is by their implementation style:

a) Stop-the-World (STW) big-shot collection: the collector examines the entire heap in a single collection pass and stops all application threads (the mutators) for the duration of the cleanup. This usually causes major GC spikes that affect your application's performance.

b) Incremental: a single collection is divided into multiple increments whose executions are interleaved with the application on a single processor. Typically, the rate of collection is tied to (and kept higher than) the allocation rate, so that each collection is guaranteed to terminate.

c) Concurrent: at least one application thread and one collector thread execute at the same time.

d) Parallel: multiple collector threads perform the collection at the same time.


Trigger Events

There are three major types of events that trigger garbage collection in the heap.

  • Minor events: These occur when the young (Eden) space is full and surviving objects are moved to a survivor space. A minor event happens entirely within the young area.

  • Mixed events: These are minor events that also reclaim some old-generation objects.

  • Major events: These clear space in both the young and old generations, which takes longer than the other types of garbage collection events.


So What Do We Do About Unmanaged Resources?

In this case, the designer may need to release the resource manually, in the older style. For this purpose, Microsoft .NET provides the public Dispose() method (defined by the IDisposable interface). This enables users of your object to release it explicitly when they are finished with it. When you use an object that encapsulates an unmanaged resource, make sure to call Dispose as necessary.

To release the unmanaged resource itself, you can either wrap it in a safe handle (a class derived from SafeHandle) or override the Object.Finalize() method.
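Java has no Dispose(), but its closest counterparts are `AutoCloseable` with try-with-resources for deterministic release, and `java.lang.ref.Cleaner` (Java 9+) as a safety net that plays a role similar to a safe handle or finalizer. The sketch below assumes a simulated native handle: `NativeResource`, the handle values and the `OPEN_HANDLES` counter are hypothetical stand-ins for a real OS resource.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Wrapper around a (simulated) native handle. close() is the deterministic release
// path, Java's counterpart to Dispose(); the Cleaner only acts as a safety net
// for callers that forget to close.
class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Counts handles currently open; stands in for the OS-level resource.
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    // The cleanup action must not reference the NativeResource itself, otherwise
    // the wrapper would stay reachable and never become eligible for cleaning.
    private static final class ReleaseHandle implements Runnable {
        private final long handle;
        ReleaseHandle(long handle) { this.handle = handle; }
        @Override
        public void run() {
            // A real implementation would call the native release function for 'handle'.
            OPEN_HANDLES.decrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeResource(long handle) {
        OPEN_HANDLES.incrementAndGet(); // pretend an OS handle was acquired
        this.cleanable = CLEANER.register(this, new ReleaseHandle(handle));
    }

    @Override
    public void close() {
        // Cleanable.clean() runs the action at most once, so repeated close() calls are safe.
        cleanable.clean();
    }

    public static void main(String[] args) {
        try (NativeResource file = new NativeResource(42L)) {
            System.out.println("open handles inside block: " + OPEN_HANDLES.get()); // open handles inside block: 1
        } // close() runs here, even if the block throws
        System.out.println("open handles after block: " + OPEN_HANDLES.get()); // open handles after block: 0
    }
}
```

As with Dispose(), the explicit close() path is the one to rely on: a Cleaner action runs only after the GC finds the wrapper unreachable, which may be much later, and whether it runs at JVM exit is not guaranteed.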

Future of GC

Future needs of advanced GC systems may include more parallel processing and real-time features for specialized niche systems and applications.

More articles by Susheel J.
