Why is Go fast?
Go has become popular for microservices & for scalable systems. What are the design decisions that make Go fast?
Summary:
1. Clever usage of the stack in preference to the heap
2. Lightweight Goroutines in a process avoiding OS calls to switch between threads & processes
Details:
1. The Stack
Throughout the history of computing, the fastest way to read memory has been sequentially. The stack is a contiguous block of memory with fast, simple allocation & deallocation: move a stack pointer, push or pop a stack frame. Heap memory, by contrast, is reached through pointers to scattered chunks of memory. https://www.ardanlabs.com/blog/2017/05/language-mechanics-on-stacks-and-pointers.html
Heap memory is harder to manage. It must be managed either by hand, as in C/C++, or by a garbage collector that tracks data that is no longer pointed to. Garbage collectors are designed either for high throughput (reclaiming the most unused memory per scan) or, more popularly, for low latency (scanning quickly). Go uses a low-latency garbage collector. https://research.google/pubs/pub40801/
The Go compiler performs 'escape analysis' to decide which data can safely stay on the stack; data that outlives its function 'escapes' & must be moved to the heap. This analysis has to be conservative, since if data that should have moved to the heap remains on the stack, the result is memory corruption. The better the escape analysis, the more data can remain on the stack & the less needs to move to the heap, which improves performance. https://segment.com/blog/allocation-efficiency-in-high-performance-go-services/
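A minimal sketch of how escape analysis plays out (the struct and function names here are hypothetical, and the exact decisions depend on the compiler version and inlining):

```go
package main

import "fmt"

// point is a small struct used to illustrate stack vs heap allocation.
type point struct{ x, y int }

// newPointValue returns a copy of the struct; the value can typically
// stay on the stack because nothing outlives the call.
func newPointValue() point {
	p := point{x: 1, y: 2}
	return p
}

// newPointPtr returns a pointer to a local variable; p outlives the
// call, so the compiler moves it to the heap ("p escapes to heap").
func newPointPtr() *point {
	p := point{x: 1, y: 2}
	return &p
}

func main() {
	v := newPointValue()
	q := newPointPtr()
	fmt.Println(v.x+v.y, q.x+q.y) // 3 3
}
```

Running `go build -gcflags='-m'` prints the compiler's escape decisions for each variable, which is a practical way to check whether a hot-path allocation ends up on the heap.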
Even though RAM stands for "Random Access Memory", the fastest way to read it is still sequentially. Chasing pointers to random heap data, even in RAM, can be two orders of magnitude slower than reading sequential stack data. https://www.forrestthewoods.com/blog/memory-bandwidth-napkin-math/
In Java, objects are stored on the heap; only their references live on the stack. Java lists, though they look linear & sequential, are backed by an array of references, with the actual objects scattered across the heap. Python, Ruby & JavaScript behave similarly. Java has clever, complex, tunable garbage collectors, with both high-throughput & low-latency GCs available in the JVM. The VMs for Python, Ruby & JavaScript are less optimized than Java's.
In Go, structs & primitive types can live on the stack, & idiomatic Go discourages unnecessary pointers. The garbage collector is optimized for low latency so it returns quickly. The design decision of favoring & encouraging the stack results in better performance.
2. Concurrency: Processes vs Threads vs Goroutines
Concurrency has to be used for the right use-cases. Concurrency is not parallelism & can increase code complexity. Amdahl's Law provides a formula to determine whether concurrency is useful, depending on the split between sequential & parallel work. Concurrency is best used for slow work such as I/O or network calls rather than for most in-memory work. https://www.oreilly.com/library/view/the-art-of/9780596802424/
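Amdahl's Law can be sketched in a few lines; the function name below is just for illustration:

```go
package main

import "fmt"

// amdahlSpeedup returns the theoretical speedup for a workload where
// parallelFraction of the work can run concurrently on n workers:
// speedup = 1 / ((1 - P) + P/n)   (Amdahl's Law)
func amdahlSpeedup(parallelFraction float64, n int) float64 {
	return 1 / ((1 - parallelFraction) + parallelFraction/float64(n))
}

func main() {
	// A job that is only 50% parallelizable can never run more than
	// 2x faster, no matter how many goroutines you throw at it.
	fmt.Printf("%.2f\n", amdahlSpeedup(0.5, 1000))
}
```

This is why I/O-bound work (where the parallel fraction is high) benefits far more from concurrency than mostly-sequential in-memory work.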
Traditionally, languages provide concurrency by creating threads within a process through OS calls, with locking to share data, or by running multiple processes. The OS schedules the threads of a process onto CPU cores.
A Go program creates multiple OS threads on launch, paying that cost up front, & runs its own scheduler on top of them. Go provides goroutines, which can be thought of as lightweight threads managed by the Go runtime rather than the OS. Creating a goroutine & switching between goroutines are quick, since they happen within the Go process & don't require an OS call.
Go's scheduler is part of the Go process, making it quick compared to the OS scheduler; it automatically balances the workload across the threads & cooperates with the GC. The scheduler is also optimized; for example, it can unschedule a goroutine that is blocked on I/O. https://youtu.be/YHRO5WQGh0k
Goroutine stacks are much smaller than OS thread stacks by default, consuming less memory (with the ability to grow as needed, paying a performance penalty at that time). Consequently, Go programs can spawn tens of thousands of simultaneous goroutines, while a similar approach with native OS threads in other languages will slow to a crawl.
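A small sketch of spawning a large number of goroutines (the function name is illustrative; the count is arbitrary but would be impractical with OS threads):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently launches n goroutines that each increment a shared
// counter atomically, then waits for all of them to finish.
func countConcurrently(n int) int64 {
	var sum int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&sum, 1)
		}()
	}
	wg.Wait()
	return sum
}

func main() {
	// 50,000 goroutines is routine for Go; the same number of OS
	// threads would consume orders of magnitude more memory.
	fmt.Println(countConcurrently(50000)) // 50000
}
```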
Go uses the CSP (Communicating Sequential Processes) concurrency model by default, using unbuffered & buffered channels to communicate data. This pattern enhances code clarity but is not itself a performance improvement; the performance improvement comes from the earlier design decisions. Mutexes, locks & atomics are available if needed.
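A minimal sketch of the CSP style, with a hypothetical worker communicating over buffered channels instead of sharing memory:

```go
package main

import "fmt"

// worker squares each job it receives on the jobs channel and sends
// the result back, sharing data by communicating rather than locking.
func worker(jobs <-chan int, results chan<- int) {
	for j := range jobs {
		results <- j * j
	}
}

func main() {
	jobs := make(chan int, 3)    // buffered channel of work items
	results := make(chan int, 3) // buffered channel of answers
	go worker(jobs, results)
	for i := 1; i <= 3; i++ {
		jobs <- i
	}
	close(jobs) // signals the worker's range loop to finish
	for i := 0; i < 3; i++ {
		fmt.Println(<-results) // prints 1, 4, 9
	}
}
```

With a single worker and FIFO channels the output order is deterministic; adding more workers trades that ordering for throughput.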
3. Compiler
Compiling large projects can be time-consuming. The Go compiler does not allow circular dependencies between packages. This adds some development burden in organizing packages, but results in quicker builds compared to competing compilers that support circular dependencies.
Conclusion:
Some simple design decisions have made Go a performant language, resulting in high adoption & usage for scalable applications, despite its relatively young age.
Reference: The excellent book "Learning Go: An Idiomatic Approach to Real-World Go Programming" by Jon Bodner