Optimizing Performance with Inlining
Rajiv Singh
SDE JL3 @Maersk | Ex @Lummo @redBus @Economize | GSoC '24 '23 Mentor @Jenkins | GSoC '22 @Keptn | LFX '21 @mojal global | GSoD '20 @gRPC-Gateway | HackerRank: 1908.5 | HackerEarth: 1625 | ICPC Regionalist Amritapuri 2020
Inlining is a technique used in programming to optimize the execution speed of a program. It involves replacing a function call with the actual code of the function at the call site. In other words, the compiler or interpreter copies the entire body of the function into the location where the function is called, eliminating the overhead of the function call.
When a function is called, the program typically has to perform certain operations like pushing parameters onto the stack, jumping to the function's code, and then returning to the original location after the function finishes executing. These operations incur a certain amount of overhead, especially for small functions or frequently called functions.
By inlining a function, the compiler or interpreter eliminates the overhead associated with the function call, as the code is directly inserted at the call site. This can result in performance improvements by reducing the number of instructions executed and eliminating the need for stack operations and jumps.
Inlining is usually done by the compiler or optimizer based on certain criteria. For example, functions marked with the?inline?keyword in C or C++ are considered for inlining. However, compilers can make their own decisions based on heuristics, such as the size of the function, frequency of calls, and other optimization goals. The Go compiler generally does a good job of automatically deciding when to inline functions. It leverages escape analysis, function size, and other factors to make informed decisions. In most cases, relying on the compiler's automatic inlining behavior is sufficient and leads to optimized performance.
It's important to note that inlining is not always beneficial. Inlining large or complex functions can increase the size of the code, leading to cache inefficiencies and reduced performance. Additionally, inlining can hinder code modularity and maintainability, as changes to the function's code would require modifying every location where the function is inlined.
Overall, inlining is a trade-off between reducing function call overhead and increasing code size, and its effectiveness depends on the specific context and optimization goals of the program.
When to Use Inlining
When deciding whether to use inlining in your code, it's important to consider the following factors:
In summary, inlining should be used when small, simple functions are frequently called, and profiling indicates a potential performance gain. It's important to strike a balance between performance optimization and code modularity/maintainability. Analyzing the specific characteristics of your code, considering the factors mentioned above, will help you determine when to use or not use inlining effectively.
Inlining in Go
Let's review and improve the post by evaluating the benchmark results of inlining in Go:
In Go, inlining is a process handled by the compiler as part of its optimization strategy. The decision to inline a function is determined by the compiler based on specific criteria and heuristics.
Unlike some other programming languages, Go does not require explicit marking of functions for inlining. Instead, the Go compiler employs a technique called "escape analysis" to determine whether a function should be inlined.
Escape analysis helps the compiler identify the lifetime of objects and their allocation locations. By analyzing object lifetimes, it can determine if an object should be allocated on the stack instead of the heap. This optimization technique improves performance by reducing memory allocations and deallocations.
During escape analysis, the compiler also evaluates factors such as function size, complexity, and frequency of use to determine the benefits of inlining. Typically, smaller and simpler functions are more likely to be considered for inlining compared to larger or more complex ones.
It's important to note that the Go compiler autonomously makes decisions regarding inlining based on its optimization goals. The criteria for inlining may vary across different compiler versions and can also be influenced by compiler flags and settings.
To control or encourage specific inlining behavior in Go, the compiler provides certain options that developers can utilize. For example, the?-gcflags?flag allows developers to specify optimization-related flags during compilation, including options related to inlining.
Read more about function inlining in Go at?https://github.com/golang/go/wiki/CompilerOptimizations#function-inlining.
Let's examine a sample code snippet to demonstrate inlining in Go:
package main
//go:noinline
func AddNonInlined(a, b int) int {
return a + b
}
func AddInlined(a, b int) int {
return a + b
}
The code snippet above includes two functions:?AddNonInlined?and?AddInlined. The?AddNonInlined?function is marked with the?//go:noinline?directive, indicating that it should not be inlined.
To evaluate the impact of inlining, we can use the following benchmark code:
领英推荐
package main
import "testing"
func BenchmarkAddNonInlined(b *testing.B) {
x := 10
y := 5
for i := 0; i < b.N; i++ {
_ = AddNonInlined(x, y)
}
}
func BenchmarkAddInlined(b *testing.B) {
x := 10
y := 5
for i := 0; i < b.N; i++ {
_ = AddInlined(x, y)
}
}
func main() {
testing.Benchmark(BenchmarkAddNonInlined)
testing.Benchmark(BenchmarkAddInlined)
}
In this benchmark code, we define two benchmark functions:?BenchmarkAddNonInlined?and?BenchmarkAddInlined. Each benchmark function performs a specific number of iterations, calling the respective?AddNonInlined?and?AddInlined?functions.
When we run the benchmark code, we obtain the following output:
? go test -bench=.
goos: darwin
goarch: amd64
pkg: testp
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkAddNonInlined-12 970689716 1.234 ns/op
BenchmarkAddInlined-12 1000000000 0.2503 ns/op
PASS
ok testp 1.721s
From the benchmark results, we can observe that the inlined function performs significantly faster than the non-inlined function. The inlined function achieved an average runtime of approximately 0.2503 nanoseconds per operation, while the non-inlined function took approximately 1.234 nanoseconds per operation. This performance difference is attributed to the elimination of function call overhead achieved through inlining.
It's worth noting that benchmark results can vary depending on factors such as hardware, compiler version, and optimization settings. Therefore, it's important to interpret benchmark results within the context of the specific environment and use case.
Overall, inlining in Go can lead to performance improvements by eliminating function call overhead. The Go compiler employs its decision-making process for inlining based on various criteria, and developers can control inlining behavior using compiler options.
When Inlining is not needed
Inlining is a compiler optimization technique that can improve performance by eliminating the overhead of function calls. However, there are situations where inlining may not be necessary or even beneficial. Here are a few scenarios where inlining might not be needed:
In such cases, it is beneficial to allow the compiler to make informed decisions about inlining based on the specific optimization goals and characteristics of the code. It is essential to strike a balance between inlining for performance and maintaining code readability, maintainability, and other software engineering considerations.
Let's examine the previous example and modify it to make it larger and more complex so that we can demonstrate the impact of inlining large or complex functions in Go:
package main
//go:noinline
func AddNonInlined(a, b int) int {
sum := a + b
for i := 0; i < 100000; i++ {
sum += i
}
return sum
}
func AddInlined(a, b int) int {
sum := a + b
for i := 0; i < 100000; i++ {
sum += i
}
return sum
}
In the above code, we have two functions:?AddNonInlined?and?AddInlined. Both functions perform the addition of two integers (a?and?b) and then execute a loop to add numbers from 0 to 99999 to the sum. The functions are identical, except for the?//go:noinline?directive on the?AddNonInlined?function, which explicitly tells the Go compiler not to inline this function.
We can use the same benchmark code to compare the performance of the inlined and non-inlined functions:
package main
import (
"testing"
)
func BenchmarkAddNonInlined(b *testing.B) {
x := 10
y := 5
for i := 0; i < b.N; i++ {
_ = AddNonInlined(x, y)
}
}
func BenchmarkAddInlined(b *testing.B) {
x := 10
y := 5
for i := 0; i < b.N; i++ {
_ = AddInlined(x, y)
}
}
func main() {
testing.Benchmark(BenchmarkAddNonInlined)
testing.Benchmark(BenchmarkAddInlined)
}
When we run the benchmark, we can observe the performance difference between the inlined and non-inlined functions:
? go test -bench=.
goos: darwin
goarch: amd64
pkg: testp
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkAddNonInlined-12 43389 26175 ns/op
BenchmarkAddInlined-12 48938 23937 ns/op
PASS
ok testp 3.297s
From the benchmark results, we can observe that both the?AddNonInlined?and?AddInlined?functions have similar execution times, indicating that inlining the complex loop in this scenario did not significantly improve performance.
The performance difference between the two functions is relatively small, with the?AddInlined?function having a slightly better execution time. Inlining a large or complex function in this case did not provide significant benefits, potentially due to the overhead of code size increase and cache inefficiencies.
This example demonstrates that inlining large or complex functions may not always result in improved performance. It's important to consider factors such as code size, cache efficiency, and the specific characteristics of the function when deciding whether to inline or not. In some cases, allowing the Go compiler to make its own decisions about inlining based on its optimization strategies can be more beneficial.
Conclusion
In conclusion, inlining is a technique used in programming to optimize program execution speed by eliminating the overhead of function calls. It involves replacing function calls with the actual code of the function at the call site. Inlining can result in performance improvements, especially for small, frequently called functions. However, it's important to consider factors such as function size, complexity, code modularity, and maintainability. The decision to inline functions is typically made by the compiler based on specific criteria. It's crucial to strike a balance between performance optimization and other software engineering considerations when deciding to use inlining.
Thank you ?? for taking the time ? to read this blog post ??. I hope you found the information ?? helpful and informative ??. If you have any questions ? or comments ??, please feel free to leave them below ??. Your feedback ?? is always appreciated.