Go Profiling 0 to 1: A Beginner’s Guide to Analysing Your Code’s Efficiency
Shivam Behl
SWE2 @ Microsoft ?? Flipkart | Zepto | Distributed systems, Machine Learning, Cloud Computing
What is profiling?
Profiling in programming is like taking a detailed measurement of how a program is running. It helps you understand which parts of the program are working hard and which ones are not. By looking at these measurements, you can find ways to make the program faster and more efficient. It’s like getting a health check-up for your code!
But what profiles?
Sure, let’s go through the different types of profiles available in Go profiling:
1. CPU Profiles: These show where your program is spending its CPU cycles, identifying CPU-bound bottlenecks. It’s like having a map that highlights the areas in your code that are taking the most time to execute.
2. Memory Profiles: These provide information about memory allocation and usage patterns. They can help you identify memory leaks or inefficient memory usage. It’s like having an X-ray that shows you where your program is hoarding memory unnecessarily.
3. Blocking Profiles: These profiles show where your program is blocking, waiting for external resources like network, disk I/O, or synchronization primitives like mutexes or channels. It’s like having a traffic report that shows you where your program is getting stuck in traffic.
4. Goroutine Profiles: These give you information about the number of goroutines (lightweight threads) and their call stacks, helping you understand the concurrency behavior of your program. It’s like having a map that shows you all the different paths your program is taking simultaneously.
5. Mutex Profiles: These profiles specifically focus on mutex contention, showing where your program is waiting to acquire mutexes. It’s like having a report that tells you where your program is getting stuck in line, waiting to access a shared resource.
6. Thread Profiles: These show how the goroutines are multiplexed onto the operating system threads, which can help identify potential inefficiencies or bottlenecks in the runtime’s thread management. It’s like having a seating chart that shows you how your program’s simultaneous tasks are being distributed across the available seats (threads).
By analysing these different profiles, you can gain insights into your program’s performance characteristics and identify areas for optimisation or potential issues that need to be addressed.
Ok, so this article in this article we’ll be using runtime/pprofand try to get CPU and Memory profile.
Let’s get Hands?on:
Ok, maybe we should write a complicated program for finding truth of the universe, or fibonacci number:
func fibonacci(n int) int {
if n <= 1 {
return n
}
return fibonacci(n-1) + fibonacci(n-2)
}
A series of numbers where each number is the sum of the two before it. That’s the magical Fibonacci sequence! It starts with 0 and 1, and then each number is the sum of the previous two: 0, 1, 1, 2, 3, 5, 8, 13, and so on, getting bigger with each step!
CPU Profiling
CPU profiling is like using a magnifying glass to find out which parts of your program are eating up the most CPU time. This technique helps you pinpoint the functions that are causing your program to slow down, so you can make them more efficient. Let’s dive into an example of how to do CPU profiling.
package main
import (
"fmt"
"os"
"runtime/pprof"
)
func fibonacci(n int) int {
if n <= 1 {
return n
}
return fibonacci(n-1) + fibonacci(n-2)
}
func main() {
f, err := os.Create("cpu.prof")
if err != nil {
panic(err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
panic(err)
}
defer pprof.StopCPUProfile()
for i := 0; i < 40; i++ {
fmt.Printf("Fibonacci(%d) = %d\n", i, fibonacci(i))
}
}
Now use, go tool pprof cpu.prof
This will launch an interactive profiling tool where you can use various commands to explore and visualise the CPU profile data.
The first command I used is top. The follwing code snippet shows its use and response:
领英推荐
shivam@Shivams-MacBook-Air ~ % go tool pprof cpu.prof
File: __debug_bin2601311290
Type: cpu
Time: Mar 20, 2024 at 1:53am (IST)
Duration: 1.31s, Total samples = 840ms (63.97%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 840ms, 100% of 840ms total
Showing top 10 nodes out of 21
flat flat% sum% cum cum%
680ms 80.95% 80.95% 680ms 80.95% main.fibonacci
100ms 11.90% 92.86% 100ms 11.90% runtime.pthread_cond_signal
50ms 5.95% 98.81% 50ms 5.95% syscall.syscall
10ms 1.19% 100% 110ms 13.10% runtime.startm
0 0% 100% 50ms 5.95% fmt.Fprintf
0 0% 100% 50ms 5.95% fmt.Printf
0 0% 100% 50ms 5.95% internal/poll.(*FD).Write
0 0% 100% 50ms 5.95% internal/poll.ignoringEINTRIO
0 0% 100% 730ms 86.90% main.main
0 0% 100% 50ms 5.95% os.(*File).Write
Ok, now let’s try to understand it:
On running the top command, I got a table with columns:
?Here’s short and concise explanation:
The output shows the fibonacci function in your main package as the main hotspot, consuming a significant 80.95% of the total sampled CPU time. Optimizing this function or finding an alternative implementation could lead to substantial performance improvements.
The other functions listed, like pthread_cond_signal, syscall, and startm, are likely part of the Go runtime and system calls, which are necessary for the program’s execution but may not be directly optimisable from your application code.
The key takeaway is that the fibonacci function is the primary CPU bottleneck, and addressing its performance should be the priority based on this profiling data.
Some useful commands include:
Now, lets try to visualise and use command: go tool pprof?—?web cpu.prof
Memory Profiling:
Ok, now let’s try memory profiling:
package main
import (
"fmt"
"os"
"runtime/pprof"
)
func fibonacci(n int) int {
if n <= 1 {
return n
}
return fibonacci(n-1) + fibonacci(n-2)
}
func main() {
f, err := os.Create("mem.prof")
if err != nil {
panic(err)
}
defer f.Close()
for i := 0; i < 40; i++ {
result := fibonacci(i)
fmt.Printf("Fibonacci(%d) = %d\n", i, result)
}
if err := pprof.WriteHeapProfile(f); err != nil {
panic(err)
}
runtime.GC()
if err := pprof.WriteHeapProfile(f); err != nil {
panic(err)
}
}
shivam@Shivams-MacBook-Air ~ % go tool pprof mem.prof
File: __debug_bin3376887937
Type: inuse_space
Time: Mar 20, 2024 at 2:24am (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 513.12kB, 100% of 513.12kB total
flat flat% sum% cum cum%
513.12kB 100% 100% 513.12kB 100% compress/flate.(*huffmanEncoder).generate
0 0% 100% 513.12kB 100% compress/flate.init.0
0 0% 100% 513.12kB 100% runtime.doInit
0 0% 100% 513.12kB 100% runtime.doInit1
0 0% 100% 513.12kB 100% runtime.main
On Visualising like before:
Best Practices