Debugging and Tracing in Linux: From Kernel to User Space
Debugging and tracing are critical skills for developers working on low-level systems, device drivers, or performance-sensitive applications. In Linux, tools and APIs exist for both kernel/driver development and user-space programming, each tailored to their respective environments. This article introduces essential debugging techniques, with concrete examples for tools like printk(), dev_dbg(), printf(), gdb, ftrace, and more.
In practice, printk() in the kernel and printf() in user space cover most day-to-day debugging; the other tools are included to give a broader overview.
1. Kernel and Driver Development
1.1 Logging with printk()
The printk() function is the kernel’s equivalent of printf(). Each message carries a log level (e.g., KERN_INFO, KERN_ERR) that categorizes it, and the resulting kernel log can be viewed with dmesg.
Example: Logging in a Kernel Module
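#include <linux/init.h>
#include <linux/module.h>
/* Called when the module is loaded (e.g., with insmod) */
static int __init my_module_init(void) {
    printk(KERN_INFO "my_module: Initialized\n");
    return 0;
}
/* Called when the module is unloaded (e.g., with rmmod) */
static void __exit my_module_exit(void) {
    printk(KERN_INFO "my_module: Exited\n");
}
module_init(my_module_init);
module_exit(my_module_exit);
/* Declare the license so the module does not taint the kernel */
MODULE_LICENSE("GPL");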
View logs with:
dmesg | grep "my_module"
1.2 Device-Specific Logging: dev_dbg() and dev_err()
The dev_*() family of functions (e.g., dev_dbg(), dev_err()) includes device context (e.g., the PCI address) in each log line, making it ideal for driver code. Unlike dev_err(), dev_dbg() messages are off by default: on kernels built with CONFIG_DYNAMIC_DEBUG they can be enabled at runtime, which keeps overhead low while debugging is off.
Example: Using dev_dbg() in a Driver
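static int my_probe(struct device *dev) {
    int error;
    dev_dbg(dev, "Probing device\n"); // Debug message (disabled by default)
    error = my_hw_init(dev); // my_hw_init() is a hypothetical setup helper
    if (error)
        dev_err(dev, "Probe failed: %d\n", error); // Always printed
    return error;
}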
Enabling Dynamic Debugging
To activate dev_dbg() messages for a specific driver (e.g., my_driver.c):
echo 'file my_driver.c +p' > /sys/kernel/debug/dynamic_debug/control
How It Works
Behind the Scenes
The kernel uses macros like dev_dbg() or pr_debug() to mark debug statements. These are compiled into the kernel but remain inactive until explicitly enabled. For example:
// Kernel source snippet using dev_dbg()
dev_dbg(dev, "Initializing DMA buffer at %p\n", buffer);
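When dynamic debugging is enabled for my_driver.c, this message is printed with device context (the address shown is illustrative; by default the kernel prints a hashed value for %p rather than the raw pointer):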
[ 12.345] my_driver 0000:01:00.0: Initializing DMA buffer at 0xffff8a0001a2f000
Advanced Usage
Why This Matters
Dynamic debugging avoids recompiling the kernel or module for minor debugging tasks. It’s invaluable for diagnosing issues in production systems where rebooting is costly. Combined with dmesg -wH (to monitor logs in real-time), developers can iteratively refine debug output without disrupting system operation.
2. User-Space Debugging
2.1 printf() Debugging
The classic printf() (or fprintf(stderr, ...)) is useful for quick checks.
Example: Debugging a Memory Leak
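#include <stdio.h>
#include <stdlib.h>
void process_data(void) {
    void *ptr = malloc(1024);
    fprintf(stderr, "Allocated memory at %p\n", ptr); // Track allocations
    fprintf(stderr, "Freeing memory at %p\n", ptr); // An allocation with no matching line here is a leak candidate
    free(ptr);
}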
2.2 GNU Debugger (gdb)
gdb inspects running processes, sets breakpoints, and analyzes crashes.
Example: Debugging a Segmentation Fault
Compile with -g, then run:
gcc -g -o my_program my_program.c
gdb ./my_program
In gdb:
(gdb) break main # Set breakpoint at main()
(gdb) run # Start execution
(gdb) next # Step to next line
(gdb) print ptr # Inspect variable
2.3 System Call Tracing with strace
strace traces system calls made by a process.
Example: Tracing File Operations
strace -e trace=openat,getdents64,close ls /tmp
Output (abridged and illustrative; addresses and sizes will vary):
openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
getdents64(3, 0x55d5c8a3b4f0 /* 4 entries */, 32768) = 112
close(3) = 0
3. Advanced Tracing Tools
3.1 ftrace
ftrace is the kernel's built-in tracer for analyzing function calls and latency. Its control files live in tracefs and normally require root access.
Example: Tracing Function Execution
cd /sys/kernel/tracing
echo function > current_tracer
echo devm_kmalloc > set_ftrace_filter
echo 1 > tracing_on
# Run workload...
echo 0 > tracing_on
cat trace
Output:
# tracer: function
# TASK-PID CPU# TIMESTAMP FUNCTION
my_program-1234 [001] 456.789: devm_kmalloc <-device_probe
3.2 perf
perf profiles CPU performance, including hardware counters.
Example: Profiling CPU Usage
Count CPU events (e.g., cache misses):
perf stat -e cache-misses,instructions ./my_program
Generate a flame graph:
perf record -g ./my_program # Record call stack
perf script > out.stack
./FlameGraph/stackcollapse-perf.pl out.stack | ./FlameGraph/flamegraph.pl > graph.svg
4. Choosing the Right Tool
5. Conclusion
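Debugging in Linux spans multiple layers: kernel and driver code (printk(), dev_dbg(), dynamic debug), user space (printf(), gdb, strace), and system-wide tracing and profiling (ftrace, perf).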
By mastering these tools, developers can efficiently diagnose issues from driver misbehavior to user-space performance bottlenecks.