How AI/ML Combines with eBPF to Help Troubleshoot, Secure, Monitor Linux Networking

Raj Sahu

Published: February 27, 2024

At the recent Linux Plumbers Conference, there were at least 30 talks on eBPF (extended Berkeley Packet Filter), and its popularity has grown steadily over the past few years. It has quickly become not just an invaluable technology but also an in-demand skill. This blog is an attempt to share my understanding of eBPF, along with my perspectives and predictions on how this technology may evolve. I hope it inspires you to take a closer look at eBPF and develop an appreciation for it. Special kudos to Isovalent's Liz Rice for her hands-on, easy-to-understand workshop[1].

My journey into operating system troubleshooting began fresh out of college when I landed a role with IBM, joining the celebrated TCP/IP team tasked with developing the next version of OS/2, named Aurora. Grateful for the opportunity, I eagerly absorbed knowledge, devouring TCP/IP Illustrated books alongside my tasks. My fascination was sparked by tcpdump, our sole tool at that time (probably dating myself :-)). As seen in Figure 1 below, packets can vanish at various points not tracked by tcpdump. eBPF comes to the rescue here, providing packet flow insight, enhancing security, and adding observability to network infrastructure.

Figure 1. Troubleshooting beyond tcpdump (Photo Credit: Martynas Pumputis & Aditi Ghag)

Before we delve into interesting use cases, let's refresh our understanding of eBPF.

What Is BPF/eBPF, and Why Is It Important?

BPF (Berkeley Packet Filter) is a small virtual machine that can run programs injected from user space inside the kernel, without changing or recompiling kernel code. It originated in 1992 as a packet-capture filter for BSD Unix and became best known as the packet-filter language behind tcpdump; Linux adopted it in the late 1990s.
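To make the "small virtual machine" idea concrete, here is a toy interpreter for a tiny subset of classic BPF. It is a sketch, not the kernel's implementation: it models only the accumulator register, absolute packet loads, and conditional jumps whose true/false offsets are relative to the next instruction (as in the kernel's `struct sock_filter`). The string opcodes and the hand-written `IPV4_TCP` filter (accept Ethernet frames carrying IPv4 TCP) are illustrative assumptions.

```python
import struct
from typing import List, Tuple

# Each instruction is (op, k, jt, jf), loosely modelled on classic BPF's
# struct sock_filter. Only four ops are supported in this sketch.
LDH, LDB, JEQ, RET = "ldh", "ldb", "jeq", "ret"
Insn = Tuple[str, int, int, int]


def run_filter(program: List[Insn], packet: bytes) -> int:
    """Run a toy classic-BPF-style program; return the accept length (0 = drop)."""
    acc = 0  # the accumulator register "A"
    pc = 0
    while pc < len(program):
        op, k, jt, jf = program[pc]
        pc += 1
        if op == LDH:  # load a big-endian 16-bit halfword at absolute offset k
            if k + 2 > len(packet):
                return 0  # out-of-bounds load: drop the packet
            acc = struct.unpack_from("!H", packet, k)[0]
        elif op == LDB:  # load one byte at absolute offset k
            if k + 1 > len(packet):
                return 0
            acc = packet[k]
        elif op == JEQ:  # jump offsets are relative to the next instruction
            pc += jt if acc == k else jf
        elif op == RET:
            return k
        else:
            raise ValueError(f"unknown op {op!r}")
    return 0


# Accept frames whose EtherType is IPv4 (0x0800) and whose IPv4 protocol
# field (frame byte 23: 14-byte Ethernet header + offset 9) is TCP (6).
IPV4_TCP: List[Insn] = [
    (LDH, 12, 0, 0),
    (JEQ, 0x0800, 0, 3),  # not IPv4 -> jump to the final "ret 0"
    (LDB, 23, 0, 0),
    (JEQ, 6, 0, 1),       # TCP -> accept, otherwise "ret 0"
    (RET, 0xFFFF, 0, 0),  # accept up to 0xFFFF bytes
    (RET, 0, 0, 0),       # drop
]
```

On a real system, `tcpdump -d <expression>` prints the classic BPF program that tcpdump compiles for a filter expression, which is a handy way to compare against a hand-written sketch like this one.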

BPF evolved to what we call “extended BPF” or “eBPF” starting in kernel version 3.18 in 2014.

eBPF is a revolutionary kernel technology that allows developers to write custom code that can be loaded into the kernel dynamically, changing the way the kernel behaves.

This enables a new generation of highly performant networking, observability, and security tools. And as you'll see, instrumenting an application with these eBPF-based tools does not require modifying or reconfiguring the application in any way, thanks to eBPF's vantage point: a powerful, privileged position within the kernel, as shown in Figure 2.

Just a few of the things you can do with eBPF include:

Performance tracing of pretty much any aspect of a system
High-performance networking, with built-in visibility
Detecting and (optionally) preventing malicious activity

Figure 2. eBPF programs are attached to events in the kernel (Photo: Learning eBPF book)

eBPF programs are event-driven and run when the kernel or an application passes a certain hook point. Predefined hooks include system calls, function entry/exit, kernel tracepoints, network events, and several others, as shown in Figure 3.

Figure 3. eBPF with predefined hooks (such as system calls) (Photo Credit:
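Conceptually, attaching a program to a hook means "run this function whenever execution passes this point". The Python sketch below models only that idea: a registry of named hook points, a program that counts `openat` calls per process name in a dictionary standing in for an eBPF map, and a few simulated events. `HookRegistry` is hypothetical and is not a libbpf or bcc API; the hook string merely follows the tracepoint naming used in libbpf section names.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Program = Callable[[dict], None]


class HookRegistry:
    """Hypothetical model of hook points; not a real eBPF loader API."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Program]] = defaultdict(list)

    def attach(self, hook: str, prog: Program) -> None:
        self._hooks[hook].append(prog)

    def fire(self, hook: str, ctx: dict) -> int:
        """Simulate the kernel reaching `hook`: run attached programs, return how many ran."""
        progs = self._hooks.get(hook, [])
        for prog in progs:
            prog(ctx)
        return len(progs)


# A dictionary standing in for an eBPF hash map: process name -> openat count
opens_by_comm: Dict[str, int] = defaultdict(int)


def count_opens(ctx: dict) -> None:
    opens_by_comm[ctx["comm"]] += 1


registry = HookRegistry()
registry.attach("tracepoint/syscalls/sys_enter_openat", count_opens)

# Simulated kernel activity: three openat() calls from two processes
for comm in ("nginx", "bash", "nginx"):
    registry.fire("tracepoint/syscalls/sys_enter_openat", {"comm": comm})

print(dict(opens_by_comm))  # {'nginx': 2, 'bash': 1}
```

The same model covers kprobes and uprobes: they simply add more hook names, placed wherever the developer chooses rather than where the kernel predefined them.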

If a predefined hook does not exist for a particular need, it is possible to create a kernel probe (kprobe) or user probe (uprobe) to attach eBPF programs almost anywhere in kernel or user applications, as shown in Figure 4.

Figure 4. eBPF with user probe and kernel probe (Photo Credit:

How are eBPF programs written?

In many scenarios, eBPF is not used directly but through projects like Cilium, bcc, or bpftrace, which provide an abstraction on top of eBPF. Rather than writing programs directly, users specify intent-based definitions that these projects then implement with eBPF.

Figure 5. Low Level Virtual Machine (LLVM) with Clang (compiler frontend) creates eBPF bytecode (Photo Credit:

If no higher-level abstraction exists, programs need to be written directly. The Linux kernel expects eBPF programs to be loaded in the form of bytecode. While it is possible to write bytecode by hand, the more common practice is to write programs in a restricted subset of C and compile them into eBPF bytecode with a compiler suite such as LLVM (Low Level Virtual Machine), using its Clang frontend.
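To show what "bytecode" means here: every eBPF instruction is 64 bits wide, made up of an 8-bit opcode, a 4-bit destination register, a 4-bit source register, a 16-bit signed offset, and a 32-bit signed immediate. The sketch below hand-assembles the two-instruction program `r0 = 0; exit`, i.e. a program that simply returns 0. The opcode constants come from the eBPF instruction set; the helper name `encode_insn` and the little-endian layout (the common x86-64/arm64 case) are assumptions of this sketch.

```python
import struct

# Opcode building blocks from the eBPF instruction set
BPF_JMP, BPF_ALU64 = 0x05, 0x07  # instruction classes
BPF_K = 0x00                     # source operand is the 32-bit immediate
BPF_MOV, BPF_EXIT = 0xB0, 0x90   # operation codes

MOV64_IMM = BPF_ALU64 | BPF_MOV | BPF_K  # 0xb7: dst = imm
EXIT = BPF_JMP | BPF_EXIT                # 0x95: return the value in r0


def encode_insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 64-bit eBPF instruction (little-endian layout)."""
    regs = ((src & 0x0F) << 4) | (dst & 0x0F)  # dst_reg low nibble, src_reg high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


# "return 0;" assembled by hand: r0 = 0; exit
program = encode_insn(MOV64_IMM, dst=0, imm=0) + encode_insn(EXIT)
print(program.hex())  # b7000000000000009500000000000000
```

Compiling an equivalent C function with Clang's BPF target and disassembling the object (for example with `llvm-objdump -d`) should show the same two 8-byte instructions.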

Maps

eBPF programs use eBPF maps to store state and to share data, both between eBPF programs and between kernel and user space.

The following incomplete list of supported map types gives a sense of the diversity of data structures available. Several map types come in both a shared and a per-CPU variant.

Hash tables, Arrays
LRU (Least Recently Used)
Ring Buffer
Stack Trace
LPM (Longest Prefix Match)
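Longest-prefix match is what makes LPM maps useful for CIDR-based routing and policy: the most specific matching prefix wins. The sketch below shows only the lookup semantics, using Python's `ipaddress` module and probing from /32 down to /0; the kernel's `BPF_MAP_TYPE_LPM_TRIE` uses a trie internally. The class name, the IPv4-only scope, and the sample prefixes are assumptions for illustration.

```python
import ipaddress
from typing import Dict, Optional


class LpmTable:
    """Toy IPv4 longest-prefix-match table (lookup semantics only)."""

    def __init__(self) -> None:
        self._prefixes: Dict[ipaddress.IPv4Network, str] = {}

    def insert(self, cidr: str, value: str) -> None:
        self._prefixes[ipaddress.IPv4Network(cidr)] = value

    def lookup(self, addr: str) -> Optional[str]:
        # Probe the most specific prefix (/32) first, then widen down to /0
        for plen in range(32, -1, -1):
            net = ipaddress.IPv4Network(f"{addr}/{plen}", strict=False)
            if net in self._prefixes:
                return self._prefixes[net]
        return None


table = LpmTable()
table.insert("0.0.0.0/0", "default-gateway")
table.insert("10.0.0.0/8", "cluster")
table.insert("10.1.0.0/16", "node-a-pods")

print(table.lookup("10.1.2.3"))  # node-a-pods
print(table.lookup("10.9.9.9"))  # cluster
print(table.lookup("8.8.8.8"))   # default-gateway
```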

In summary, eBPF programs allow safe and efficient access to kernel operations by:

Providing built-in hooks for programs based on system calls, kernel functions, network events and other triggers
Providing a mechanism for compiling and verifying code prior to running, which helps ensure security and stability of the system
Offering a more straightforward way to enhance kernel functionality than is possible through LKMs (Linux Kernel Modules), thereby allowing even small teams to efficiently develop safe programs that run in kernel space
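The verification step is what separates eBPF from loading an arbitrary kernel module. The real verifier walks every possible execution path and tracks register types and value bounds; the Python sketch below captures only a few of the simplest static rules, using a made-up tuple instruction format: jumps must stay inside the program, execution must not fall off the end, and back-edges (loops) are rejected outright, as older kernels did before Linux 5.3 started accepting bounded loops. `verify` and the op names are hypothetical.

```python
from typing import List, Optional, Tuple

Insn = Tuple  # e.g. ("mov", dst, imm), ("jeq", reg, imm, off), ("ja", off), ("exit",)

MAX_INSNS = 4096  # BPF_MAXINSNS, the historical program-size limit
JUMPS = {"ja", "jeq"}
OTHER_OPS = {"mov", "add", "exit"}


def verify(program: List[Insn]) -> Optional[str]:
    """Toy static checks; return None if accepted, else a rejection reason."""
    if not program or len(program) > MAX_INSNS:
        return "bad program size"
    for pc, insn in enumerate(program):
        op = insn[0]
        if op in JUMPS:
            target = pc + 1 + insn[-1]  # offsets are relative to the next insn
            if not 0 <= target < len(program):
                return f"insn {pc}: jump out of range"
            if target <= pc:
                return f"insn {pc}: back-edge (loop) rejected"
        elif op not in OTHER_OPS:
            return f"insn {pc}: unknown op {op!r}"
    # With only forward, in-range jumps, ending in exit guarantees termination
    if program[-1][0] != "exit":
        return "program can fall off the end: last insn must be exit"
    return None


ok = [("mov", 0, 0), ("jeq", 0, 0, 1), ("add", 0, 1), ("exit",)]
loop = [("mov", 0, 0), ("add", 0, 1), ("ja", -2), ("exit",)]
print(verify(ok))    # None
print(verify(loop))  # insn 2: back-edge (loop) rejected
```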

eBPF for Networking

Figure 8. Bypassing iptables and conntrack processing with eBPF (Photo Credit -

As Figure 8 shows, an ingress packet destined for an application in a pod has to travel through the network stack on the host and then again through the pod's network stack. Using eBPF avoids this duplicate traversal, bypassing iptables and conntrack processing along the way.

Figure 9. eBPF-based XDP (eXpress Data Path) simplifies networking when compared to kernel bypass

eBPF-based XDP (eXpress Data Path) offers high-performance packet processing within the Linux kernel, making it ideal for tasks like distributed denial-of-service (DDoS) mitigation and packet monitoring. Kernel bypass techniques, such as the Data Plane Development Kit (DPDK), aim for even lower latency and higher throughput by circumventing the kernel entirely. Both approaches have their strengths and trade-offs: XDP provides kernel-based efficiency and compatibility, while kernel bypass offers ultra-low latency at the expense of increased complexity in user-space applications.
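An XDP program returns a verdict for every packet, such as pass it up the stack or drop it. One way such a program can blunt a flood is per-source rate limiting with a counter kept in a hash map. The Python model below reproduces only that decision logic; a real version would be restricted C compiled to eBPF and attached at the network driver. The class, threshold, and window size are hypothetical, while the verdict constants follow the ordering of the kernel's `enum xdp_action`.

```python
from typing import Dict, Tuple

# Verdict values in the order of the kernel's enum xdp_action
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)


class XdpRateLimiter:
    """Per-source fixed-window packet limiter (a model, not real XDP code)."""

    def __init__(self, max_pkts_per_window: int, window_ns: int = 1_000_000_000) -> None:
        self.max_pkts = max_pkts_per_window
        self.window_ns = window_ns
        # Stands in for a BPF hash map: src IP -> (window start ns, packets seen)
        self.state: Dict[str, Tuple[int, int]] = {}

    def handle(self, src_ip: str, now_ns: int) -> int:
        start, count = self.state.get(src_ip, (now_ns, 0))
        if now_ns - start >= self.window_ns:
            start, count = now_ns, 0  # window expired: start counting again
        count += 1
        self.state[src_ip] = (start, count)
        return XDP_PASS if count <= self.max_pkts else XDP_DROP


limiter = XdpRateLimiter(max_pkts_per_window=3)
verdicts = [limiter.handle("203.0.113.7", t) for t in range(4)]
print(verdicts)  # [2, 2, 2, 1] -> PASS, PASS, PASS, DROP
```

A fixed window is the simplest choice here; token buckets smooth bursts better but need more state per source.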

Now that we know about eBPF, let's explore how and where it is currently leveraged under the hood for network troubleshooting, security, and observability.

eBPF in Kubernetes

eBPF represents a more modern and capable approach, addressing many of iptables' inherent limitations. By default, Kubernetes relies on iptables for:

kube-proxy: the component that implements Services and load balancing using DNAT iptables rules
CNI (Container Network Interface) plugins

iptables is widely supported and is the default operating model for a new Kubernetes cluster. Unfortunately, it runs into a few problems:

iptables updates are made by recreating and updating all rules in a single transaction.
iptables is implemented as a chain of rules in a linked list, so all operations are O(n).
iptables implements access control as a sequential list of rules (also O(n)).
Every time you have a new IP or port to match, rules need to be added and the chain changed.
iptables consumes significant resources in large Kubernetes clusters.
The shift from iptables to eBPF offers tangible benefits: improved application performance, simplified network operations, and enhanced security, leading to cost savings and better resource utilization, as shown in Figure 10.

Figure 10. eBPF reducing time complexity for search/insert/delete to O(1)
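To see why the O(n) versus O(1) distinction matters, the sketch below compares a first-match scan over a list of DNAT-style rules (how an iptables chain is evaluated) with a single keyed lookup (how an eBPF hash map is queried). It is a simplification: real kube-proxy chains jump per service and choose among several backends. The 20,000 services, address ranges, and function names are hypothetical; the second value returned is the number of rules examined.

```python
from typing import Dict, List, Optional, Tuple

Dest = Tuple[str, int]  # (service virtual IP, port)


def build_services(n: int) -> List[Tuple[Dest, str]]:
    """Hypothetical cluster: service i maps VIP 10.96.x.y:80 to one backend pod."""
    rules = []
    for i in range(n):
        hi, lo = divmod(i, 256)
        rules.append(((f"10.96.{hi}.{lo}", 80), f"10.244.{hi}.{lo}:8080"))
    return rules


def linear_lookup(rules: List[Tuple[Dest, str]], dst: Dest) -> Tuple[Optional[str], int]:
    """iptables-style: walk the chain in order until the first matching rule."""
    for evaluated, (match, backend) in enumerate(rules, start=1):
        if match == dst:
            return backend, evaluated
    return None, len(rules)


def hash_lookup(table: Dict[Dest, str], dst: Dest) -> Tuple[Optional[str], int]:
    """eBPF-hash-map style: one keyed lookup (average case), whatever the table size."""
    return table.get(dst), 1


rules = build_services(20_000)
table = dict(rules)
target = ("10.96.78.31", 80)  # service #19,999, the last rule in the chain

print(linear_lookup(rules, target))  # ('10.244.78.31:8080', 20000)
print(hash_lookup(table, target))    # ('10.244.78.31:8080', 1)
```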

Under heavy traffic or frequent changes, iptables causes unpredictable performance degradation because of its sequential rule evaluation and the need to rewrite rules on every update, which leads to significant penalties at scale. For instance, Huawei found that updating iptables rules in a 20,000-service cluster could take up to five hours.

As shown in Figure 10, after replacing iptables with eBPF in Kubernetes networking, performance tests for throughput, CPU usage, and latency indicate that eBPF scales effectively, even with 1 million rules. In contrast, iptables does not scale as well, showing a considerable performance hit with even a low number of rules, such as 1k or 10k, compared to eBPF.

eBPF for Security

The difference between a security tool and an observability tool that reports on events is that a security tool needs to be able to distinguish between events that are expected under normal circumstances and events that suggest malicious activity might be taking place. Policies have to take into account not just normal behavior when systems are fully functional, but also the expected error path behavior.

A security tool compares activity to a policy and takes some action when the activity falls outside the policy, making it suspicious. That action would typically involve generating a security event log, which would usually be sent to a Security Information and Event Management (SIEM) platform. It might also result in an alert to a human who will be called on to investigate what happened.
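The compare-then-report loop described above can be sketched in a few lines. Here the "activity" is a process execution observed by an eBPF program, the policy is an allow-list of binaries per workload, and anything outside it becomes a JSON event of the kind a SIEM could ingest. The policy contents, workload names, and event fields are hypothetical and do not follow any particular SIEM schema.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional, Set

# Hypothetical policy: binaries each workload is expected to execute
POLICY: Dict[str, Set[str]] = {
    "web-frontend": {"/usr/sbin/nginx"},
    "payments-api": {"/usr/bin/python3"},
}


def check_exec(workload: str, binary: str, pid: int) -> Optional[str]:
    """Return a JSON security event if the exec is outside policy, else None."""
    if binary in POLICY.get(workload, set()):
        return None  # expected behaviour: nothing to report
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "severity": "high",
        "workload": workload,
        "pid": pid,
        "binary": binary,
        "reason": "exec outside policy",
    }
    return json.dumps(event)


print(check_exec("web-frontend", "/usr/sbin/nginx", 100))  # None
print(check_exec("web-frontend", "/bin/sh", 4242))
```

Unknown workloads fall back to an empty allow-list, so every exec they perform is reported; a real tool would also need to whitelist expected error-path behaviour, as noted earlier.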

When an eBPF program is triggered at the entry point to a system call, it can access the arguments that user space has passed to that system call. If those arguments are pointers, the kernel needs to copy the pointed-to data into its own data structures before acting on it. As illustrated in Figure 11, this leaves a window of opportunity for an attacker to modify the data after the eBPF program has inspected it but before the kernel copies it, a time-of-check to time-of-use (TOCTOU) race. Thus, the data the kernel acts on might not be the same as what the eBPF program captured.

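The Linux Security Module (LSM) interface provides a set of hooks that each occur just before the kernel is about to act on a kernel data structure. The function called by a hook can decide whether to allow the action to go ahead. Because that decision is made on data the kernel already holds, at the point of use, an eBPF program attached to an LSM hook is not exposed to the race described above.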

Figure 11. eBPF securing the kernel (Photo: Learning eBPF book)
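The TOCTOU race can be made explicit with a toy, single-threaded model: a check on user-owned memory at syscall entry, a modification in between, and the kernel's later copy. The class, the blocked path, and the sequential "attacker" step are illustrative assumptions; a real attack needs a second thread racing the system call.

```python
class SimulatedSyscall:
    """Toy model: user memory can change between an entry-point check and the kernel's copy."""

    def __init__(self, user_buf: bytearray) -> None:
        self.user_buf = user_buf  # memory still owned (and writable) by user space

    def probe_at_entry(self) -> bytes:
        return bytes(self.user_buf)  # what a syscall-entry eBPF program would inspect

    def kernel_copy(self) -> bytes:
        return bytes(self.user_buf)  # the kernel's own copy, used to perform the action


BLOCKED = b"/etc/shadow"


def lsm_style_check(kernel_data: bytes) -> bool:
    """Decide on the kernel's copy, just before the action (True = allow)."""
    return kernel_data != BLOCKED


buf = bytearray(b"/tmp/harmless")
call = SimulatedSyscall(buf)
seen_by_probe = call.probe_at_entry()  # time of check: looks harmless
buf[:] = BLOCKED                       # attacker swaps the data in user memory
acted_on = call.kernel_copy()          # time of use: kernel copies the new bytes

print(lsm_style_check(seen_by_probe))  # True: the entry-point check would allow it
print(lsm_style_check(acted_on))       # False: a check on the kernel copy catches it
```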

Firewalling and DDoS protection are a natural fit for eBPF programs attached early in the ingress path for network packets. And with the possibility of offloading XDP programs to supported network hardware, malicious packets may never even reach the CPU.

For implementing more sophisticated network policies, such as Kubernetes policies determining which services are allowed to communicate with one another, eBPF programs that attach to points in the network stack can drop packets if they are determined to be out of policy.
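A label-based allow-list of this kind boils down to a set lookup per packet. The sketch below assumes a default-deny model once a policy applies, with workloads identified by a single label; the labels, ports, and `verdict` function are hypothetical and simplify what Kubernetes NetworkPolicy and eBPF datapaths actually express.

```python
from typing import NamedTuple, Set


class Rule(NamedTuple):
    src: str   # label of the sending workload
    dst: str   # label of the receiving workload
    port: int  # destination port


# Hypothetical allow-list; anything not listed is dropped (default deny)
ALLOWED: Set[Rule] = {
    Rule("frontend", "backend", 8080),
    Rule("backend", "db", 5432),
}


def verdict(src_label: str, dst_label: str, dst_port: int) -> str:
    return "PASS" if Rule(src_label, dst_label, dst_port) in ALLOWED else "DROP"


print(verdict("frontend", "backend", 8080))  # PASS
print(verdict("frontend", "db", 5432))       # DROP: frontend may not reach the database
```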

eBPF’s use in security has evolved from low-level checks on system calls to much more sophisticated uses: security policy checks, in-kernel event filtering, and runtime enforcement.

eBPF for Network Observability

eBPF lets us collect data that is otherwise unavailable in /proc or other static representations of the system.

Cilium is an open source project built on top of eBPF, designed to replace iptables-based networking and to address the networking, security, and visibility requirements of container workloads.

Comprehensive connectivity observability requires insight across all layers, not just from a single layer or solely the application (limited to the L7 layer). Cilium, powered by eBPF, enables this, as shown in Figure 12.

Prometheus is an open-source monitoring system and time series database hosted by the Cloud Native Computing Foundation (CNCF). Integrations that expose metrics from third-party systems in a format Prometheus can scrape are called “exporters”; tools like Grafana can then query Prometheus and plot those metrics in various formats.

Grafana is an open-source data analytics and visualization web application created by Grafana Labs. It lets you visualize time series data by compiling them into charts, graphs, or maps and it even provides alerting when connected to supported data sources.

The combination of Prometheus and Grafana Agent gives you control over which metrics you report, where they come from, and where they go. Metrics can be stored long term in a Grafana Mimir database. Grafana dashboards consist of visualizations populated by data queried from the Prometheus data source, and PromQL queries filter and aggregate that data to provide the insight you need. With those steps, we've gone from raw numbers generated by software, into Prometheus, delivered to Grafana, queried by PromQL, and visualized in Grafana, as shown in Figure 13.

Figure 13. From raw Linux kernel data to visual insight with a Grafana dashboard (Photo:
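The exporter step in that pipeline can be sketched as follows: counters read from an eBPF map are rendered in the Prometheus text exposition format, which Prometheus scrapes over HTTP. The metric name, labels, and values are hypothetical, and the sketch skips label-value escaping and the HTTP endpoint itself; in practice a Prometheus client library would handle both.

```python
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]


def render_counter(name: str, help_text: str, samples: Dict[Labels, int]) -> str:
    """Render one counter metric in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counts read from an eBPF map keyed by (pod, drop reason)
drops: Dict[Labels, int] = {
    (("pod", "web-1"), ("reason", "policy_denied")): 12,
    (("pod", "web-2"), ("reason", "policy_denied")): 3,
}

exposition = render_counter("ebpf_packet_drops_total", "Packets dropped in the datapath.", drops)
print(exposition, end="")
```

Once scraped, a PromQL expression such as `rate(ebpf_packet_drops_total[5m])` would turn the counter into a per-second drop rate for a Grafana panel.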

Successful eBPF Use Cases

Netflix uses eBPF at scale for network insights
Apple uses eBPF through Falco for kernel security monitoring
Ikea uses eBPF through Cilium for networking and load balancing in their private cloud
Walmart uses eBPF for edge cloud load balancing
Cruise uses eBPF to monitor GPU performance
Sysdig uses eBPF to enable high-performance system call tracing, facilitate container-aware troubleshooting, conduct security auditing, and provide rich insights and data from the kernel
Meta uses eBPF to process and load balance every packet coming into its data centers (see the Katran project)
Bell Canada uses eBPF to improve Telco Networking with Segment Routing (SR)

Enhanced eBPF Capabilities with AI/ML Integration

Now that we understand eBPF, let's explore futuristic use cases where eBPF combines with the superpower of AI/ML.

AI-Powered Network Security: eBPF can be used to monitor network traffic in real-time, allowing AI models to analyze patterns and detect anomalies or potential security threats, such as DDoS attacks or network intrusions. By providing detailed insights into packet flows and system calls, eBPF enables AI systems to make informed decisions on blocking malicious activities or alerting administrators.
ML-Powered Performance Monitoring and Optimization: eBPF can collect detailed performance metrics from applications and the kernel, which can be fed into ML models to predict potential bottlenecks or failures. These insights can help in auto-tuning system parameters for optimal performance or in dynamically adjusting resource allocation to improve efficiency and reduce latency.
AI Model Training Observability: Training AI models can be resource-intensive and time-consuming. eBPF can be used to observe and collect metrics on resource usage (CPU, memory, I/O) at a granular level during the training process. This data can help identify inefficiencies and optimize the training process, potentially reducing the time and resources required.
AI in Fraud Detection: In financial services, eBPF can be used to monitor and log transactions in real-time. AI models can analyze this data to detect unusual patterns indicative of fraud. This approach allows for immediate detection and mitigation actions, significantly reducing the risk and impact of fraudulent activities.
Predictive Maintenance Using ML: In IoT and industrial contexts, eBPF can collect data from various sensors and devices, which can be analyzed by ML models to predict equipment failures before they occur. This predictive maintenance can save costs and prevent downtime by scheduling repairs or replacements in advance.
AI-Based Dynamic Load Balancing: For cloud services and distributed systems, eBPF can monitor traffic and system metrics, enabling AI algorithms to dynamically adjust load balancing strategies. This can ensure optimal resource utilization and improve user experience by reducing response times and avoiding bottlenecks.
AI/ML-Powered Telemetry Root Cause Analysis: eBPF can provide detailed telemetry about system behavior and application performance. AI/ML models can analyze this data to automate root cause analysis of system issues, reducing the time needed to diagnose and resolve problems.
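As a hedged starting point for the first idea above, a simple statistical baseline is often built before any ML model: flag a sample when it sits far above the recent mean. The sketch applies a rolling z-score to per-second SYN counts that an eBPF program might export. It is a plain statistical detector standing in for the AI/ML models described above, not a trained model; the data, window, and threshold are arbitrary assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_anomalies(series: Sequence[float], window: int = 10, threshold: float = 3.0) -> List[int]:
    """Return indices whose value exceeds the mean of the preceding `window`
    samples by more than `threshold` standard deviations (spikes only)."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = max(pstdev(history), 1e-9)  # avoid dividing by zero on a flat baseline
        if (series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical SYN packets per second, with a burst at index 12
syn_per_sec = [100, 104, 98, 101, 99, 103, 97, 102, 100, 96, 101, 99, 950, 102]
print(zscore_anomalies(syn_per_sec))  # [12]
```

Flagged indices could feed an alert, or serve as labelled examples when training a more capable model later.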

To summarise, here are the BCC (BPF Compiler Collection) and BPF tracing tools:
