Cilium & Argo CD on a Single-Node Kubernetes Cluster on Your Laptop - A Love Story of eBPF and GitOps
After setting up a functional Kubernetes cluster in Blog 1, we're really just at the starting point. While the cluster could handle basic tasks like port-forwarding and NodePort for quick experiments (the "Hello World" phase of Kubernetes), it isn't ready for the real-world challenges of external access, scalability, and security. The absence of critical features like Ingress, load balancing, and IP address management means it's time to level up. In other words, the cluster is still too amateurish!
I am very used to MetalLB and Traefik for networking features. However, my blog journey needed something fresh, and that's when I discovered Cilium could cover most of my needs. It's like finding out your Swiss Army knife can also make espresso - suddenly, everything else seems redundant!
So, to make the cluster truly production-ready (no, lab-ready I meant - let's not get too ambitious here), I'm equipping it with Cilium's advanced networking capabilities and Argo CD for seamless GitOps-style application deployment. Think of it as giving your Kubernetes cluster both a brain upgrade and a fancy new suit.
Why Cilium? Because Multiple Tools Are So 2023
Cilium has evolved from "just another CNI" to "the CNI that ate all other CNIs" (in a good way). It's setting its sights on being "the only stack" for Kubernetes networking, which is either brilliantly ambitious or slightly megalomaniacal - you decide. Over the years, it has grown to address key areas that would typically require multiple tools.
In this blog, I'll explore how Cilium can complete our cluster by enabling Ingress, load balancing, IP address management, and network observability.
We'll configure Cilium to activate these features, verify its functionality, deploy Argo CD for automated application management, and conclude by deploying a demo app that's actually interesting (sorry, Bookinfo demo, but it's time to move on).
Prerequisites
Before you dive in, make sure you have a similar environment. If you followed Blog 1 (which isn't mandatory, but hey, it helps), your environment would look like this:
Kubernetes Cluster Setup:
Container Runtime:
Networking:
Cluster Configuration:
System Preparations:
Cluster Limitations:
Additional Tools:
Remember: A working basic cluster is crucial before adding more complexity. If something’s not right at this stage, it’ll only get more interesting (read: problematic) later!
Overview of Tasks
The setup procedure is as straightforward as in Blog 1 (if you believe that…) and involves the following tasks:
Task 1: Delete Kube Proxy and Upgrade/Install Cilium with Additional Parameters
Understanding the Why
Before we start breaking things (I mean, "upgrading" our cluster), let's understand why we're doing this. According to the official Cilium documentation, Cilium can fully replace kube-proxy and streamline our networking stack. Why is this cool? Fewer moving parts to babysit, and service load-balancing handled in eBPF instead of ever-growing iptables chains.
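Before picking a path below, it's worth checking whether your cluster is even running kube-proxy; a kubeadm cluster built the Blog 1 way will have it as a DaemonSet plus a ConfigMap:
# Both of these exist on a default kubeadm cluster
kubectl -n kube-system get ds kube-proxy
kubectl -n kube-system get cm kube-proxy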
The Two Paths to Kube-Proxy-Free
Path A: Fresh Install
If you’re starting with a new cluster:
# During kubeadm init, use this flag
kubeadm init --skip-phases=addon/kube-proxy
Path B: Existing Cluster (Our Case)
If you have a cluster from Blog 1, we need to:
Remove kube-proxy & Clean up existing rules
# Delete Kube Proxy from kube-system namespace
kubectl -n kube-system delete ds kube-proxy
# Delete the configmap as well to avoid kube-proxy being reinstalled during a Kubeadm upgrade (works only for K8s 1.19 and newer)
kubectl -n kube-system delete cm kube-proxy
# Run on each node with root permissions:
iptables-save | grep -v KUBE | iptables-restore
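A quick sanity check of my own (not from the official docs): after the restore, no KUBE-* chains or rules should be left behind:
# Should print nothing once the cleanup worked
sudo iptables-save | grep KUBE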
Upgrade Cilium
Here comes the fun part — configuring Cilium with what feels like every flag known to humanity. With kube-proxy removed, we’ll use the following Helm command to upgrade Cilium. This configuration is created for a single-node home lab Kubernetes cluster (that’s why you’ll notice just one replica for the operator). It leverages Cilium’s eBPF-based data path to replace kube-proxy, enables Ingress with dedicated load balancing, provides IP address management, and activates Hubble for network observability. Cilium literally does it all (this is why I have this blog)!
# Replace "upgrade" with "install" if Cilium is not installed yet.
# (Heads-up: bash doesn't tolerate comment lines between backslash-continued
# flags, so the per-flag explanations live up here instead.)
#
# --reuse-values          Reuse values from the previous installation.
#                         Remove this if you clean-install Cilium.
# operator.replicas=1     Single-node cluster; multiple operator replicas
#                         cause conflicts.
# kubeProxyReplacement    Enable Cilium to replace kube-proxy for service
#                         handling.
# ingressController.*     Enable the Cilium Ingress controller for incoming
#                         HTTP/HTTPS traffic, with a dedicated load balancer
#                         per Ingress.
# k8sServiceHost/Port     The Kubernetes API server's address and port.
#                         Adjust these to match your cluster.
# l2announcements         Enable L2 announcements so devices on your local
#                         network can find your services (since we're not
#                         using BGP).
# devices=eth+            Use interfaces matching the 'eth+' pattern.
#                         Adjust if your NICs are named differently.
# externalIPs.enabled     Allow accessing services from outside the cluster
#                         via external IPs.
# *TrafficPolicy=Cluster  Distribute traffic to all endpoints of a service,
#                         regardless of their location.
# hubble.*                Enable Hubble Relay and the Hubble UI for network
#                         flow observability. TLS is disabled: not
#                         recommended for production, but acceptable in our
#                         home lab for simplicity.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.16.3 \
  --reuse-values \
  --set operator.replicas=1 \
  --set kubeProxyReplacement=true \
  --set ingressController.enabled=true \
  --set ingressController.loadbalancerMode=dedicated \
  --set k8sServiceHost="192.168.100.100" \
  --set k8sServicePort=6443 \
  --set l2announcements.enabled=true \
  --set devices=eth+ \
  --set externalIPs.enabled=true \
  --set externalTrafficPolicy=Cluster \
  --set internalTrafficPolicy=Cluster \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.tls.enabled=false
(Let’s assume everything above goes well!)
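Trust, but verify. The status check below is from the Cilium docs (on 1.16 images the in-pod binary is cilium-dbg; older images call it cilium), and the rollout nudge is my own habit, since config changes don't always restart the agents on their own:
# If the agents didn't restart after the upgrade, nudge them
kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout status ds/cilium
# Ask an agent whether eBPF kube-proxy replacement is active
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# Expect a line like: KubeProxyReplacement:   True   [eth0 ...]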
Important Notes
Task 2: (Optional) Refresh iptables NAT rules
Or: “Why Can’t I Access My Services: The Mystery of the Lingering IPTables Rules”
The Problem
So, you've completed Task 1, and everything looks perfect.
If you’re like me, you might be scratching your head wondering why external access isn’t working despite everything looking fine. Plot twist: there are some sneaky old Cilium rules hiding in your iptables, living their best life and blocking your traffic!
The Investigation
First, let’s check what Cilium thinks about this situation:
# Check Cilium logs
kubectl logs -n kube-system -l k8s-app=cilium
In my case, I found this lovely message (repeated every 10 seconds, just to make sure I didn’t miss it):
......
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
time="2024-11-27T07:55:39Z" level=error msg="iptables rules full reconciliation failed, will retry another one later" error="failed to remove old backup rules: unable to run 'iptables -t nat -D OLD_CILIUM_POST_nat -s 10.0.0.0/24 ! -d 99.105.108.105/24 ! -o cilium_+ -m comment --comment cilium masquerade non-cluster -j MASQUERADE' iptables command: exit status 1 stderr=\"iptables: Bad rule (does a matching rule exist in that chain?).\\n\"" subsys=iptables
time="2024-11-27T07:55:49Z" level=error msg="iptables rules full reconciliation failed, will retry another one later" error="failed to remove old backup rules: unable to run 'iptables -t nat -D OLD_CILIUM_POST_nat -s 10.0.0.0/24 ! -d 99.105.108.105/24 ! -o cilium_+ -m comment --comment cilium masquerade non-cluster -j MASQUERADE' iptables command: exit status 1 stderr=\"iptables: Bad rule (does a matching rule exist in that chain?).\\n\"" subsys=iptables
......
Understanding the Error
Let's break down what's happening here: during the upgrade, Cilium renamed its previous iptables chains to OLD_CILIUM_* as a backup. The agent now keeps trying to delete those backup rules, but one of them no longer matches anything in the chain (hence "Bad rule"), so the full reconciliation fails and retries every ten seconds, forever.
The Solution: The Nuclear Option (Kind of)
Since we’re sure we don’t need these old rules (you are sure, right? RIGHT?), we can take matters into our own hands:
# Pro Tip: if you want to be extra careful, back up your current
# iptables rules first (just in case):
sudo iptables-save > iptables-backup-$(date +%F)
# Flush it! (The "I know what I'm doing" approach)
sudo iptables -t nat -F OLD_CILIUM_POST_nat
# Check! (The "trust but verify" approach)
sudo iptables -t nat -L -n -v
The Lazy Solution
(The “reboot fixes everything” approach) You could also just… reboot the cluster.
Verification
After either solution, verify that your external access is working:
# Check if Cilium error logs have stopped
kubectl logs -n kube-system -l k8s-app=cilium | grep -i error
# Try accessing your service (adjust IP/port as needed)
curl https://<your-service-ip>:<port>
Lessons Learned
Ready to move on to Task 3? At least now we know our traffic won’t get lost in the OLD_CILIUM chains.
Note: If you’re wondering why this happens, it’s because during upgrades, Cilium tries to be cautious by backing up old rules before applying new ones. Sometimes, this process doesn’t clean up perfectly, leaving us to play digital janitor.
Task 3: Initiate LoadBalancer IP Pool Management and L2 Announcements
As we discussed earlier, we want to move beyond basic port-forwarding and NodePort to expose our services externally. Ingress is the way to go (we’ll save the Gateway API for another adventure!). Cilium offers a ton of flexibility here, but for our homelab setup, we’ll focus on using it for Ingress, Load Balancing, and IP address management. Alright, let’s understand what we’re getting ourselves into.
The Theory (Don’t Skip This Part)
Here’s the mechanism behind how Cilium assigns IP addresses to Ingress services:
1. Ingress Controller as a LoadBalancer: The Cilium Ingress controller itself acts as a LoadBalancer service. This means it needs an external IP to be reachable from the outside world.
2. LB IPAM Steps In: Cilium’s built-in IP address manager (LB IPAM) recognizes the Ingress controller’s need and assigns it an IP address from one of our pre-configured pools.
3. Traffic Routing: When external traffic comes knocking at the Ingress controller’s new IP address, Cilium’s eBPF-powered data path takes charge.
4. IP Addresses for All: Not only does Cilium handle IP assignment for the Ingress controller, but it can also manage IP addresses for the services accessed through the Ingress.
The Practice (Where Theory Meets Reality)
The Helm command in Task 1 already enabled all the Cilium features we need. However, we don't have any IP pools configured for Cilium to assign addresses from. Additionally, for L2 announcements to actually happen, we need at least one L2 announcement policy (like knowing how to answer the door before you have a street address). First, let's set up our IP pool. We need to tell Cilium which IPs it can hand out:
# Define an IP pool named "general-pool" for Cilium to use for LoadBalancer IPs.
# This pool includes the IP range from 192.168.100.120 to 192.168.100.240.
# Choose your IP range wisely.
# Create “ippool.yaml”:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
name: "general-pool"
spec:
blocks:
- start: "192.168.100.120"
stop: "192.168.100.240"
# Apply the IP pool configuration.
kubectl apply -f ippool.yaml
# Verify that the IP pool has been created successfully.
kubectl get ippools
# ------------------------------------------------------------------------
# Define an L2 Announcement Policy named "policy1".
# This policy enables L2 announcements for all interfaces matching the pattern 'eth[0-9]+'.
# It includes both externalIPs and loadBalancerIPs in the announcements
# (important for discoverability in our home lab).
# Create “l2policy.yaml”:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
name: policy1
spec:
interfaces:
- ^eth[0-9]+
externalIPs: true
loadBalancerIPs: true
# Apply the L2 Announcement Policy.
kubectl apply -f l2policy.yaml
# Verify that the L2 Announcement Policy has been created.
kubectl get CiliumL2AnnouncementPolicy
# Check if Cilium recognizes the pool
kubectl get ippools general-pool -o yaml
# Watch for L2 announcement events
kubectl logs -n kube-system -l k8s-app=cilium | grep -i "l2"
# Test with a simple LoadBalancer service
kubectl create deployment test-nginx --image=nginx
kubectl expose deployment test-nginx --type=LoadBalancer --port=80
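If the pool and the L2 policy are doing their jobs, the test service should get an external IP within seconds. A quick check of my own (the exact IP will differ on your network):
# EXTERNAL-IP should come from the 192.168.100.120-240 pool, not stay <pending>
kubectl get svc test-nginx
# From another machine on the same L2 network, the nginx welcome page should load
curl http://<EXTERNAL-IP-FROM-ABOVE>
# Clean up the test resources when you're done
kubectl delete svc,deployment test-nginx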
Now that we have our networking foundation solid, we could test it with the classic Bookinfo application from Cilium’s docs [https://docs.cilium.io/en/v1.16/network/servicemesh/http/]. But let’s be honest — we’re going to do something more interesting in the next tasks.
Want to try it anyway? The Bookinfo example will walk you through the basics of routing HTTP traffic to multiple services through Cilium's Ingress controller.
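Or stay hands-on with what we just built: here's a minimal Ingress sketch of mine (not from the Cilium docs) that reuses the test-nginx service from above. With loadbalancerMode=dedicated, Cilium should spin up a cilium-ingress-test-nginx LoadBalancer service for it, and LB IPAM hands that a pool IP:
# Create "test-ingress.yaml":
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-nginx
spec:
  ingressClassName: cilium
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: test-nginx
                port:
                  number: 80
# Apply it and check which external IP the Ingress received (ADDRESS column)
kubectl apply -f test-ingress.yaml
kubectl get ingress test-nginx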
But if you’re like me and ready for something more exciting, let’s move on to Task 4 where we’ll bring in Argo CD and start building something real.
Task 4: Deploy Argo CD (We haven’t forgotten you, Argonaut!)
The Background Debrief You Need
Before we jump into the deployment process (the fun part), we need to understand why Cilium + Argo CD is a powerful pair for Kubernetes networking. This combination simplifies managing and securing multiple Kubernetes clusters. Argo CD automates the deployment of Cilium (not on this cluster but for future clusters) and other essential components across your clusters, ensuring consistency and providing GitOps workflows for easy rollbacks and auditability. Cilium then provides unified networking, enhanced security with consistent policies, and deep observability with Hubble across your entire multi-cluster environment.
In short, this allows you to (Seriousness starts here):
· Automated Multi-Cluster Provisioning: Argo CD excels at automating deployments. You can use it to install Cilium as your CNI, along with any necessary CSI plugins and other software, across all your clusters from a central location. This ensures consistency and reduces manual effort. (Almost 2025 now. 2023 is old.)
· GitOps for Networking: By managing your Cilium configuration in Git, you gain all the benefits of GitOps: version control, auditability, and easy rollbacks. This is especially valuable when fine-tuning Cilium’s networking policies across your clusters. (Or, now, you can version control your network mistakes!)
· Observability Across Clusters: Cilium’s Hubble provides deep visibility into network traffic within each cluster. When combined with Argo CD’s deployment tracking, you gain a comprehensive view of your entire multi-cluster environment, making troubleshooting and monitoring a breeze.
· Simplified Multi-Cluster Networking: Cilium can help establish connectivity and manage network policies between your clusters. This is essential for building applications and services that span multiple clusters.
· Enhanced Security: Cilium’s security policies can be applied consistently across all clusters through Argo CD. This ensures a uniform security posture and simplifies security management.
If this homelab project expands into the realm of multi-cluster, this dynamic duo can really streamline many aspects of operations and provide valuable insights. (Seriousness ends here)
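To make "GitOps for networking" concrete, here's a rough sketch of what a Git-tracked Cilium install could look like as an Argo CD Application on some future second cluster. The chart coordinates are real (https://helm.cilium.io), the destination server is a placeholder, valuesObject assumes a reasonably recent Argo CD, and please don't point this at the cluster Argo CD itself runs on:
# cilium-app.yaml (a sketch for a hypothetical second cluster)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cilium
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://helm.cilium.io
    chart: cilium
    targetRevision: 1.16.3
    helm:
      valuesObject:
        operator:
          replicas: 1
        kubeProxyReplacement: true
  destination:
    server: https://<second-cluster-api>:6443   # placeholder
    namespace: kube-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true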
The “Fun” Part: Deployment
Time to bring in Argo CD, our GitOps hero! We'll use the community-maintained Helm chart (https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd) to deploy it with its default settings, and we'll expose the Argo CD UI via a LoadBalancer for easy access.
# Add the Argo CD Helm repository.
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
# Create a namespace for Argo CD.
kubectl create namespace argocd
# Install Argo CD with its default configuration and expose the UI as a LoadBalancer.
helm install argocd argo/argo-cd \
--set server.service.type=LoadBalancer \
--namespace argocd
# Verify the installation by checking the deployed resources and pod status.
kubectl get all -n argocd
kubectl get pods -n argocd
# Verify Argo CD Server status and find the VIP assigned.
kubectl describe svc argocd-server -n argocd
# Before logging in to the web UI, fetch the initial password for the admin user
kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 -d
Verify Everything’s Actually Working (Visually)
You can now use the admin credentials to log in to the Argo CD server:
You may change the password if you like:
And since we have Cilium, Hubble has already been observing the traffic in the argocd namespace. We'll talk more about Hubble in Task 6:
Once you’ve got Argo CD running, we’ll move on to Task 5 — deploying the OpenTelemetry Astronomy Shop.
Pro Tip: Take a screenshot of your working Argo CD login screen. It might be the last time you see everything working perfectly.
Task 5: Deploy OpenTelemetry Astronomy Shop
Why This Demo?
It's probably one of the best microservices demo apps you can find on GitHub. The OpenTelemetry Astronomy Shop Demo is a comprehensive microservices-based application designed to showcase the capabilities of OpenTelemetry in a realistic, distributed system environment. Visit the official site for more information [https://opentelemetry.io/docs/demo/kubernetes-deployment/]. While it's designed for OpenTelemetry, the diverse and rich interactions in this demo let us observe and learn Kubernetes, Cilium, and Argo CD in a near-real-world environment. In short, here's why:
· Real-world microservices architecture (not just three services talking to each other)
· Actual business logic (instead of just “Hello World” with extra steps)
· Rich service interactions (because life isn’t just HTTP GET requests)
· Built-in observability (so we can watch everything break in real-time)
Warning: This app needs 6GB RAM!
Deployment via Argo CD UI (Because CLIs are So 2023?)
Let’s install the demo app with Argo CD UI and Helm chart by following these steps:
1. Log in to Argo CD: Access the Argo CD UI and log in with your credentials.
2. Navigate to Repository Settings:
3. Add a New Repository:
4. Fill in the Helm Repository Details:
5. Save:
To create the demo app, click Applications on the left panel and then:
1. Enter General Settings:
2. Enter Source and Destination Fields:
3. Select the Value File: Select values.yaml from the dropdown and leave the other Helm values unchanged
4. Create Application:
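And yes, despite this section's heading, CLIs still work in 2024. A rough equivalent with the argocd CLI, assuming you've already run argocd login against the server VIP (the chart coordinates are from the OpenTelemetry Helm repo; pick a concrete chart version for the revision):
argocd app create my-otel-demo \
  --repo https://open-telemetry.github.io/opentelemetry-helm-charts \
  --helm-chart opentelemetry-demo \
  --revision <chart-version> \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace demo \
  --sync-option CreateNamespace=true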
After the demo app starts running (if your single node survived), you can follow the official guide [https://opentelemetry.io/docs/demo/kubernetes-deployment/#expose-services-using-kubectl-port-forward] to set up port-forwarding to access the app. Alternatively, use a LoadBalancer to assign an external IP for easier access. First, change the SYNC POLICY from Automatic to Manual. This prevents Argo CD from automatically reverting your app to its previous settings.
1. Navigate to Application Setting:
2. Disable AUTO-SYNC:
3. Assign External IP:
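For step 3, the UI route is editing the service manifest; the same can be done from the terminal. A sketch only: the service name below is a guess (it depends on the Helm release name), so check what kubectl actually lists first:
# Find the frontend proxy service; its exact name depends on your release
kubectl get svc -n demo
# Switch it from ClusterIP to LoadBalancer so LB IPAM assigns it a pool IP
kubectl -n demo patch svc <frontend-proxy-service> -p '{"spec":{"type":"LoadBalancer"}}'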
Now, you can access the demo apps (assuming everything worked) via:
The demo app is now up and running!
Once everything’s running (or you’ve convinced yourself this is as good as it gets), we’ll move to Task 6 — where we’ll use Hubble to observe all these services talking to each other. Because the only thing better than having system problems is being able to visualize them in a pretty graph!
Task 6: Observe the Demo App with Hubble
What’s Hubble Again?
Remember when we enabled Hubble back in Task 1? That wasn’t just for fun (okay, maybe a little). Built on Cilium and eBPF, Hubble is like CCTV for your cluster — but instead of catching shoplifters, it catches network problems. (Learn more about Hubble on the official site: [https://docs.cilium.io/en/stable/overview/intro/#what-is-hubble].)
Access Hubble UI
Ensure Hubble UI is reachable from outside the cluster via port-forward or LoadBalancer; a quick sketch follows this paragraph. Once the webpage opens, select the desired namespace. Hubble displays the collected L3/L4 data. If no data appears, no messages have been sent among the pods in that namespace (see? The app is already broken!). The graph builds dynamically based on observed networking patterns. I deployed the demo app in the demo namespace.
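Two ways to get at the UI, assuming the hubble.ui.enabled=true flag from Task 1 did its job (the port-forward is the documented route; the LoadBalancer patch is the same trick we've been using all along):
# Option A: port-forward, then open http://localhost:12000
kubectl -n kube-system port-forward svc/hubble-ui 12000:80
# Option B: let LB IPAM hand hubble-ui its own pool IP
kubectl -n kube-system patch svc hubble-ui -p '{"spec":{"type":"LoadBalancer"}}'
Here's how the demo namespace looks in Hubble: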
As we can see, this OpenTelemetry demo looks like a plate of spaghetti in Hubble! This is normal, since it simulates a realistic full-stack app. To focus on a specific pod and the networking around it, simply zoom in and double-click on the pod.
The load generator sends payloads to the frontend proxy to simulate visits:
Prometheus primarily pulls data from multiple OpenTelemetry ports, while OpenTelemetry occasionally needs to access Prometheus’s API (port 9090) for specific operations:
Services (fraud/accounting/checkout) -> Kafka (9092) -> OpenTelemetry (4318). Likely because Kafka acts as a buffer between the services and OpenTelemetry: if the collector is down or slow, data isn't lost:
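If you'd rather grep flows than squint at graphs, the hubble CLI can pull the same data from the terminal. A sketch, assuming the CLI is installed on your machine and hubble-relay is reachable (the cilium CLI can set up the port-forward for you):
# Forward hubble-relay to localhost in the background
cilium hubble port-forward &
# Show recent flows in the demo namespace heading for Kafka's port
hubble observe --namespace demo --to-port 9092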
What We Learned
Conclusion
That’s it! You’ve made it through all six tasks. Your single node is probably begging for mercy, but hey — it’s handled everything we’ve thrown at it.
What Have We Actually Accomplished?
After this journey through eBPF, GitOps, probably too many YAML files, and unnecessary humor, we’ve built something that’s either impressive or concerning (depending on who you ask):
1. Modernized Networking:
2. Automated Deployments:
3. Gained Deep Visibility:
4. Built a Lab Environment:
Yes, we're almost at the end of Blog 2. But, hey, wait! Don't clean up your demo app yet! There's more:
The Accidental Bonus: Jaeger Tracing
Due to the unintentional installation of Jaeger (hey, it came as a free surprise with the demo application's Helm chart), we should totally use it to our advantage. Given the length of this blog, we're not going to bore you with endless details here. To keep things snappy and fun, we'll throw in some extra screenshots instead of droning on with lengthy explanations. Enjoy the visuals!
Access the UI
Open https://<VIP>:8080/jaeger/ui/
Select a service (I recommend checkoutservice, because why not?) and click Find Traces
Find Some Traces
Yep! There are two (on my cluster, not yours). Let's click on the second one:
Analyze the Results
Then you can see that frontend and checkoutservice took a long time to respond. With no obvious reason, let's conclude that my laptop is too old (seriously!) and too many other programs are running alongside the VMs. I appreciate that my laptop hasn't broken yet, though.
Final Thoughts
This setup is probably overkill for a single node, but that’s what makes it fun. We’ve created a lab environment that’s:
Stay tuned, there's more to come!