Cilium & Argo CD on a Single-Node Kubernetes Cluster on Your Laptop - A Love Story of eBPF and GitOps


After Blog 1, we have a functional Kubernetes cluster, but it is only a starting point. It can handle basic tasks like port-forwarding and NodePort for quick experiments (the "Hello World" phase of Kubernetes), but it isn't ready for the real-world challenges of external access, scalability, and security. The absence of critical features like Ingress, load balancing, and IP address management means it's time to level up. In other words, the cluster is still a bit amateur-ish!


I am very used to MetalLB and Traefik for networking features. However, my blog journey needed something fresh, and that's when I discovered Cilium could cover most of my needs. It's like finding out your Swiss Army knife can also make espresso - suddenly, everything else seems redundant!


So, to make the cluster truly production-ready (no, lab-ready I meant - let's not get too ambitious here), I'm equipping it with Cilium's advanced networking capabilities and Argo CD for seamless GitOps-style application deployment. Think of it as giving your Kubernetes cluster both a brain upgrade and a fancy new suit.


Why Cilium? Because Multiple Tools Are So 2023

Cilium has evolved from "just another CNI" to "the CNI that ate all other CNIs" (in a good way). It's setting its sights on being "the only stack" for Kubernetes networking, which is either brilliantly ambitious or slightly megalomaniacal - you decide. Over the years, it has evolved to address key areas that would typically require multiple tools:

  • Networking: Efficient, low-latency pod-to-pod communication powered by eBPF. Think of it as giving your pods a direct line instead of playing telephone through multiple networking layers.
  • Security: Policy-driven access controls that actually understand what your apps are doing (creepy, but useful).
  • Observability: Granular traffic insights with tools like Hubble to monitor flows in real time. Because what's the point of having network issues if you can't see pretty graphs of them?
  • Service Mesh Compatibility: Smooth integration with service mesh solutions like Istio or Envoy. Because we all need more YAML in our lives, right?


In this blog, I'll explore how Cilium can complete our cluster by enabling:

  1. Ingress: Because I'm not quite ready for the Gateway API adventure yet (one step at a time!)
  2. Layer 2 Load Balancing: Since we're not using BGP here (keeping it simple for our homelab)
  3. Observability using Hubble: Because troubleshooting without proper visibility is like debugging with a blindfold


We'll configure Cilium to activate these features, verify its functionality, deploy Argo CD for automated application management, and conclude by deploying a demo app that's actually interesting (sorry, Bookinfo demo, but it's time to move on).


Prerequisites

Before you dive in, make sure you have a similar environment. If you followed Blog 1 (which isn't mandatory, but hey, it helps), your environment would look like this:

Kubernetes Cluster Setup:

  • A single-node Kubernetes cluster running vanilla Kubernetes v1.31.2
  • Cluster hosted on a Hyper-V virtual machine with Ubuntu 24.04 LTS as the operating system.
  • Pro Tip: Make sure your host machine has enough resources. I recommend at least 8GB RAM and 4 CPU cores for the VM. 16GB RAM would be even better!

Container Runtime:

  • Containerd configured as the container runtime with SystemdCgroup enabled for Kubernetes compatibility.

Networking:

  • Network managed through a custom InternalNATSwitch with static IP addressing and no DHCP for stability.
  • Basic pod-to-pod communication enabled but lacks advanced networking like Ingress or Load Balancing.
  • Cilium 1.16.3 installed via Helm. If you have a different version, that’s fine, but you might need to adjust some commands.

Cluster Configuration:

  • Core Kubernetes components (kubelet, kubeadm, kubectl) installed and initialized using kubeadm init.
  • Pods can run successfully, but the control plane is not optimized for external traffic.

System Preparations:

  • SWAP is disabled for better performance and Kubernetes resource management.
  • IP Forwarding is enabled for pod networking.
  • Optional kernel optimizations included (e.g., linux-azure for Hyper-V).

Cluster Limitations:

  • No Ingress or Load Balancer (we’ll fix that!)
  • No advanced networking capabilities yet
  • Port-forwarding and NodePort are your only friends for now
  • Warning: If you’re trying to access services, prepare for some manual port management

Additional Tools:

  • (Must-Have) Helm CLI installed
  • (Nice-to-Have) k9s set up for easier Kubernetes monitoring and debugging; seriously consider installing it

Remember: A working basic cluster is crucial before adding more complexity. If something’s not right at this stage, it’ll only get more interesting (read: problematic) later!
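If you want a quick sanity check that your starting point roughly matches mine (exact versions and node names will differ), a few read-only commands go a long way:

# Confirm the node, Kubernetes version, and that core pods are healthy
kubectl get nodes -o wide
kubectl -n kube-system get pods

# Confirm the tooling
helm version --short
kubectl version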

Overview of Tasks

The setup procedure is still as straightforward as in Blog 1 (if you believe that…) and involves the following tasks:

  1. Delete Kube Proxy and Upgrade Cilium with Additional Parameters: Because who needs two networking stacks when one can cause enough problems?
  2. (Optional) Refresh IP table NAT rules: Clear the old Cilium NAT rules blocking external IPs from reaching Ingress services.
  3. Initiate LoadBalancer IP Pool Management and L2 Announcements: Make your services actually reachable.
  4. Deploy Argo CD: Set up GitOps-style deployment tool for this and future projects because manually applying YAML files is so last decade.
  5. Deploy OpenTelemetry Astronomy Shop with Argo CD: Deploy a more sophisticated micro-service app to test our setup. No more Bookinfo demo…
  6. Observe the Demo App with Hubble: Use Hubble, Cilium’s observability tool, to monitor traffic flows, because what’s the point of having problems if you can’t see them in pretty graphs?


Task 1: Delete Kube Proxy and Upgrade/Install Cilium with Additional Parameters

Understanding the Why

Before we start breaking things (I mean, “upgrading” our cluster), let’s understand why we’re doing this. According to the official Cilium documentation, Cilium can fully replace kube-proxy and streamline our networking stack. Why is this cool?

  1. Reduced Complexity: One less component to manage
  2. Better Performance: Direct eBPF processing instead of iptables rules
  3. Enhanced Features: Native support for things like XDP load balancing

The Two Paths to Kube-Proxy-Free

Path A: Fresh Install

If you’re starting with a new cluster:

# During kubeadm init, use this flag
kubeadm init --skip-phases=addon/kube-proxy        

Path B: Existing Cluster (Our Case)

If you have a cluster from Blog 1, we need to:

  1. Remove kube-proxy & Clean up existing rules
  2. Upgrade Cilium

Remove kube-proxy & Clean up existing rules

# Delete Kube Proxy from kube-system namespace
kubectl -n kube-system delete ds kube-proxy

# Delete the configmap as well to avoid kube-proxy being reinstalled during a Kubeadm upgrade (works only for K8s 1.19 and newer)
kubectl -n kube-system delete cm kube-proxy

# Run on each node with root permissions:
iptables-save | grep -v KUBE | iptables-restore        
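Before moving on, it doesn't hurt to confirm that kube-proxy is really gone. A minimal check (nothing here is specific to my setup):

# Both of these should now report "NotFound"
kubectl -n kube-system get ds kube-proxy
kubectl -n kube-system get cm kube-proxy

# Count any remaining KUBE-* iptables rules; ideally this prints 0
sudo iptables-save | grep -c KUBE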

Upgrade Cilium

Here comes the fun part — configuring Cilium with what feels like every flag known to humanity. With kube-proxy removed, we’ll use the following Helm command to upgrade Cilium. This configuration is created for a single-node home lab Kubernetes cluster (that’s why you’ll notice just one replica for the operator). It leverages Cilium’s eBPF-based data path to replace kube-proxy, enables Ingress with dedicated load balancing, provides IP address management, and activates Hubble for network observability. Cilium literally does it all (this is why I have this blog)!

# Replace "upgrade" with "install" if Cilium is not installed yet.
# Notes on the flags used below:
#   --reuse-values                    reuse values from the previous installation
#                                     (drop this flag for a clean install)
#   operator.replicas=1               a single-node cluster only needs one operator;
#                                     multiple replicas just cause conflicts
#   kubeProxyReplacement=true         let Cilium take over service handling from kube-proxy
#   ingressController.*               enable Cilium's Ingress controller, with a dedicated
#                                     load balancer per Ingress
#   k8sServiceHost / k8sServicePort   the Kubernetes API server address; adjust to your cluster
#   l2announcements.enabled=true      announce service IPs on the local L2 network,
#                                     since we're not using BGP
#   devices=eth+                      the network interface pattern Cilium should use;
#                                     adjust if your interfaces are named differently
#   externalIPs.enabled=true          allow services to be reached on external IPs
#   external/internalTrafficPolicy    Cluster distributes traffic to all endpoints of a
#                                     service, regardless of their location
#   hubble.relay / hubble.ui          enable Hubble for network observability
#   hubble.tls.enabled=false          no TLS for Hubble; fine for a home lab,
#                                     not recommended for production
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.16.3 \
  --reuse-values \
  --set operator.replicas=1 \
  --set kubeProxyReplacement=true \
  --set ingressController.enabled=true \
  --set ingressController.loadbalancerMode=dedicated \
  --set k8sServiceHost="192.168.100.100" \
  --set k8sServicePort=6443 \
  --set l2announcements.enabled=true \
  --set devices=eth+ \
  --set externalIPs.enabled=true \
  --set externalTrafficPolicy=Cluster \
  --set internalTrafficPolicy=Cluster \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.tls.enabled=false

(Let’s assume everything above goes well!)
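Before celebrating, it's worth confirming that the new configuration actually took. A quick sketch, assuming kubectl access and optionally the Cilium CLI (note that in Cilium 1.16 the CLI inside the agent pod is called cilium-dbg; older releases ship it as plain cilium):

# All Cilium pods should be Running
kubectl -n kube-system get pods -l k8s-app=cilium

# With the Cilium CLI installed on the host, this waits until everything is ready
cilium status --wait

# Ask the agent whether kube-proxy replacement is active
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep -i kubeproxyreplacement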

Important Notes

  • If this is a fresh Cilium installation, you’re done! Pat yourself on the back
  • If you’re upgrading like us, we need to do some cleanup (that’s coming in Task 2)
  • Don’t panic if not everything works perfectly yet — we still have more configuration to do



Task 2: (Optional) Refresh IP table NAT rules

Or: “Why Can’t I Access My Services: The Mystery of the Lingering IPTables Rules”

The Problem

So, you’ve completed Task 1, and everything looks perfect:

  • Cilium pods are running ✓
  • DaemonSet is happy ✓
  • Services are created and External IPs are assigned ✓
  • External access is… wait, what?

If you’re like me, you might be scratching your head wondering why external access isn’t working despite everything looking fine. Plot twist: there are some sneaky old Cilium rules hiding in your iptables, living their best life and blocking your traffic!

The Investigation

First, let’s check what Cilium thinks about this situation:

# Check Cilium logs
kubectl logs -n kube-system -l k8s-app=cilium

In my case, I found this lovely message (repeated every 10 seconds, just to make sure I didn’t miss it):

......
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
time="2024-11-27T07:55:39Z" level=error msg="iptables rules full reconciliation failed, will retry another one later" error="failed to remove old backup rules: unable to run 'iptables -t nat -D OLD_CILIUM_POST_nat -s 10.0.0.0/24 ! -d 99.105.108.105/24 ! -o cilium_+ -m comment --comment cilium masquerade non-cluster -j MASQUERADE' iptables command: exit status 1 stderr=\"iptables: Bad rule (does a matching rule exist in that chain?).\\n\"" subsys=iptables
time="2024-11-27T07:55:49Z" level=error msg="iptables rules full reconciliation failed, will retry another one later" error="failed to remove old backup rules: unable to run 'iptables -t nat -D OLD_CILIUM_POST_nat -s 10.0.0.0/24 ! -d 99.105.108.105/24 ! -o cilium_+ -m comment --comment cilium masquerade non-cluster -j MASQUERADE' iptables command: exit status 1 stderr=\"iptables: Bad rule (does a matching rule exist in that chain?).\\n\"" subsys=iptables
......        


Understanding the Error

Let’s break down what’s happening here:

  1. Cilium is trying to be a good citizen and clean up old rules
  2. It’s specifically targeting rules in the OLD_CILIUM_POST_nat chain
  3. The cleanup is failing because… well, iptables is being iptables


The Solution: The Nuclear Option (Kind of)

Since we’re sure we don’t need these old rules (you are sure, right? RIGHT?), we can take matters into our own hands:

# Flush it! (The "I know what I'm doing" approach)
sudo iptables -t nat -F OLD_CILIUM_POST_nat

# Check! The "trust but verify" approach
sudo iptables -t nat -L -n -v

# Pro Tip: If you want to be extra careful,
# you can save your current iptables rules first:
# Backup your current rules (just in case)
sudo iptables-save > iptables-backup-$(date +%F)        
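And if the flush does break something, the backup you (hopefully) took can bring the old rules back:

# Restore from the backup created above (adjust the date suffix to match your file)
sudo iptables-restore < iptables-backup-YYYY-MM-DD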

The Lazy Solution

(The “reboot fixes everything” approach) You could also just… reboot the cluster.

Verification

After either solution, verify that your external access is working:

# Check if Cilium error logs have stopped
kubectl logs -n kube-system -l k8s-app=cilium | grep -i error

# Try accessing your service (adjust IP/port as needed)
curl https://<your-service-ip>:<port>        

Lessons Learned

  1. Sometimes “optional” steps aren’t really optional
  2. IPTables rules are like houseguests — they sometimes overstay their welcome
  3. Always check iptables when network things mysteriously don’t work
  4. Rebooting works more often than we’d like to admit

Ready to move on to Task 3? At least now we know our traffic won’t get lost in the OLD_CILIUM chains.

Note: If you’re wondering why this happens, it’s because during upgrades, Cilium tries to be cautious by backing up old rules before applying new ones. Sometimes, this process doesn’t clean up perfectly, leaving us to play digital janitor.


Task 3: Initiate LoadBalancer IP Pool Management and L2 Announcements

As we discussed earlier, we want to move beyond basic port-forwarding and NodePort to expose our services externally. Ingress is the way to go (we’ll save the Gateway API for another adventure!). Cilium offers a ton of flexibility here, but for our homelab setup, we’ll focus on using it for Ingress, Load Balancing, and IP address management. Alright, let’s understand what we’re getting ourselves into.

The Theory (Don’t Skip This Part)

Here’s the mechanism behind how Cilium assigns IP addresses to Ingress services:

1. Ingress Controller as a LoadBalancer: The Cilium Ingress controller itself acts as a LoadBalancer service. This means it needs an external IP to be reachable from the outside world.

2. LB IPAM Steps In: Cilium’s built-in IP address manager (LB IPAM) recognizes the Ingress controller’s need and assigns it an IP address from one of our pre-configured pools.

3. Traffic Routing: When external traffic comes knocking at the Ingress controller’s new IP address, Cilium’s eBPF-powered data path takes charge.

4. IP Addresses for All: Not only does Cilium handle IP assignment for the Ingress controller, but it can also manage IP addresses for the services accessed through the Ingress.

The Practice (Where Theory Meets Reality)

The Helm command in Task 1 has already enabled all the Cilium features we need. However, we don’t have any IP pools configured for Cilium to assign addresses from. Additionally, for L2 announcements to work, we need at least one CiliumL2AnnouncementPolicy (it’s like knowing how to answer the door but not knowing your street address yet). First, let’s set up our IP pool and tell Cilium which IPs it can hand out:

# Define an IP pool named "general-pool" for Cilium to use for LoadBalancer IPs.
# This pool includes the IP range from 192.168.100.120 to 192.168.100.240.
# Choose your IP range wisely.
# Create “ippool.yaml”:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "general-pool"
spec:
  blocks:
  - start: "192.168.100.120"
    stop: "192.168.100.240"


# Apply the IP pool configuration.
kubectl apply -f ippool.yaml 

# Verify that the IP pool has been created successfully.
kubectl get ippools

# ------------------------------------------------------------------------

# Define an L2 Announcement Policy named "policy1".
# This policy enables L2 announcements for all interfaces matching the pattern 'eth[0-9]+'.
# It includes both externalIPs and loadBalancerIPs in the announcements
# (important for discoverability in our home lab).
# Create “l2policy.yaml”:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  interfaces:
  - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true

# Apply the L2 Announcement Policy.
kubectl apply -f l2policy.yaml

# Verify that the L2 Announcement Policy has been created.
kubectl get CiliumL2AnnouncementPolicy

# Check if Cilium recognizes the pool
kubectl get ippools general-pool -o yaml

# Watch for L2 announcement events
kubectl logs -n kube-system -l k8s-app=cilium | grep -i "l2"

# Test with a simple LoadBalancer service
kubectl create deployment test-nginx --image=nginx
kubectl expose deployment test-nginx --type=LoadBalancer --port=80        
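If the pool and policy are working, the test service should pick up an address within a few seconds. A quick check (the exact IP depends on your pool, and the curl only works from a machine on the same L2 segment):

# EXTERNAL-IP should show an address from general-pool instead of <pending>
kubectl get svc test-nginx

# From another machine on the 192.168.100.0/24 network
curl http://<assigned-external-ip>/

# Clean up the test resources when you're done
kubectl delete svc test-nginx
kubectl delete deployment test-nginx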

Now that we have our networking foundation solid, we could test it with the classic Bookinfo application from Cilium’s docs [https://docs.cilium.io/en/v1.16/network/servicemesh/http/]. But let’s be honest — we’re going to do something more interesting in the next tasks.

Want to try it anyway? The Bookinfo example will teach you:

  • How to deploy a simple microservices app
  • Configure path-based Ingress routing
  • Watch Cilium assign IPs automagically
  • Test service access

But if you’re like me and ready for something more exciting, let’s move on to Task 4 where we’ll bring in Argo CD and start building something real.



Task 4: Deploy Argo CD (We haven’t forgotten you, Argonaut!)

The Background Debrief You Need

Before we jump into the deployment process (the fun part), we need to understand why Cilium + Argo CD is a powerful pair for Kubernetes networking. This combination simplifies managing and securing multiple Kubernetes clusters. Argo CD automates the deployment of Cilium (not on this cluster but for future clusters) and other essential components across your clusters, ensuring consistency and providing GitOps workflows for easy rollbacks and auditability. Cilium then provides unified networking, enhanced security with consistent policies, and deep observability with Hubble across your entire multi-cluster environment.

In short, this allows you to (Seriousness starts here):

· Automated Multi-Cluster Provisioning: Argo CD excels at automating deployments. You can use it to install Cilium as your CNI, along with any necessary CSI plugins and other software, across all your clusters from a central location. This ensures consistency and reduces manual effort. (Almost 2025 now. 2023 is old.)

· GitOps for Networking: By managing your Cilium configuration in Git, you gain all the benefits of GitOps: version control, auditability, and easy rollbacks. This is especially valuable when fine-tuning Cilium’s networking policies across your clusters. (Or, now, you can version control your network mistakes!)

· Observability Across Clusters: Cilium’s Hubble provides deep visibility into network traffic within each cluster. When combined with Argo CD’s deployment tracking, you gain a comprehensive view of your entire multi-cluster environment, making troubleshooting and monitoring a breeze.

· Simplified Multi-Cluster Networking: Cilium can help establish connectivity and manage network policies between your clusters. This is essential for building applications and services that span multiple clusters.

· Enhanced Security: Cilium’s security policies can be applied consistently across all clusters through Argo CD. This ensures a uniform security posture and simplifies security management.

If this homelab project expands into the realm of multi-cluster, this dynamic duo can really streamline many aspects of operations and provide valuable insights. (Seriousness ends here)

The “Fun” Part: Deployment

Time to bring in Argo CD, our GitOps hero! We’ll use the community maintained Helm chart (https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd) to deploy it with its default settings, and we’ll expose the Argo CD UI via a LoadBalancer for easy access.

# Add the Argo CD Helm repository.
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# Create a namespace for Argo CD.
kubectl create namespace argocd

# Install Argo CD with its default configuration and expose the UI as a LoadBalancer.
helm install argocd argo/argo-cd \
    --set server.service.type=LoadBalancer \
    --namespace argocd

# Verify the installation by checking the deployed resources and pod status.
kubectl get all -n argocd
kubectl get pods -n argocd

# Verify Argo CD Server status and find the VIP assigned.
kubectl describe svc argocd-server -n argocd

# Before logging in to the web UI, fetch the initial password for the admin user
kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 -d        
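If you prefer the terminal over the browser, the argocd CLI can log in with the same credentials. A small sketch, assuming the CLI is installed and <VIP> is the LoadBalancer IP found above (--insecure is only acceptable here because this is a homelab with the default self-signed certificate):

# Log in with the initial admin password fetched above
argocd login <VIP> --username admin --password <initial-password> --insecure

# Optionally rotate the password right away
argocd account update-password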

Verify Everything’s Actually Working (Visually)

You can now use the admin credentials to log in to the Argo CD server.

You may change the password if you like.

And since we have Cilium, Hubble has already been observing the traffic in the argocd namespace. We will talk more about Hubble in Task 6.

Once you’ve got Argo CD running, we’ll move on to Task 5 — deploying the OpenTelemetry Astronomy Shop.

Pro Tip: Take a screenshot of your working Argo CD login screen. It might be the last time you see everything working perfectly.


Task 5: Deploy OpenTelemetry Astronomy Shop

Why This Demo?

It is probably one of the best microservice demo apps you can find on GitHub. The OpenTelemetry Astronomy Shop Demo is a comprehensive, microservices-based application designed to showcase the capabilities of OpenTelemetry in a realistic, distributed system. Visit the official site for more information [https://opentelemetry.io/docs/demo/kubernetes-deployment/]. While it is designed for OpenTelemetry, the diverse and rich interactions in this demo let us observe and learn Kubernetes, Cilium, and Argo CD in a near real-world environment. In short, here’s why:

· Real-world microservices architecture (not just three services talking to each other)

· Actual business logic (instead of just “Hello World” with extra steps)

· Rich service interactions (because life isn’t just HTTP GET requests)

· Built-in observability (so we can watch everything break in real-time)

Warning: This app needs 6GB RAM!


Deployment via Argo CD UI (Because CLIs are So 2023?)

Let’s install the demo app with Argo CD UI and Helm chart by following these steps:

1. Log in to Argo CD: Access the Argo CD UI and log in with your credentials.

2. Navigate to Repository Settings:

  • Go to Settings (gear icon) in the sidebar.
  • Click on Repositories.

3. Add a New Repository:

  • Click on Connect Repo.
  • Choose VIA HTTPS as the connection method.
  • In the repository type, select Helm.

4. Fill in the Helm Repository Details: give the repository a name of your choice and enter the chart repository’s URL.

5. Save:

  • Click Connect at the top to save the repository.

To create the demo app, click Applications on the left panel and then:

1. Enter General Settings:

  • Enter an Application Name
  • Select default for Project Name
  • Use Automatic for SYNC POLICY
  • Select Auto-Create Namespace and RETRY

2. Enter Source and Destination Fields: pick the chart (and version) from the repository you just connected as the source, and your in-cluster destination plus a target namespace (I used demo) as the destination.

3. Select the Value File: Select values.yaml from the dropdown and leave the other HELM values unchanged.

4. Create Application:

  • Scroll up to the top of the page and click CREATE
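If you would rather keep this in Git than click through the UI (which is, after all, the whole point of GitOps), an Argo CD Application manifest roughly equivalent to the steps above might look like the sketch below. The repository URL, chart name, and namespace are my assumptions based on the official OpenTelemetry Helm charts, so adjust them to whatever you entered in the UI:

# otel-demo-app.yaml (apply with: kubectl apply -f otel-demo-app.yaml)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  # The Application name is also used as the Helm release name,
  # which is why the services later show up as demo-frontendproxy etc.
  name: demo
  namespace: argocd
spec:
  project: default
  source:
    # Assumed: the official OpenTelemetry Helm chart repository
    repoURL: https://open-telemetry.github.io/opentelemetry-helm-charts
    chart: opentelemetry-demo
    targetRevision: "*"          # or pin a specific chart version
    helm:
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: demo
  syncPolicy:
    automated: {}                # the "Automatic" sync policy from the UI
    syncOptions:
      - CreateNamespace=true     # the "Auto-Create Namespace" checkbox
    retry:
      limit: 3                   # roughly the "RETRY" option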

After the demo app starts running (if your single node survived), you can follow the official guide [https://opentelemetry.io/docs/demo/kubernetes-deployment/#expose-services-using-kubectl-port-forward] to set up port-forwarding to access the app. Alternatively, use a LoadBalancer to assign an external IP for easier access. First, change the SYNC POLICY from Automatic to Manual; this prevents Argo CD from automatically reverting your app to its previous settings.

1. Navigate to Application Setting:

  • Click on DETAILS.
  • Scroll down to find SYNC POLICY.

2. Disable AUTO-SYNC:

  • Click on DISABLE AUTO-SYNC and confirm.
  • Close the page.

3. Assign External IP (a one-line alternative follows after these steps):

  • In a terminal, run kubectl edit svc demo-frontendproxy -n demo (change the namespace if needed)
  • Find spec.type and change it from ClusterIP to LoadBalancer
  • Save and exit the editor
  • Run kubectl describe svc demo-frontendproxy -n demo to find the IP assigned by Cilium
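If you would rather not open an editor, a one-line patch does the same thing (service name and namespace as above):

kubectl -n demo patch svc demo-frontendproxy -p '{"spec": {"type": "LoadBalancer"}}'

# Confirm Cilium handed out an address
kubectl -n demo get svc demo-frontendproxy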

Now, you can access the demo apps (assuming everything worked) via:

  • Web store: http://<VIP>:8080/
  • Grafana: http://<VIP>:8080/grafana/
  • Load Generator UI: http://<VIP>:8080/loadgen/
  • Jaeger UI: http://<VIP>:8080/jaeger/ui/
  • Flagd configurator UI: http://<VIP>:8080/feature

The demo app is now up and running!

Once everything’s running (or you’ve convinced yourself this is as good as it gets), we’ll move to Task 6 — where we’ll use Hubble to observe all these services talking to each other. Because the only thing better than having system problems is being able to visualize them in a pretty graph!




Task 6: Observe the Demo App with Hubble

What’s Hubble Again?

Remember when we enabled Hubble back in Task 1? That wasn’t just for fun (okay, maybe a little). Built on Cilium and eBPF, Hubble is like CCTV for your cluster — but instead of catching shoplifters, it catches network problems. (Learn more about Hubble on the official site: [https://docs.cilium.io/en/stable/overview/intro/#what-is-hubble].)

Access Hubble UI
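In case you haven't exposed the Hubble UI yet, either of the following works; the service name, namespace, and port are the Helm chart defaults, so adjust if yours differ:

# Option 1: port-forward, then open http://localhost:12000/
kubectl -n kube-system port-forward svc/hubble-ui 12000:80

# Option 2: with the Cilium CLI installed, this port-forwards and opens a browser
cilium hubble ui

# Option 3: give it a LoadBalancer IP, just like the other services
kubectl -n kube-system patch svc hubble-ui -p '{"spec": {"type": "LoadBalancer"}}'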

With the Hubble UI reachable from outside the cluster (via port-forward or LoadBalancer, as above), open it in a browser and select the desired namespace. Hubble displays the collected L3/L4 data; if no data appears, no messages have been sent among the pods in that namespace yet (see? the app is already broken!). The graph builds dynamically based on observed networking patterns. I deployed the demo app in the demo namespace, and here’s how it looks in Hubble.

As we can see, this OpenTelemetry demo looks like a plate of spaghetti in Hubble! This is normal, since it simulates a realistic full-stack app. To focus on a specific pod and the traffic around it, simply zoom in and double-click on the pod.
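The same flows are also available on the command line if you prefer text over graphs; a small sketch, assuming the hubble CLI is installed on your workstation:

# Forward the Hubble Relay port to localhost:4245 (runs in the background)
cilium hubble port-forward &

# Stream live flows from the demo namespace
hubble observe --namespace demo --follow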

The load generator sends payloads to the frontend proxy to simulate visits.

Prometheus primarily pulls data from multiple OpenTelemetry Collector ports, while the collector occasionally needs to access Prometheus’s API (port 9090) for specific operations.

The fraud detection, accounting, and checkout services publish to Kafka (port 9092), which in turn feeds the OpenTelemetry Collector (port 4318), likely because Kafka acts as a buffer between the services and the collector: if the collector is down or slow, data isn’t lost.

What We Learned

  1. Your microservices talk more than a group chat at 3 AM
  2. Kafka really is at the center of everything
  3. Prometheus will monitor anything that stays still long enough
  4. Network visualization is oddly satisfying



Conclusion

That’s it! You’ve made it through all six tasks. Your single node is probably begging for mercy, but hey — it’s handled everything we’ve thrown at it.

What Have We Actually Accomplished?

After this journey through eBPF, GitOps, probably too many YAML files, and unnecessary humor, we’ve built something that’s either impressive or concerning (depending on who you ask):

  1. Simplified Our Stack:

  • Replaced multiple networking components with Cilium
  • Reduced complexity (while increasing our caffeine dependency)
  • Proved that one complicated tool is better than many simple ones

2. Automated Deployments:

  • Set up Argo CD for GitOps
  • Finally stopped copying YAML from Slack messages
  • Automated our way into both solutions and problems

3. Gained Deep Visibility:

  • Implemented Hubble because ignorance isn’t bliss
  • Created pretty graphs of our traffic flows
  • Finally able to understand why our services aren’t talking to each other

4. Built a Lab Environment:

  • Created a platform ready for more experiments
  • Prepared for the occasional production-like disaster
  • Made our laptop fans work harder than ever

Yes, we are almost at the end of Blog 2. But, hey, wait! Don’t clean up your demo app yet! There’s more:

The Accidental Bonus: Jaeger Tracing

Due to the unintentional installation of Jaeger (hey, it came as a free surprise with the demo application’s Helm chart), we should totally use it to our advantage. Given the length of this blog, we’re not going to bore you with endless details here. To keep things snappy and fun, we’ll throw in some extra screenshots instead of droning on with lengthy explanations. Enjoy the visuals!

Access the UI

Open http://<VIP>:8080/jaeger/ui/

Select a service (I recommend checkoutservice, because why not?) and click Find Traces.

Find Some Traces

Yep! There are two traces (on my cluster, not yours). Let’s click on the second one:

Analyze the Results

Here you can see that frontend and checkoutservice took a long time to respond. With no obvious culprit, let’s conclude that my laptop is too old (seriously!) and that too many other programs are running alongside the VM. I do appreciate that my laptop isn’t broken yet, though.

Final Thoughts

This setup is probably overkill for a single node, but that’s what makes it fun. We’ve created a lab environment that’s:

  • Over-engineered enough to be interesting
  • Complex enough to be educational
  • Resource-hungry enough to justify that RAM upgrade (just kidding!)

Stay tuned, there is more to come!

