AKS Terraform controller with GitOps + Flux
Balaganesh MURUGAN
Cloud Architect / DevOps / Security {Azure, AWS, Google Cloud} -> Hands-On Infra, SRE, DevOps, Linux, DBA, Monitoring | M-365 (Security) | Azure HCI | Migration Expert | ELK | Jenkins | TOGAF
AKS & Flux via Terraform
Introduction
This article demonstrates how to bootstrap Flux onto an existing Azure AKS cluster using Terraform, then delves into some examples of using Flux to deploy your resources. The article is technical in nature and deliberately uses low-level commands because, sometimes, you can't beat getting your hands dirty.
What is Flux?
Flux falls into the Cloud Native Computing Foundation (CNCF) category of Continuous Integration & Delivery tools. The CNCF assigns maturity levels to cloud native software, and Flux is one of only two projects in this category with graduated status (the other being Argo), meaning they are stable and used successfully in production environments.
Flux in its own words, straight from the Flux Documentation:
Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to configuration when there is new code to deploy.
Flux is built from the ground up to use Kubernetes' API extension system, and to integrate with Prometheus and other core components of the Kubernetes ecosystem. Flux supports multi-tenancy and syncing an arbitrary number of Git repositories.
Really this is introducing a GitOps model for continuous deployment of applications to Kubernetes. A quick definition of GitOps may be useful:
GitOps is a way of implementing Continuous Deployment for cloud native applications. It focuses on a developer-centric experience when operating infrastructure, by using tools developers are already familiar with, including Git and Continuous Deployment tools.
The core idea of GitOps is having a Git repository that always contains declarative descriptions of the infrastructure currently desired in the production environment and an automated process to make the production environment match the described state in the repository. If you want to deploy a new application or update an existing one, you only need to update the repository - the automated process handles everything else. It’s like having cruise control for managing your applications in production.
Rather than repeat too much of the documentation, we will make some notes on why we think this is interesting. Flux essentially reconciles the desired state held in a source Git repository with the current state of a Kubernetes cluster.
How is this different from how deployment is normally done? Many teams starting out with DevOps for Kubernetes will use a permissioned build agent to deploy resources to the cluster in a push model; with Flux, the cluster itself pulls and reconciles its desired state from Git.
Installing Flux on AKS
1. Before you start
There are a few requirements for our demo code to run, so let's check these before we go any further: an Azure subscription containing an existing AKS cluster that you can reach with kubectl, the Azure CLI, Terraform and git installed locally, and a GitHub account to hold your fork of the demo repository.
Ready? Let’s move on.
2. Setup Deploy Keys
In this example, we’ll use GitHub as our target Flux repository. GitHub has supported read-only SSH keys for some time and provides a good base for our demo. To keep things simple, we’ll create a fork of the BlakYaks demo repository to use as our Flux target.
Browse to our repository at https://github.com/balagcpcloud/flux-bootstrap and click the Fork button to create a copy in your organization or account. You must ensure that “Copy the main branch only” is unticked, as we use one of the branches later in our example.
Once this is done, we will also need to create a read-only SSH (deploy) key that Flux can use to pull from our new forked repository.
Select Settings → Deploy keys → Add deploy key to create a new read-only SSH key.
Have a look here if you need a guide on creating a new SSH key pair.
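As a minimal sketch, a key pair suitable for use as a deploy key can be generated with ssh-keygen (the file name and comment below are just examples; the public key is what you paste into the GitHub deploy key form):

ssh-keygen -t ed25519 -C "flux-bootstrap deploy key" -f ./flux-deploy-key -N ""
cat ./flux-deploy-key.pub   # paste this into GitHub as the deploy key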
3. Creating our Flux bootstrap configuration
Once you’ve created your fork, create a folder to hold the repository code and clone it locally, for example:
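Something like the following, assuming your fork lives under your own GitHub account (substitute <your-account>):

mkdir flux-demo && cd flux-demo
git clone git@github.com:<your-account>/flux-bootstrap.git
cd flux-bootstrap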
The terraform/my.auto.tfvars file needs to be updated to reflect your AKS environment. The latest Flux provider uses the same authentication method as the Hashicorp native Kubernetes provider to push the Flux components into the target cluster. In our demo we will pull our credentials directly from the AKS cluster (though you could also simply pass in a kubeconfig file).
Edit the file in your repo with your details, and save when complete:
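The variable names below are illustrative only (check variables.tf in the repo for the exact names); the point is that the provider needs to know which AKS cluster to target, which fork to commit to, and which SSH key to use:

# terraform/my.auto.tfvars - all values are examples
aks_cluster_name     = "aks-flux-demo"
aks_resource_group   = "rg-flux-demo"
github_owner         = "<your-account>"
github_repository    = "flux-bootstrap"
flux_target_path     = "flux/clusters/development"
ssh_private_key_file = "./flux-deploy-key"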
A couple of notes on the credentials: the identity you use needs enough access to read the AKS cluster credentials and to create resources (namespaces, CRDs, deployments) inside the cluster, and the SSH private key must be the pair of the deploy key you registered against your fork.
4. Bootstrap the cluster
We will use Terraform to deploy Flux onto an existing AKS cluster. For the sake of brevity, we’ll assume you’ve already built your cluster and have access to it. Our demo codebase also requires you have logged in to your Azure tenant and have selected your target subscription, so let’s do this first, using the Azure CLI:
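For example (substitute your own subscription name or ID):

az login
az account set --subscription "<subscription-id>"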
Bootstrapping the cluster is now simply a task of running a Terraform apply against our code.
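A minimal sketch, run from the terraform folder of the repo:

cd terraform
terraform init
terraform apply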
If all went to plan, your cluster should now be running the Flux controllers within a new flux-system namespace. Let’s double check:
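The four controller deployments (source, kustomize, helm and notification) should all be up:

kubectl get pods -n flux-system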
Finally, let’s perform one last piece of tidying up before we move on:
So, what exactly just happened?
The current Flux provider contains a single resource, flux_bootstrap_git, which is responsible for not only installing Flux onto the target cluster, but also updating the source code repository with the bootstrap files. This essentially provides us with a blank canvas to start adding additional kustomizations, which we’ll do in a short while. The files added to the flux/clusters/development/flux-system folder of your fork (and deployed to your cluster) are gotk-components.yaml, gotk-sync.yaml and kustomization.yaml.
We will describe these in more detail in the Flux architecture section.
Points worth mentioning about the installation process:
The original secret can be deleted once we have finished our bootstrapping, which is why we ran the kubectl delete secret ... step earlier.
Now, let’s move on. The Terraform bootstrap has committed changes to your source control branch, so you will need to pull them down:
git pull
…and then push the changes to your my.auto.tfvars file back up:
git add -A && git commit -m "Updated auto vars file" && git push
Now that we’ve installed Flux, let’s explore what Flux looks like inside our cluster.
Flux Architecture
Flux has four main controllers: the source-controller (which fetches artefacts from sources such as Git repositories, Helm repositories and buckets), the kustomize-controller (which builds and applies Kustomize overlays and plain manifests), the helm-controller (which manages Helm releases declared as HelmRelease objects) and the notification-controller (which handles inbound and outbound events and alerts). The official definitions are in the Flux documentation.
The official Flux documentation is good (and pleasingly succinct) here, so we highly recommend taking a few minutes to review it.
You should have the controllers running within their deployments:
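A quick check with kubectl:

kubectl get deployments -n flux-system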
The controllers are underpinned by a set of CustomResourceDefinition objects:
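The CRDs all live under the *.toolkit.fluxcd.io API groups, so they are easy to list:

kubectl get crds | grep toolkit.fluxcd.io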
But most of your work with Flux is via the surfaced API resources:
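You can see which of these resources are registered on your cluster with kubectl (the flux CLI offers friendlier views, e.g. flux get all):

kubectl api-resources | grep fluxcd.io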
The official docs for these API resources are all available on fluxcd.io. Hopefully this article is self-sufficient, so you could get away without reading them (with the possible exception of the Kustomization docs, which are worth a read), but they are useful to follow along with as we go.
When we applied our Terraform to the cluster we created these resources for the Flux system, but we also configured Flux to monitor its own configuration and reconcile when there are changes in the source repository. Let’s go through the initial config files one-by-one:
1 gotk-components.yaml - Essentially installs Flux itself: the namespace, CRDs, service accounts, roles, role bindings, services, deployments and so on.
2 gotk-sync.yaml - Configures Flux to reconcile with the source repository. It creates two resources: a GitRepository called flux-system that points at our fork, and a Kustomization (also called flux-system) that applies whatever it finds under the cluster path.
3 kustomization.yaml - Flux will look in the path supplied in the flux-system Kustomization for a file named kustomization.yaml. These files are the entry point that tells Flux what to do with the folder contents. Essentially this one tells Flux to reconcile gotk-components.yaml and gotk-sync.yaml (see the sketch below):
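The file itself is just a standard Kustomize manifest listing the other two files:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml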
Now, there are a fair few concepts introduced here, so let’s run through an example of adding a new application to our source repository and having it sync down to the cluster; this will hopefully make things clearer. As kube-state-metrics is a service we commonly see installed to expose cluster metrics to monitoring systems, we’ll use it in the following example.
Example: kube-state-metrics (KSM)
In your fork there is a branch called feature/ksm. Go to the branches in GitHub:
Create a new pull request from the feature/ksm branch:
Ensure the pull request merges feature/ksm into the main branch of your fork, and not back into the upstream source repository:
Click Create pull request then Merge pull request and Confirm merge on the next screen.
Merging the pull request pulls in the file and folder structure we will discuss in the rest of the example and, crucially, also deploys the resources to the cluster: Flux will have reconciled the changes described in sections 1 and 2 below, which explain exactly what we did.
1. ‘Before’ resource configuration
What are we trying to achieve here? We have configured Flux to create the prerequisites for KSM before the application itself is installed. When using Flux there are typically some things you need to do before you can pull your applications down; in our example we need to create a metrics namespace into which we will install KSM, and we also create a HelmRepository resource so that Flux knows where to pull the KSM Helm chart from.
The file and folder structure we created is (we will step through this shortly):
A git repo source was created by Terraform and is already being monitored by Flux. The gitrepo source object created by Terraform looks like this (some attributes removed for brevity):
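A sketch of that object (the exact apiVersion depends on your Flux version, and some attributes are omitted):

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://git@github.com/<your-account>/flux-bootstrap.git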
The url tells you we are watching our forked repo and ref.branch tells you we are reconciling with the main branch. The interval attribute tells you we are checking the repo every minute for new commits. Changes made to this source repository in the correct manner will be reconciled by any clusters monitoring the folders.
It is probably worth taking a moment here to go over the files within the folder structure while defining their purpose. There is an element of building by convention with Flux which can get really confusing if you are not familiar with it. The example structure given in the official docs is:
├── apps
│   ├── base
│   ├── production
│   └── staging
├── infrastructure
│   ├── base
│   ├── production
│   └── staging
└── clusters
    ├── production
    └── staging
With the description (roughly): apps holds the application definitions, with a base and a customised overlay per environment; infrastructure holds shared tooling such as ingress and certificate controllers; and clusters holds the per-cluster Flux configuration that ties it all together.
We have a simplified version of this in our forked repo:
├───apps
│   └───development
│       ├───before
│       │   ├───helmrepos
│       │   └───namespaces
│       └───kube-state-metrics
└───clusters
    └───development
        ├───apps
        └───flux-system
So, let’s go through it in detail:
1 before.yaml - The files under this directory can be thought of as the entry point which tells Flux what to reconcile and how often to do it. This file creates a Flux Kustomization called before-apps:
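A sketch of that Kustomization, using the path from our repo layout (the apiVersion and interval depend on your Flux version and settings):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: before-apps
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/development/before
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system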
2 kustomization.yaml - This Kustomization is purely organisational and directs Flux to our helmrepos and namespaces folders:
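A minimal sketch, pointing Flux at the two sub-folders:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespaces
  - helmrepos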
3 kustomization.yaml - When Flux is directed to this folder it doesn't know what to do with the contents, so you have another kustomization file to instruct it what to do:
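For the namespaces folder that is simply:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - metrics.yaml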
4 metrics.yaml - Standard K8s namespace yaml manifest:
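Which is just:

apiVersion: v1
kind: Namespace
metadata:
  name: metrics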
5 kustomization.yaml - Same deal as no.3, lets Flux know what to do with the contents of this folder:
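Again, a one-liner pointing at the manifest in the folder:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - prometheus-community.yaml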
6 prometheus-community.yaml - Defines our Helm repository source so Flux is aware of the prometheus-community Helm chart repository:
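A sketch of the HelmRepository; the chart repository URL is the public prometheus-community one, while the apiVersion and the namespace the object lives in depend on your Flux version and repo layout:

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: metrics
spec:
  interval: 1h0m0s
  url: https://prometheus-community.github.io/helm-charts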
We should be able to see our new before-apps kustomization (and kube-state-metrics, described in the next section) and be able to check the status:
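Either the flux CLI or kubectl will do:

flux get kustomizations
kubectl get kustomizations -n flux-system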
Given that our kustomization is (hopefully) reporting that the revision has been applied, we should see our new metrics namespace and our new prometheus-community helmrepo source.
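For example, with kubectl:

kubectl get namespace metrics
kubectl get helmrepositories -A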
2. kube-state-metrics configuration
What are we trying to achieve here? We have configured Flux to deploy kube-state-metrics into our new metrics namespace, as a Helm release pulled from the prometheus-community repository we registered above.
We followed the same pattern for the folder structure as before. At the risk of labouring the point (we will, because it’s really important), let’s go through the files again:
1 kube-state-metrics.yaml - Our entry point, defining our source, path and interval:
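A sketch, following the same shape as before-apps (apiVersion and interval again depend on your setup):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kube-state-metrics
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/development/kube-state-metrics
  prune: true
  # a dependsOn entry referencing before-apps is a common addition here
  sourceRef:
    kind: GitRepository
    name: flux-system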
2 kustomization.yaml - Simply directing Flux to our HelmRelease resource which will control our Helm release:
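Which is simply:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - kube-state-metrics.helm.yaml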
3 kube-state-metrics.helm.yaml - The resource which will actually deploy KSM to our cluster:
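A sketch of the HelmRelease, pinned to chart v4.30.0 as described in the next section (the apiVersion and the namespace of the HelmRepository reference depend on your Flux version and layout):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kube-state-metrics
  namespace: metrics
spec:
  interval: 5m0s
  chart:
    spec:
      chart: kube-state-metrics
      version: "4.30.0"
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: metrics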
We should see our new kube-state-metrics kustomization with READY=True, the helmrelease created from kube-state-metrics.helm.yaml also reporting READY=True, and a running kube-state-metrics deployment.
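All three can be checked with kubectl (or the flux CLI equivalents such as flux get helmreleases):

kubectl get kustomization kube-state-metrics -n flux-system
kubectl get helmrelease kube-state-metrics -n metrics
kubectl get deployment -n metrics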
Now there is, admittedly, a lot to understand here. But we have to remember that what we are doing is configuring automation, and the payoff typically comes once it is all set up. Let’s have a look at what we need to do to upgrade KSM to the latest version in the next section.
3. KSM Upgrade
From our fork we installed kube-state-metrics pinned to chart v4.30.0. The chart, in turn, pins the KSM image version to v2.8.0.
Let’s upgrade the chart version in our fork. Change your kube-state-metrics.helm.yaml file so that the chart version is now 5.0.0:
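Only the version field in the chart spec needs to change, something like:

  chart:
    spec:
      chart: kube-state-metrics
      version: "5.0.0"   # previously "4.30.0"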
And commit this back to source control:
git add -A && git commit -m "Upgrading KSM chart to v5.0.0" && git push
After a minute or so the Kustomization will pick up the change, the chart will be upgraded to 5.0.0, and the running image version will move to v2.8.1.
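You can verify both with something like the following (the deployment name assumes the chart's default naming):

kubectl get helmrelease kube-state-metrics -n metrics
kubectl get deployment kube-state-metrics -n metrics \
  -o jsonpath='{.spec.template.spec.containers[0].image}'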
Let’s demonstrate one more change. There is an experimental feature of KSM that will allow you to run it as a statefulset and have the data sharded across the pods in the set. We can set the values that are submitted to the Helm chart (as you would with a Helm values file) via the kube-state-metrics.helm.yaml file:
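As a sketch, the chart exposes an autosharding toggle for this; the values block sits alongside the chart spec in the HelmRelease (check the chart's values.yaml for the exact key names):

spec:
  values:
    autosharding:
      enabled: true   # run KSM as a sharded statefulset
    replicas: 2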
After updating; commit back to source control:
git add -A && git commit -m "Enabling KSM statefulset sharding" && git push
After a minute or so we can check our resources and we should see our new statefulset:
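For example:

kubectl get statefulset -n metrics
kubectl get pods -n metrics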
This is (hopefully) where the power of the GitOps model becomes apparent. A very simple change to a git repository was all it took to upgrade our service and, because it was done in source control, we have an audit trail and a mechanism to undo the change by simply reverting the commit.
In the real world we wouldn’t be committing directly to main branches the way we are here; the branches would have protection policies and approvers so that only validated changes can reach the clusters (maybe an article for another time).
Setting up notifications
The current documentation is a little sparse in places, so let’s look at an example of how we would set up an integration between our Flux deployment and Microsoft Teams. It’s probably worth an overview of the Notification controller at this point:
The controller handles all ingress and egress events for the Flux reconcilers; this includes receivers which can trigger events based on external system (incoming) webhooks, and providers, which interface with external systems. Alerts are defined that stipulate which events will trigger communication with specific providers.
Microsoft Teams allows integration with third-party systems using incoming webhooks, so to link our Flux deployment to Teams we have to create the following: an incoming webhook in Teams, a Kubernetes secret holding the webhook URL, a Flux Provider of type msteams that references that secret, and one or more Alerts that route reconciliation events to the provider.
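A minimal sketch, assuming the webhook URL has been stored in a secret called msteams-webhook (the object names are illustrative and the apiVersion depends on your Flux version):

# kubectl create secret generic msteams-webhook -n flux-system --from-literal=address=<webhook-url>

apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: msteams
  namespace: flux-system
spec:
  type: msteams
  secretRef:
    name: msteams-webhook
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: msteams-alerts
  namespace: flux-system
spec:
  providerRef:
    name: msteams
  eventSeverity: info
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
      namespace: metrics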
Once saved, apply the file into your cluster:
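Assuming you saved the manifests above as teams-alerts.yaml (the file name is arbitrary):

kubectl apply -f teams-alerts.yaml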
Within a few minutes, alerts should begin to appear within Teams:
You will probably notice that the example above is quite chatty. In a production environment you would definitely want to tune the alerts down to a more sensible level, but for demo purposes this should give you some messages to look at. Once you’ve seen enough, you can stop the alerts by removing the objects from Kubernetes:
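For example, assuming the same file name as before:

kubectl delete -f teams-alerts.yaml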