Deploying Prometheus and Grafana over Amazon EKS and Making Their Data Persistent
Khushi Thareja
Aspiring DevOps-Cloud Architect | RHCE v8 | 4x Red Hat Certified | 3x Microsoft Certified
Why set up your own Kubernetes cluster when Amazon provides an entire multi-node setup with a single click? What disadvantages could we face while running Kubernetes on our local systems? Why should the monitoring team worry about resources like RAM, CPU, and storage when their main task is to monitor our resources? What if the pod running Grafana goes down and our entire monitoring data is lost? That would be a barrier to monitoring and to taking the required actions in time. So, in this article I present a powerful setup that deploys Prometheus and Grafana over Amazon EKS and makes their data persistent.
What are Prometheus and Grafana? Prometheus is an open-source monitoring and alerting toolkit. It scrapes metrics, stores all scraped samples locally, and runs queries over this data to extract information or generate alerts. Grafana is a tool used to visualize the collected data.
What is Elastic Kubernetes Service (EKS)? Why is it better than a local setup of Kubernetes? With a local setup of Kubernetes we have a limited amount of resources. For example, if during a critical process a pod required more resources than are present on our local system, the pod would fail and we could face a huge loss. To avoid this, we set up a multi-node cluster. But in a multi-node cluster we would have to set up the master as well as the worker nodes ourselves, on our own systems. For this requirement Amazon has come up with a service known as EKS, which builds the entire multi-node setup on their physical resources and also provides highly skilled engineers who manage it for us.
Let's get started! We begin with the image for Prometheus and then move on to creating all the required manifest files. I created a Dockerfile for this:
FROM centos:7
RUN yum install wget -y
RUN wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
RUN tar -xzf prometheus-2.18.1.linux-amd64.tar.gz
ENTRYPOINT [ "./prometheus-2.18.1.linux-amd64/prometheus" ]
CMD [ "--config.file=prometheus-2.18.1.linux-amd64/prometheus.yml" ]
EXPOSE 9090
You can use the same Docker image from Docker Hub with this command:
docker pull khushi09/prometheus:latest
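If you'd rather build the image yourself from the Dockerfile above, a quick local smoke test might look like this (the tag is just an example):

docker build -t khushi09/prometheus:latest .
docker run -d -p 9090:9090 khushi09/prometheus:latest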
Next we create all the manifest files for Prometheus. Let's start with the Deployment file. Remember to set the labels carefully, because labels are what connect this pod to other resources like the Service and the PVC. Also, since our target is to make the data persistent, I used a PVC and a ConfigMap. The PVC is mounted on the data folder, because all the scraped data is stored in that particular folder. The ConfigMap makes the Prometheus configuration file persistent: the config file is where the information about Prometheus's targets is stored, which makes it the most important file of all.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prom-pod
  labels:
    env: prom-env
spec:
  replicas: 2
  selector:
    matchLabels:
      env: prom-env
  template:
    metadata:
      name: prom-pod
      labels:
        env: prom-env
    spec:
      containers:
        - name: prom-pod
          image: khushi09/prometheus:latest
          args:
            - "--config.file=/prometheus-2.18.1.linux-amd64/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-persistent-storage
              mountPath: /prometheus-2.18.1.linux-amd64/data
            - name: prometheus-config-volume
              mountPath: /prometheus-2.18.1.linux-amd64/prometheus.yml
              subPath: prometheus.yml
      volumes:
        - name: prometheus-persistent-storage
          persistentVolumeClaim:
            claimName: prom-pvc
        - name: prometheus-config-volume
          configMap:
            name: prom-config
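If you apply the manifests one by one (this Deployment needs the ConfigMap and PVC shown below to exist first), the rollout can be verified like this; the file names follow the kustomization list further below:

kubectl create -f prom-deployment.yaml
kubectl rollout status deployment/prom-pod
kubectl get pods -l env=prom-env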
What is a ConfigMap? A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume. Now, let's have a look at the ConfigMap file. In the targets we have to specify the IPs of the systems we want to monitor for metrics.
kind: ConfigMap
apiVersion: v1
metadata:
  name: prom-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node1'
        static_configs:
          - targets: ['192.168.0.107:9100']
      - job_name: 'apache'
        static_configs:
          - targets: ['192.168.99.101:9117']
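A quick sanity check that the ConfigMap landed as expected (the file name matches the kustomization list further below):

kubectl create -f prom-configmap.yaml
kubectl describe configmap prom-config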
Next we go on to create the service file. Here we have specified the type LoadBalancer. This balances the load when traffic increases and also provides an external IP through which we can reach the pods from the outside world. The selector is important to specify: it must match the pod labels defined in the Deployment's template.
apiVersion: v1
kind: Service
metadata:
  name: prom-service
spec:
  ports:
    - port: 9090
  selector:
    env: prom-env
  type: LoadBalancer
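Once the Service is applied, the external endpoint can be read from kubectl; the EXTERNAL-IP column stays <pending> until AWS finishes provisioning the load balancer:

kubectl create -f prom-service.yaml
kubectl get svc prom-service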
For storage we create a PVC, i.e., a PersistentVolumeClaim, which makes our data persistent and keeps it from being deleted even if our pod fails.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prom-pvc
  labels:
    name: prom-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
Next, let's move on to Grafana and similarly create all the required manifest files. We start with the Docker image, which is again uploaded to Docker Hub.
FROM centos:7
RUN yum install wget -y
RUN wget https://dl.grafana.com/oss/release/grafana-7.0.1-1.x86_64.rpm
RUN yum install grafana-7.0.1-1.x86_64.rpm -y
WORKDIR /usr/share/grafana
CMD [ "/usr/sbin/grafana-server", "cfg:default.paths.data=/var/lib/grafana", "--config=/etc/grafana/grafana.ini" ]
EXPOSE 3000
You can use the same Docker image from Docker Hub with this command:
docker pull khushi09/grafana:v1
Deployment.yaml: this takes care of updating the pods, while in the background a ReplicaSet does its job of maintaining the desired number of pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graf-pod
  labels:
    env: graf-env
spec:
  replicas: 2
  selector:
    matchLabels:
      env: graf-env
  template:
    metadata:
      name: graf-pod
      labels:
        env: graf-env
    spec:
      containers:
        - name: graf
          image: khushi09/grafana:v1
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-persistent-storage
              mountPath: /var/lib/grafana
      volumes:
        - name: grafana-persistent-storage
          persistentVolumeClaim:
            claimName: graf-pvc
pvc.yaml: persistent storage, so that the dashboards prepared by the monitoring team don't get removed even if a pod gets corrupted.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: graf-pvc
  labels:
    app: visual
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
service.yaml: again we use a LoadBalancer so that Grafana can be accessed from the outside world and the load is balanced whenever it increases.
apiVersion: v1
kind: Service
metadata:
  name: graf-service
spec:
  ports:
    - port: 3000
  selector:
    env: graf-env
  type: LoadBalancer
Now, we create a kustomization file. We just have to apply the kustomization.yaml file (the exact command follows the file below) and it will deploy everything for us.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - prom-pvc.yaml
  - prom-configmap.yaml
  - prom-deployment.yaml
  - prom-service.yaml
  - graf-pvc.yaml
  - graf-deployment.yaml
  - graf-service.yaml
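With kustomization.yaml and all the listed manifest files sitting in one directory, a single command applies the whole stack (kubectl 1.14 and later has kustomize built in):

kubectl apply -k .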
You can also get all the files on GitHub https://github.com/khushi20218/eks-prom-graf
Let's start with the deployment of this infrastructure over Amazon EKS! For this you need to create an IAM user and give it administrator access. You'll get an access key and a secret key, which you need to provide while logging in with that account from the command line.
Now, run the aws configure command and provide the credentials.
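For reference, the prompt sequence looks roughly like this; the keys below are placeholders, not real credentials:

aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: ap-south-1
Default output format [None]: json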
Now you need to create the cluster: write a cluster-config manifest file and run the eksctl command against it. eksctl is a standalone command-line tool that gives us a lot of power to customize the clusters it creates, and it uses CloudFormation to do the full setup. In the manifest file you specify the node groups, the number of nodes required, the type of instances you need, and so on.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prom-graf-cluster
  region: ap-south-1
nodeGroups:
  - name: ng1
    desiredCapacity: 4
    instanceType: t2.micro
    ssh:
      publicKeyName: mykey
After this, run the command eksctl create cluster -f cluster.yaml and your full setup is launched.
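A couple of sanity checks after creation (names and IPs will differ in your account; eksctl also writes your kubeconfig by default):

eksctl get cluster --region ap-south-1
kubectl get nodes -o wide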
You can also verify the same through the GUI: the cluster gets created and the instances are launched.
We also need to update our kubeconfig file, for which we run the following command. It creates a new config file if one is not present and updates the existing one otherwise.
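Assuming the cluster name and region from the config file above, the command is:

aws eks update-kubeconfig --name prom-graf-cluster --region ap-south-1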
Now the cluster has been launched. If we ran our Prometheus and Grafana files directly, Kubernetes would create and use EBS volumes for us. But there is a big disadvantage in using EBS volumes: an EBS volume is tied to a single Availability Zone. If, during auto-scaling, a pod is launched on a node in a different Availability Zone, the EBS volume cannot follow it, and our goal of persistent storage would not be met. Because of this we move on to another Amazon service, EFS (Elastic File System), which can be mounted from any of the nodes.
I'm using the web UI for this: AWS console -> EFS, then create one file system. While creating it, provide the same VPC and security group that your EKS cluster gave to your nodes, so that the nodes and the file system can connect to each other.
The file system has been created! Let's move forward and write the manifest files which will connect the file system with our cluster. efs-provisioner.yaml: the efs-provisioner allows you to mount EFS storage as PersistentVolumes in Kubernetes. It consists of a container that has access to an AWS EFS resource, and it reads configuration (here passed as environment variables) containing the EFS filesystem ID, the AWS region, and the name you want to use for your efs-provisioner. Do remember to change the FILE_SYSTEM_ID and the NFS server name according to your own file system!
kind: Deployment
apiVersion: apps/v1
metadata:
  name: efs-provisioner
spec:
  selector:
    matchLabels:
      app: efs-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: efs-provisioner
    spec:
      containers:
        - name: efs-provisioner
          image: quay.io/external_storage/efs-provisioner:v0.1.0
          env:
            - name: FILE_SYSTEM_ID
              value: fs-8dbd375c
            - name: AWS_REGION
              value: ap-south-1
            - name: PROVISIONER_NAME
              value: prom-graf/aws-efs
          volumeMounts:
            - name: pv-volume
              mountPath: /persistentvolumes
      volumes:
        - name: pv-volume
          nfs:
            server: fs-8dbd375c.efs.ap-south-1.amazonaws.com
            path: /
Command for running this: kubectl create -f create-efs-provisioner.yaml. After this, we also need to create a ClusterRoleBinding, which provides the authorization the efs-provisioner needs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nfs-provisioner-role-binding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
Command for running this: kubectl create -f create-rbac.yaml. After this, you can create your own storage class.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: aws-efs
provisioner: prom-graf/aws-efs
For this, run kubectl create -f create-storage.yaml. Now, when you run kubectl get sc, you'll observe two storage classes. To make our storage class the default one, either remove the default annotation from the other storage class (a one-liner for this follows below) or delete it, which leaves your storage class as the default. Now our EFS is integrated! You can check this by describing the PVCs.
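A hedged example of the first option, assuming the pre-existing default class is gp2 (the usual EKS default):

kubectl patch storageclass gp2 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'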
Now, let's start deploying our Prometheus and Grafana setup over EKS. As you know, we created the kustomization file, so we just need to apply it (kubectl apply -k ., as shown earlier) and all the resources will be created.
Now we are provided with external IPs, which let us access the Prometheus and Grafana servers.
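The endpoints can be listed with a single command; the hostnames in the EXTERNAL-IP column will come from your own load balancers:

kubectl get svc prom-service graf-service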
With the Prometheus URL we access the Prometheus web page and observe that all the targets which we specified in the ConfigMap YAML file come up. Our configuration therefore remains persistent: whenever we want to add new targets, we just add them to the ConfigMap. When we access the Grafana web page the same way, we need to log in with the admin account the first time.
We log in, create beautiful dashboards, and save a dashboard so we can verify whether our data really remains persistent.
Now we delete all the pods with the kubectl delete pods --all command. As soon as we delete them, Kubernetes launches new pods again.
Observe that the URL as well as the pod names have changed, which proves that these are new pods. Now, when you access this URL, you'll find the same dashboards in Grafana which you created earlier.
Observe that the dashboard is already there, and Grafana does not ask you to log in again.
Since these were real-time graphs, they have changed from what you saw earlier. Remember that EKS is a paid service from Amazon. It also uses some paid services behind the scenes, such as static IPs (the EIP service), NAT gateways, etc. When you are done, first delete the EFS manually and then use the command eksctl delete cluster -f cluster.yaml; otherwise the security groups, VPC, etc. would conflict and give an error while deleting the cluster.
That's all! Thanks for reading. Do leave your valuable feedback, and for any queries or corrections feel free to contact me.