Deploying Prometheus and Grafana over Amazon EKS and Making Their Data Persistent
Khushi Thareja
Aspiring DevOps-Cloud Architect | RHCE v8 | 4x Red Hat Certified | 3x Microsoft Certified
Why set up your own Kubernetes cluster when Amazon provides an entire multi-node setup with a single click? What disadvantages could we face while running Kubernetes on our local systems? Why should the monitoring team worry about resources like RAM, CPU, and storage when their main task is to monitor our resources? What if the pod running Grafana goes down and our entire monitoring data is lost? That would be a barrier to monitoring and to taking the required actions in time. So, in this article I present a powerful setup that deploys Prometheus and Grafana over Amazon EKS and makes their data persistent.
What are Prometheus and Grafana? Prometheus is an open-source monitoring and alerting toolkit. It scrapes metrics, stores all scraped samples locally, and runs queries over this data to extract information or generate alerts. Grafana is a tool used to visualize the collected data.
What is Elastic Kubernetes Service (EKS)? Why is it better than a local setup of Kubernetes? With a local setup of Kubernetes we have a limited amount of resources. For example, if during a critical process a pod required more resources than are present on our local system, the pod would fail and we could face a huge loss. To avoid this, we set up a multi-node cluster. But in a multi-node cluster we would have to set up the master as well as the worker nodes ourselves, on our own systems. For this requirement Amazon has come up with a service known as EKS, which builds the entire multi-node setup on their physical resources and also provides highly skilled engineers who manage it for us.
Let's get started! We begin with the image for Prometheus and then move on to creating all the required manifest files. I created a Dockerfile for this:
FROM centos:7
RUN yum install wget -y
RUN wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
RUN tar -xzf prometheus-2.18.1.linux-amd64.tar.gz
ENTRYPOINT [ "./prometheus-2.18.1.linux-amd64/prometheus" ]
CMD [ "--config.file=prometheus-2.18.1.linux-amd64/prometheus.yml" ]
EXPOSE 9090
You can use the same Docker image from Docker Hub with this command:
docker pull khushi09/prometheus:latest
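If you'd rather build the image yourself from the Dockerfile above, a quick local smoke test might look like this (the tag is just an example):

docker build -t khushi09/prometheus:latest .
docker run -d -p 9090:9090 khushi09/prometheus:latest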
Next we create all the manifest files for Prometheus. Let's start with the Deployment file. Remember to set the labels carefully, because labels are what connect this pod to other resources like the Service and the PVC. Also, since our target is to make the data persistent, I used a PVC and a ConfigMap. The PVC is mounted on the data folder, because all the scraped data is stored in that particular folder. The ConfigMap makes the Prometheus configuration file persistent: the config file is where the information about Prometheus's targets is stored, which makes it the most important file of all.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prom-pod
  labels:
    env: prom-env
spec:
  replicas: 2
  selector:
    matchLabels:
      env: prom-env
  template:
    metadata:
      name: prom-pod
      labels:
        env: prom-env
    spec:
      containers:
        - name: prom-pod
          image: khushi09/prometheus:latest
          args:
            - "--config.file=/prometheus-2.18.1.linux-amd64/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-persistent-storage
              mountPath: /prometheus-2.18.1.linux-amd64/data
            - name: prometheus-config-volume
              mountPath: /prometheus-2.18.1.linux-amd64/prometheus.yml
              subPath: prometheus.yml
      volumes:
        - name: prometheus-persistent-storage
          persistentVolumeClaim:
            claimName: prom-pvc
        - name: prometheus-config-volume
          configMap:
            name: prom-config
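If you apply the manifests one by one (this Deployment needs the ConfigMap and PVC shown below to exist first), the rollout can be verified like this; the file names follow the kustomization list further below:

kubectl create -f prom-deployment.yaml
kubectl rollout status deployment/prom-pod
kubectl get pods -l env=prom-env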
What is a ConfigMap? A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume. Now, let's have a look at the ConfigMap file. In the targets we have to specify the IPs of the systems we want to monitor for metrics.
kind: ConfigMap
apiVersion: v1
metadata:
  name: prom-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node1'
        static_configs:
          - targets: ['192.168.0.107:9100']
      - job_name: 'apache'
        static_configs:
          - targets: ['192.168.99.101:9117']
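A quick sanity check that the ConfigMap landed as expected (the file name matches the kustomization list further below):

kubectl create -f prom-configmap.yaml
kubectl describe configmap prom-config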
Next we go on to create the service file. Here we have specified the type LoadBalancer. This balances the load when traffic increases and also provides an external IP through which we can reach the pods from the outside world. The selector is important to specify: it must match the pod labels defined in the Deployment's template.
apiVersion: v1
kind: Service
metadata:
  name: prom-service
spec:
  ports:
    - port: 9090
  selector:
    env: prom-env
  type: LoadBalancer
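Once the Service is applied, the external endpoint can be read from kubectl; the EXTERNAL-IP column stays <pending> until AWS finishes provisioning the load balancer:

kubectl create -f prom-service.yaml
kubectl get svc prom-service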
For storage we create a PVC, i.e., a PersistentVolumeClaim, which makes our data persistent and keeps it from being deleted even if our pod fails.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prom-pvc
  labels:
    name: prom-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
Next, let's move on to Grafana and similarly create all the required manifest files. We start with the Docker image, which is again uploaded to Docker Hub.
FROM centos:7
RUN yum install wget -y
RUN wget https://dl.grafana.com/oss/release/grafana-7.0.1-1.x86_64.rpm
RUN yum install grafana-7.0.1-1.x86_64.rpm -y
WORKDIR /usr/share/grafana
CMD [ "/usr/sbin/grafana-server", "cfg:default.paths.data=/var/lib/grafana", "--config=/etc/grafana/grafana.ini" ]
EXPOSE 3000
You can use the same Docker image from Docker Hub with this command:
docker pull khushi09/grafana:v1
Deployment.yaml: this takes care of updating the pods, while in the background a ReplicaSet does its job of maintaining the desired number of pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graf-pod
  labels:
    env: graf-env
spec:
  replicas: 2
  selector:
    matchLabels:
      env: graf-env
  template:
    metadata:
      name: graf-pod
      labels:
        env: graf-env
    spec:
      containers:
        - name: graf
          image: khushi09/grafana:v1
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-persistent-storage
              mountPath: /var/lib/grafana
      volumes:
        - name: grafana-persistent-storage
          persistentVolumeClaim:
            claimName: graf-pvc
pvc.yaml: persistent storage, so that the dashboards prepared by the monitoring team don't get removed even if a pod gets corrupted.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: graf-pvc
  labels:
    app: visual
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
service.yaml: again we use a LoadBalancer so that Grafana can be accessed from the outside world and the load is balanced whenever it increases.
apiVersion: v1
kind: Service
metadata:
  name: graf-service
spec:
  ports:
    - port: 3000
  selector:
    env: graf-env
  type: LoadBalancer
Now, we create a kustomization file. We just have to apply the kustomization.yaml file (the exact command follows the file below) and it will deploy everything for us.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - prom-pvc.yaml
  - prom-configmap.yaml
  - prom-deployment.yaml
  - prom-service.yaml
  - graf-pvc.yaml
  - graf-deployment.yaml
  - graf-service.yaml
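With kustomization.yaml and all the listed manifest files sitting in one directory, a single command applies the whole stack (kubectl 1.14 and later has kustomize built in):

kubectl apply -k .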
You can also get all the files on GitHub https://github.com/khushi20218/eks-prom-graf
Let's start with the deployment of this infrastructure over Amazon EKS! For this you need to create an IAM user and give it administrator access. You'll get an access key and a secret key, which you need to provide while logging in with that account from the command line.
Now, run the aws configure command and provide the credentials.
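For reference, the prompt sequence looks roughly like this; the keys below are placeholders, not real credentials:

aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: ap-south-1
Default output format [None]: json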
Now you need to create the cluster: write a cluster-config manifest file and run the eksctl command against it. eksctl is a standalone command-line tool that gives us a lot of power to customize the clusters it creates, and it uses CloudFormation to do the full setup. In the manifest file you specify the node groups, the number of nodes required, the type of instances you need, and so on.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prom-graf-cluster
  region: ap-south-1
nodeGroups:
  - name: ng1
    desiredCapacity: 4
    instanceType: t2.micro
    ssh:
      publicKeyName: mykey
After this, run the command eksctl create cluster -f cluster.yaml and your full setup is launched.
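A couple of sanity checks after creation (names and IPs will differ in your account; eksctl also writes your kubeconfig by default):

eksctl get cluster --region ap-south-1
kubectl get nodes -o wide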
You can also verify the same through the GUI: the cluster gets created and the instances are launched.
We also need to update our kubeconfig file, for which we run the following command. It creates a new config file if one is not present and updates the existing one otherwise.
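Assuming the cluster name and region from the config file above, the command is:

aws eks update-kubeconfig --name prom-graf-cluster --region ap-south-1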
Now the cluster has been launched. If we ran our Prometheus and Grafana files directly, Kubernetes would create and use EBS volumes for us. But there is a big disadvantage in using EBS volumes: an EBS volume is tied to a single Availability Zone. If, during auto-scaling, a pod is launched on a node in a different Availability Zone, the EBS volume cannot follow it, and our goal of persistent storage would not be met. Because of this we move on to another Amazon service, EFS (Elastic File System), which can be mounted from any of the nodes.
I'm using the web UI for this: AWS console -> EFS, then create one file system. While creating it, provide the same VPC and security group that your EKS cluster gave to your nodes, so that the nodes and the file system can connect to each other.
The file system has been created! Let's move forward and write the manifest files which will connect the file system with our cluster. efs-provisioner.yaml: the efs-provisioner allows you to mount EFS storage as PersistentVolumes in Kubernetes. It consists of a container that has access to an AWS EFS resource, and it reads configuration (here passed as environment variables) containing the EFS filesystem ID, the AWS region, and the name you want to use for your efs-provisioner. Do remember to change the FILE_SYSTEM_ID and the NFS server name according to your own file system!
kind: Deployment
apiVersion: apps/v1
metadata:
  name: efs-provisioner
spec:
  selector:
    matchLabels:
      app: efs-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: efs-provisioner
    spec:
      containers:
        - name: efs-provisioner
          image: quay.io/external_storage/efs-provisioner:v0.1.0
          env:
            - name: FILE_SYSTEM_ID
              value: fs-8dbd375c
            - name: AWS_REGION
              value: ap-south-1
            - name: PROVISIONER_NAME
              value: prom-graf/aws-efs
          volumeMounts:
            - name: pv-volume
              mountPath: /persistentvolumes
      volumes:
        - name: pv-volume
          nfs:
            server: fs-8dbd375c.efs.ap-south-1.amazonaws.com
            path: /
Command for running this: kubectl create -f create-efs-provisioner.yaml. After this, we also need to create a ClusterRoleBinding, which provides the authorization the efs-provisioner needs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nfs-provisioner-role-binding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
Command for running this: kubectl create -f create-rbac.yaml. After this, you can create your own storage class.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: aws-efs
provisioner: prom-graf/aws-efs
For this, run kubectl create -f create-storage.yaml. Now, when you run kubectl get sc, you'll observe two storage classes. To make our storage class the default one, either remove the default annotation from the other storage class (a one-liner for this follows below) or delete it, which leaves your storage class as the default. Now our EFS is integrated! You can check this by describing the PVCs.
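A hedged example of the first option, assuming the pre-existing default class is gp2 (the usual EKS default):

kubectl patch storageclass gp2 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'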
Now, let's start deploying our Prometheus and Grafana setup over EKS. As you know, we created the kustomization file, so we just need to apply it (kubectl apply -k ., as shown earlier) and all the resources will be created.
Now we are provided with external IPs, which let us access the Prometheus and Grafana servers.
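The endpoints can be listed with a single command; the hostnames in the EXTERNAL-IP column will come from your own load balancers:

kubectl get svc prom-service graf-service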
With the Prometheus URL we access the Prometheus web page and observe that all the targets which we specified in the ConfigMap YAML file come up. Our configuration therefore remains persistent: whenever we want to add new targets, we just add them to the ConfigMap. When we access the Grafana web page the same way, we need to log in with the admin account the first time.
We log in, create beautiful dashboards, and save a dashboard so we can verify whether our data really remains persistent.
Now we delete all the pods with the kubectl delete pods --all command. As soon as we delete them, Kubernetes launches new pods again.
Observe that the URL as well as the pod names have changed, which proves that these are new pods. Now, when you access this URL, you'll find the same dashboards in Grafana which you created earlier.
Observe that the dashboard is already there, and Grafana does not ask you to log in again.
Since these were real-time graphs, they have changed from what you saw earlier. Remember that EKS is a paid service from Amazon. It also uses some paid services behind the scenes, such as static IPs (the EIP service), NAT gateways, etc. When you are done, first delete the EFS manually and then use the command eksctl delete cluster -f cluster.yaml; otherwise the security groups, VPC, etc. would conflict and give an error while deleting the cluster.
That's all! Thanks for reading. Do leave your valuable feedback, and for any queries or corrections feel free to contact me.