Deploy RabbitMQ Cluster To K8s Cluster Via ArgoCD

Introduction

This documentation illustrates how to install and configure RabbitMQ on a Kubernetes cluster via ArgoCD.

RabbitMQ is an open-source message broker that originally implemented AMQP (Advanced Message Queuing Protocol) and has since been extended to support other protocols such as STOMP (Streaming Text Oriented Messaging Protocol) and MQTT (Message Queuing Telemetry Transport). It is message-queueing software that supports sending and receiving messages between distributed systems, applications, and services. It is written in Erlang and provides client interfaces and libraries for all major programming languages, including Python, NodeJS, Java, and PHP.

Prerequisites

  • Access to a Kubernetes cluster version 1.18 or above
  • kubectl configured to access the cluster
  • ArgoCD bootstrapped into the k8s cluster (there is a directory in my GitHub repo that contains steps to install and configure ArgoCD in an already running Kubernetes cluster)

Quickstart Steps

This guide will walk you through the following steps:

  1. Install the RabbitMQ Cluster Operator
  2. Deploy a RabbitMQ Cluster using the Operator
  3. View RabbitMQ Logs
  4. Access the RabbitMQ Management UI
  5. Attach a Workload to the Cluster

1. Install the RabbitMQ Cluster Operator

The manifest for the installation of the RabbitMQ Cluster Operator can be found in my GitHub repo, and the deployment was done via ArgoCD. Installation of the Cluster Operator creates several Kubernetes resources. Breaking these down, we have:

  • a new namespace, rabbitmq-system. The Cluster Operator deployment is created in this namespace.

kubectl get all -n rabbitmq-system

NAME                                             READY   STATUS    RESTARTS   AGE
pod/rabbitmq-cluster-operator-5b4b795998-48mvp   1/1     Running   0          2m10s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rabbitmq-cluster-operator   1/1     1            1           2m10s

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/rabbitmq-cluster-operator-5b4b795998   1         1         1       2m10s        

  • a new custom resource, rabbitmqclusters.rabbitmq.com. The custom resource allows us to define an API for the creation of RabbitMQ Clusters.

kubectl get customresourcedefinitions.apiextensions.k8s.io

NAME                                             CREATED AT
...
rabbitmqclusters.rabbitmq.com                    2021-07-20T00:46:24Z
...        

  • and some RBAC roles. These are required by the Operator to create, update, and delete RabbitMQ Clusters.

It is important to note that the Cluster Operator must be created before proceeding with the deployment of the 3-node RabbitMQ Cluster. To achieve this, I commented out the part for the deployment of the 3-node RabbitMQ Cluster in the kustomization.yaml file in the repo and allowed ArgoCD to deploy only the Cluster Operator to the Kubernetes cluster (see the sketch below).
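
To illustrate the idea, here is a minimal sketch of what such a kustomization.yaml could look like. The file names are assumptions for illustration, not the actual layout of the repo:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - cluster-operator.yaml      # RabbitMQ Cluster Operator manifest (hypothetical file name)
  # - rabbitmq-cluster.yaml    # 3-node RabbitmqCluster; uncomment once the Operator is running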


2. Deploy a RabbitMQ Cluster using the Operator

Now that we have the Operator deployed, we are going to create a 3-node RabbitMQ Cluster. The manifest for the deployment of the 3-node RabbitMQ Cluster using the Cluster Operator is also located in my GitHub repo, and the deployment was also done via ArgoCD. All that was needed was to uncomment the part for the deployment of the 3-node RabbitMQ Cluster in the kustomization.yaml file and allow ArgoCD to deploy the 3-node RabbitMQ Cluster to the Kubernetes cluster.

In the RabbitmqCluster manifest, you will notice that this is where we specify the name of our cluster, and we have also used storageClassName: rook-ceph-block. This is because this deployment is done on a self-managed, on-premises Kubernetes cluster that uses the Rook-Ceph storage solution.

Everything else will be configured according to the Cluster Operator's defaults. That being said, we can override the default configurations (e.g. for the StatefulSet and Service) by using the override parameter in the RabbitmqCluster manifest, as sketched below.
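
For reference, here is a minimal sketch of what the RabbitmqCluster manifest could look like based on the values mentioned above; the storage size is an assumption, and anything omitted falls back to the Operator's defaults:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: niyez-dev-rabbitmq
  namespace: rabbitmq-system
spec:
  replicas: 3
  persistence:
    storageClassName: rook-ceph-block
    storage: 10Gi              # assumption: pick a size that suits your workload
  # override: {}               # optional: override the generated StatefulSet/Service here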

This will create a RabbitMQ cluster called niyez-dev-rabbitmq (this is the name used in the RabbitmqCluster manifest) in the rabbitmq-system namespace. You can see the RabbitMQ Cluster as it is being created:

watch kubectl get all -n rabbitmq-system 

NAME                                             READY   STATUS    RESTARTS   AGE
pod/niyez-dev-rabbitmq-server-0                1/1     Running   0          2d17h
pod/niyez-dev-rabbitmq-server-1                1/1     Running   0          2d17h
pod/niyez-dev-rabbitmq-server-2                1/1     Running   0          2d17h

NAME                                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
service/niyez-dev-rabbitmq         ClusterIP   10.8.2.45    <none>        5672/TCP,15672/TCP,15692/TCP   2d17h
service/niyez-dev-rabbitmq-nodes   ClusterIP   None         <none>        4369/TCP,25672/TCP             2d17h

NAME                                           READY   AGE
statefulset.apps/niyez-dev-rabbitmq-server     3/3     2d17h        

You will also be able to see an instance of the rabbitmqclusters.rabbitmq.com custom resource created.

kubectl get rabbitmqclusters.rabbitmq.com -n rabbitmq-system 
NAME                   ALLREPLICASREADY   RECONCILESUCCESS   AGE
niyez-dev-rabbitmq     True               True               2d18h        

If your Pod is stuck in the Pending state, your cluster most probably does not have sufficient resources (memory and/or CPU). This can be verified as follows:

kubectl describe rabbitmqclusters.rabbitmq.com -n rabbitmq-system
...
    Limits:
      cpu:     2000m
      memory:  2Gi
    Requests:
      cpu:      1000m
      memory:   2Gi
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  2s (x19 over 20m)  default-scheduler  0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.
...        

In this case, and since this is a fresh deployment, you may need to adjust the resource requests and limits in the RabbitmqCluster manifest file. After that, you need to remove and re-create the previously created RabbitMQ Cluster object by commenting out and then uncommenting the part for the deployment of the 3-node RabbitMQ Cluster in the repo.

Specify the resource requests and limits of the RabbitmqCluster Pods. CPU requirements must be in CPU units and memory requirements in bytes; both must be expressed as a Kubernetes resource quantity. The RabbitmqCluster does not deploy if these configurations are provided but are not valid. The defaults are listed below, followed by a sketch of the corresponding stanza in the manifest.
Default values:
  • Memory limit: 2 Gi
  • CPU limit: 2000 millicores
  • Memory request: 2 Gi
  • CPU request: 1000 millicores
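
As a sketch (using the default values quoted above), the resources stanza in the RabbitmqCluster spec would look like this:

spec:
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 2000m
      memory: 2Gi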

The RabbitMQ high-water mark is set to 0.4 times the memory limit. It is recommended to keep the memory request and limit at the same value.

By default, RabbitMQ will not accept any new messages when it detects that it is using more than 40% of the available memory (as reported by the OS): vm_memory_high_watermark.relative = 0.4. This is a safe default and care should be taken when modifying this value, even when the host is a dedicated RabbitMQ node.
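
If you do need to change the watermark, one way to do it with the Cluster Operator is through the rabbitmq.additionalConfig field of the RabbitmqCluster spec. The snippet below is only a sketch and sets the default value explicitly:

spec:
  rabbitmq:
    additionalConfig: |
      vm_memory_high_watermark.relative = 0.4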

The OS and file system use system memory to speed up operations for all system processes. Failing to leave enough free system memory for this purpose will have an adverse effect on system performance due to OS swapping, and can even result in RabbitMQ process termination.

Also, it is possible for the pods to be running but not ready. When you check, you may see something similar to the output below:

kubectl describe pod niyez-dev-rabbitmq-server-0 -n rabbitmq-system
...
Readiness probe failed: dial tcp 10.4.2.20:5672: connect: connection refused
...        

In this case, you may want to wait for a number of retries (five minutes by default).

Alternatively, this can be solved by increasing the initial delay of the readiness check and using a basic RabbitMQ health check as the readiness probe:

  readinessProbe: # probe to know when RMQ is ready to accept traffic
    exec:
      # This is just an example. There is no "one true health check" but rather
      # several rabbitmq-diagnostics commands that can be combined to form increasingly comprehensive
      # and intrusive health checks.
      # Learn more at https://www.rabbitmq.com/monitoring.html#health-checks.
      # Stage 1 check:
      command: ["rabbitmq-diagnostics", "ping"]
    initialDelaySeconds: 20
    periodSeconds: 60
    timeoutSeconds: 10
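
When the cluster is managed by the Cluster Operator, one way to apply such a probe is through the StatefulSet override in the RabbitmqCluster manifest. The following is a sketch under that assumption, not a tested configuration:

spec:
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
              - name: rabbitmq
                readinessProbe:
                  exec:
                    command: ["rabbitmq-diagnostics", "ping"]
                  initialDelaySeconds: 20
                  periodSeconds: 60
                  timeoutSeconds: 10
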
Kubernetes uses a check known as the readiness probe to determine if a pod is ready to serve client traffic. This is effectively a specialized health check defined by the system operator.

When an ordered pod deployment policy is used — and this is the recommended option for RabbitMQ clusters — the probe controls when the Kubernetes controller will consider the currently deployed pod to be ready and proceed to deploy the next one. This check, if not chosen appropriately, can deadlock a rolling cluster node restart.

RabbitMQ nodes that belong to a cluster will attempt to sync schema from their peers on startup. If no peer comes online within a configurable time window (five minutes by default), the node will give up and voluntarily stop. Before the sync is complete, the node won't mark itself as fully booted.

Therefore, if a readiness probe assumes that a node is fully booted and running, a rolling restart of RabbitMQ node pods using such a probe will deadlock: the probe will never succeed, so the controller will never proceed to deploy the next pod, which must come online for the original pod to be considered ready by the deployment.

It is therefore recommended to use a very basic RabbitMQ health check for the readiness probe: rabbitmq-diagnostics ping


3. View RabbitMQ Logs

In order to make sure RabbitMQ has started correctly, let's view the RabbitMQ log file. This can be done by viewing the RabbitMQ pod logs. In this case, it would be:

kubectl logs niyez-dev-rabbitmq-server-0 -n rabbitmq-system         

You should see an output similar to the one below:

WARNING: 'docker-entrypoint.sh' generated/modified the RabbitMQ configuration file, which will no longer happen in a future release! (https://github.com/docker-library/rabbitmq/pull/424)

Generated end result, for reference:
------------------------------------
loopback_users.guest = false
total_memory_available_override_value = 524288000
listeners.tcp.default = 5672
management.tcp.port = 15672
------------------------------------
Configuring logger redirection
01:05:25.248 [warning] cluster_formation.randomized_startup_delay_range.min and cluster_formation.randomized_startup_delay_range.max are deprecated
2021-07-21 01:05:51.252 [debug] <0.291.0> Lager installed handler error_logger_lager_h into error_logger
2021-07-21 01:05:51.253 [debug] <0.294.0> Lager installed handler lager_forwarder_backend into error_logger_lager_event
2021-07-21 01:05:51.257 [debug] <0.297.0> Lager installed handler lager_forwarder_backend into rabbit_log_lager_event
2021-07-21 01:05:51.350 [debug] <0.300.0> Lager installed handler lager_forwarder_backend into rabbit_log_channel_lager_event
2021-07-21 01:05:51.357 [debug] <0.303.0> Lager installed handler lager_forwarder_backend into rabbit_log_connection_lager_event
2021-07-21 01:05:51.452 [debug] <0.306.0> Lager installed handler lager_forwarder_backend into rabbit_log_feature_flags_lager_event
2021-07-21 01:05:51.549 [debug] <0.309.0> Lager installed handler lager_forwarder_backend into rabbit_log_federation_lager_event
2021-07-21 01:05:51.555 [debug] <0.312.0> Lager installed handler lager_forwarder_backend into rabbit_log_ldap_lager_event
2021-07-21 01:05:51.649 [debug] <0.315.0> Lager installed handler lager_forwarder_backend into rabbit_log_mirroring_lager_event
2021-07-21 01:05:51.656 [debug] <0.318.0> Lager installed handler lager_forwarder_backend into rabbit_log_prelaunch_lager_event
2021-07-21 01:05:51.749 [debug] <0.287.0> Lager installed handler lager_backend_throttle into lager_event
2021-07-21 01:05:51.751 [debug] <0.321.0> Lager installed handler lager_forwarder_backend into rabbit_log_queue_lager_event
2021-07-21 01:05:51.757 [debug] <0.324.0> Lager installed handler lager_forwarder_backend into rabbit_log_ra_lager_event
2021-07-21 01:05:51.852 [debug] <0.327.0> Lager installed handler lager_forwarder_backend into rabbit_log_shovel_lager_event
2021-07-21 01:05:51.858 [debug] <0.330.0> Lager installed handler lager_forwarder_backend into rabbit_log_upgrade_lager_event
2021-07-21 01:05:52.656 [info] <0.44.0> Application lager started on node 'rabbit@niyez-dev-rabbitmq-server-0.niyez-dev-rabbitmq-nodes.rabbitmq-system'
2021-07-21 01:05:55.850 [info] <0.44.0> Application mnesia started on node 'rabbit@niyez-dev-rabbitmq-server-0.niyez-dev-rabbitmq-nodes.rabbitmq-system'
2021-07-21 01:05:55.851 [info] <0.273.0> 
 Starting RabbitMQ 3.8.18 on Erlang 24.0.3 [jit]
 Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
 Licensed under the MPL 2.0. Website: https://rabbitmq.com

  ##  ##      RabbitMQ 3.8.18
  ##  ##
  ##########  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  Erlang:      24.0.3 [jit]
  TLS Library: OpenSSL - OpenSSL 1.1.1k  25 Mar 2021

  Doc guides:  https://rabbitmq.com/documentation.html
  Support:     https://rabbitmq.com/contact.html
  Tutorials:   https://rabbitmq.com/getstarted.html
  Monitoring:  https://rabbitmq.com/monitoring.html

...        

4. Access the RabbitMQ Management UI

Next, let's access the Management UI.

username="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode)"
echo "username: $username"
password="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user  -o jsonpath='{.data.password}' | base64 --decode)"
echo "password: $password"        

Open a new terminal on your local system and run the commands below:

gcloud beta compute ssh pzukprsvqjb01 --tunnel-through-iap --zone europe-west4-a --project niyez -- -L 15672:localhost:15672

kubectl port-forward svc/niyez-dev-rabbitmq 15672:15672 -n rabbitmq-system        

The gcloud command above was used to set up SSH port forwarding to a jumpbox connected to the k8s cluster so that I could access the RabbitMQ UI from my local system.

Now we can open localhost:15672 in the browser and see the Management UI. The credentials are as printed in the commands above.

Alternatively, you can run a curl command to verify access:

curl -u$username:$password localhost:15672/api/overview
{"management_version":"3.8.18","rates_mode":"basic", ...}        

Using the kubectl rabbitmq plugin, the Management UI can be accessed using:

kubectl rabbitmq manage niyez-dev-rabbitmq        

5. Attach a Workload to the Cluster (Connect An Application To The Cluster)

The next step would be to connect an application to the RabbitMQ Cluster in order to use its messaging capabilities. The perf-test application is frequently used within the RabbitMQ community for load testing RabbitMQ Clusters.

Here, we will be using the niyez-dev-rabbitmq service to find the connection address, and the niyez-dev-rabbitmq-default-user secret to find the connection credentials.

username="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode)"
password="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user  -o jsonpath='{.data.password}' | base64 --decode)"
service="$(kubectl get service -n rabbitmq-system niyez-dev-rabbitmq -o jsonpath='{.spec.clusterIP}')"
kubectl run perf-test --image=pivotalrabbitmq/perf-test --namespace=rabbitmq-system  -- --uri amqp://$username:$password@$service 

# pod/perf-test created        

These steps are automated in the kubectl rabbitmq plugin, which may simply be run as:

kubectl rabbitmq perf-test niyez-dev-rabbitmq -n rabbitmq-system        

We can now view the perf-test logs by running the command below:

kubectl logs --follow perf-test -n rabbitmq-system         

You should see an output similar to the one below:

id: test-175031-307, starting consumer #0
id: test-175031-307, starting consumer #0, channel #0
id: test-175031-307, starting producer #0
id: test-175031-307, starting producer #0, channel #0
id: test-175031-307, time: 1.028s, sent: 3493 msg/s, received: 715 msg/s, min/median/75th/95th/99th consumer latency: 33544/212123/292764/359305/365207 μs
id: test-175031-307, time: 2.028s, sent: 7043 msg/s, received: 1317 msg/s, min/median/75th/95th/99th consumer latency: 499952/878971/1175757/1376007/1380011 μs
id: test-175031-307, time: 3.113s, sent: 1181 msg/s, received: 1264 msg/s, min/median/75th/95th/99th consumer latency: 1397108/1776888/2076450/2276947/2280045 μs
id: test-175031-307, time: 4.114s, sent: 1336 msg/s, received: 1308 msg/s, min/median/75th/95th/99th consumer latency: 2139700/2555824/2741999/2934838/2935774 μs
id: test-175031-307, time: 5.114s, sent: 1283 msg/s, received: 1507 msg/s, min/median/75th/95th/99th consumer latency: 3035346/3480274/3613550/3804079/3809466 μs
id: test-175031-307, time: 6.114s, sent: 2565 msg/s, received: 1503 msg/s, min/median/75th/95th/99th consumer latency: 3694689/4092083/4292815/4432469/4433152 μs
id: test-175031-307, time: 7.115s, sent: 0 msg/s, received: 1291 msg/s, min/median/75th/95th/99th consumer latency: 4522260/4949936/5150542/5344124/5346090 μs
id: test-175031-307, time: 8.115s, sent: 689 msg/s, received: 1206 msg/s, min/median/75th/95th/99th consumer latency: 5450762/5770413/5969120/6260902/6266098 μs
id: test-175031-307, time: 9.119s, sent: 861 msg/s, received: 537 msg/s, min/median/75th/95th/99th consumer latency: 6372845/6571241/6759990/7065929/7066133 μs
id: test-175031-307, time: 10.121s, sent: 862 msg/s, received: 511 msg/s, min/median/75th/95th/99th consumer latency: 6681663/6978422/7384080/7479825/7577313 μs
...        

As can be seen, perf-test is able to produce and consume about 1,300 messages per second.

This throughput is also visible in the RabbitMQ Management UI.

Installing kubectl rabbitmq plugin

The kubectl rabbitmq plugin provides commands for managing a RabbitMQ cluster and can be installed using krew:

kubectl krew install rabbitmq        

You may see the warning below after installing the rabbitmq plugin; proceed with your cluster testing and uninstall the plugin afterwards. Otherwise, you can stick to the manual steps above.

WARNING: You installed plugin "rabbitmq" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.        
