Deploy RabbitMQ Cluster To K8s Cluster Via ArgoCD
Olaniyi Odeleye (MBA)
Introduction
This documentation illustrates how to install and configure RabbitMQ on a Kubernetes cluster via ArgoCD.
RabbitMQ is an open-source message broker that originally implemented the AMQP (Advanced Message Queuing Protocol) protocol and has since been extended to support other protocols such as STOMP (Streaming Text Oriented Messaging Protocol) and MQTT (Message Queuing Telemetry Transport). It is message-queueing software that supports sending and receiving messages between distributed systems, applications, and services. It is written in the Erlang programming language and provides client interfaces and libraries for all major programming languages, including Python, Node.js, Java, and PHP.
Prerequisites
A running Kubernetes cluster, kubectl access to it, and a working ArgoCD installation pointed at your Git repository are assumed.
Quickstart Steps
This guide will walk you through the following steps:
1. Install the RabbitMQ Cluster Operator
The manifest for the installation of the RabbitMQ Cluster Operator can be found in my GitHub repo, and the deployment was done via ArgoCD (a sketch of the ArgoCD Application is shown after the output below). Installation of the Cluster Operator creates a number of Kubernetes resources. Breaking these down, we have:
kubectl get all -n rabbitmq-system
NAME                                             READY   STATUS    RESTARTS   AGE
pod/rabbitmq-cluster-operator-5b4b795998-48mvp   1/1     Running   0          2m10s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rabbitmq-cluster-operator   1/1     1            1           2m10s

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/rabbitmq-cluster-operator-5b4b795998   1         1         1       2m10s
kubectl get customresourcedefinitions.apiextensions.k8s.io
NAME                            CREATED AT
...
rabbitmqclusters.rabbitmq.com   2021-07-20T00:46:24Z
...
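For reference, the ArgoCD Application pointing at the repo could look something like the sketch below; the repo URL, path, and revision are placeholders rather than the exact values used in this setup.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rabbitmq-cluster-operator
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<your-user>/<your-repo>.git   # placeholder repo URL
    targetRevision: HEAD
    path: rabbitmq                                            # placeholder path to the kustomization
  destination:
    server: https://kubernetes.default.svc
    namespace: rabbitmq-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true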
It is important to note that the Cluster Operator must be created first, before proceeding with the deployment of the 3-node RabbitMQ Cluster. To achieve this, I had to comment out the part for the deployment of the 3-node RabbitMQ Cluster in the kustomization.yaml file in the repo and allow ArgoCD to deploy the Cluster Operator to the Kubernetes cluster.
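For illustration, the kustomization.yaml could look roughly like the sketch below while only the Cluster Operator is being deployed; the file names are assumptions, as the actual layout of the repo may differ.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Step 1: install the RabbitMQ Cluster Operator first
  - cluster-operator.yaml
  # Step 2: uncomment once the operator is running and let ArgoCD sync again
  # - rabbitmq-cluster.yaml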
2. Deploy a RabbitMQ Cluster using the Operator
Now that we have the Operator deployed, we are going to create a 3-node RabbitMQ Cluster. The manifest for the deployment of the 3-node RabbitMQ Cluster using the Cluster Operator is also located in the GitHub repo, and the deployment was again done via ArgoCD. All that is needed is to uncomment the part for the deployment of the 3-node RabbitMQ Cluster in the kustomization.yaml file and allow ArgoCD to deploy the 3-node RabbitMQ Cluster to the Kubernetes cluster.
In the RabbitmqCluster manifest, you will notice that this is where we specify the name of our cluster, and we have also used storageClassName: rook-ceph-block. This is because this deployment is done on a self-managed, on-premises Kubernetes cluster that uses the Rook-Ceph storage solution.
Everything else will be configured according to the Cluster Operator's defaults. That being said, we can override the default configurations (e.g. for the StatefulSet and Service) by using the override parameter in the RabbitmqCluster manifest.
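A minimal sketch of such a RabbitmqCluster manifest, assuming three replicas and the Rook-Ceph storage class mentioned above (the persistence size is an illustrative value, not necessarily the one used in the repo):

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: niyez-dev-rabbitmq
  namespace: rabbitmq-system
spec:
  replicas: 3
  persistence:
    storageClassName: rook-ceph-block
    storage: 10Gi   # illustrative size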
This will create a RabbitMQ cluster called niyez-dev-rabbitmq (this is the name used in the RabbitmqCluster manifest) in the rabbitmq-system namespace. You can see the RabbitMQ Cluster as it is being created:
watch kubectl get all -n rabbitmq-system
NAME                              READY   STATUS    RESTARTS   AGE
pod/niyez-dev-rabbitmq-server-0   1/1     Running   0          2d17h
pod/niyez-dev-rabbitmq-server-1   1/1     Running   0          2d17h
pod/niyez-dev-rabbitmq-server-2   1/1     Running   0          2d17h

NAME                               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
service/niyez-dev-rabbitmq         ClusterIP   10.8.2.45    <none>        5672/TCP,15672/TCP,15692/TCP   2d17h
service/niyez-dev-rabbitmq-nodes   ClusterIP   None         <none>        4369/TCP,25672/TCP             2d17h

NAME                                        READY   AGE
statefulset.apps/niyez-dev-rabbitmq-server   3/3     2d17h
You will also be able to see an instance of the rabbitmqclusters.rabbitmq.com custom resource created.
kubectl get rabbitmqclusters.rabbitmq.com -n rabbitmq-system
NAME                 ALLREPLICASREADY   RECONCILESUCCESS   AGE
niyez-dev-rabbitmq   True               True               2d18h
If your Pod is stuck in the Pending state, most probably your cluster does not have sufficient resources (memory and/or CPU). This can be verified as follows:
kubectl describe rabbitmqclusters.rabbitmq.com -n rabbitmq-system
...
  Limits:
    cpu:     2000m
    memory:  2Gi
  Requests:
    cpu:     1000m
    memory:  2Gi
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  2s (x19 over 20m)  default-scheduler  0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.
...
In this case, and since this is a fresh deployment, you may need to adjust the resource requests and limits in the RabbitmqCluster manifest file. After that, you need to remove and re-create the previously created RabbitMQ Cluster object by commenting out and then uncommenting the part for the deployment of the 3-node RabbitMQ Cluster in the repo.
Specify the resource requests and limits of the RabbitmqCluster Pods. CPU requirements must be in CPU units and memory requirements must be in bytes; both values must be expressed as a Kubernetes resource quantity. The RabbitmqCluster does not deploy if these configurations are provided but invalid. The defaults are listed below, followed by a sketch of how to override them.
Default Values:
Memory limit: 2 Gi
CPU limit: 2000 millicores
Memory request: 2 Gi
CPU request: 1000 millicores
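These defaults can be overridden in the RabbitmqCluster manifest via spec.resources; the values below are purely illustrative, sized for a small development cluster.

spec:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1000m
      memory: 1Gi   # keep the memory request and limit equal, per the recommendation below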
The RabbitMQ high-water mark is set to 0.4 times the memory limit. It is recommended to keep the memory requests and limits as the same value.
By default, RabbitMQ will not accept any new messages when it detects that it's using more than 40% of the available memory (as reported by the OS): vm_memory_high_watermark.relative = 0.4. This is a safe default and care should be taken when modifying this value, even when the host is a dedicated RabbitMQ node.
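If you do need to change the watermark, the Cluster Operator exposes an additionalConfig field that appends entries to rabbitmq.conf; a hedged sketch, where the 0.5 value is purely illustrative:

spec:
  rabbitmq:
    additionalConfig: |
      # illustrative override; the 0.4 default is usually a sensible choice
      vm_memory_high_watermark.relative = 0.5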
The OS and file system use system memory to speed up operations for all system processes. Failing to leave enough free system memory for this purpose will have an adverse effect on system performance due to OS swapping, and can even result in RabbitMQ process termination.
Also, it is possible for the pods to be running but not ready; when you check, you may see something similar to the output below:
kubectl describe pod niyez-dev-rabbitmq-server-0 -n rabbitmq-system
...
Readiness probe failed: dial tcp 10.4.2.20:5672: connect: connection refused
...
In this case, you may want to wait for a number of retries (five minutes by default). Alternatively, this can be addressed by increasing the initial delay of the readiness check and using a basic RabbitMQ health check for the readiness probe:
readinessProbe: # probe to know when RMQ is ready to accept traffic
  exec:
    # This is just an example. There is no "one true health check" but rather
    # several rabbitmq-diagnostics commands that can be combined to form increasingly comprehensive
    # and intrusive health checks.
    # Learn more at https://www.rabbitmq.com/monitoring.html#health-checks.
    # Stage 1 check:
    command: ["rabbitmq-diagnostics", "ping"]
  initialDelaySeconds: 20
  periodSeconds: 60
  timeoutSeconds: 10
Kubernetes uses a check known as the readiness probe to determine if a pod is ready to serve client traffic. This is effectively a specialized health check defined by the system operator.
When an ordered pod deployment policy is used (the recommended option for RabbitMQ clusters), the probe controls when the Kubernetes controller will consider the currently deployed pod to be ready and proceed to deploy the next one. This check, if not chosen appropriately, can deadlock a rolling cluster node restart.
RabbitMQ nodes that belong to a cluster will attempt to sync schema from their peers on startup. If no peer comes online within a configurable time window (five minutes by default), the node will give up and voluntarily stop. Before the sync is complete, the node won't mark itself as fully booted.
Therefore, if a readiness probe assumes that a node is fully booted and running, a rolling restart of RabbitMQ node pods using such a probe will deadlock: the probe will never succeed, so the controller never proceeds to deploy the next pod, even though that next pod must come online for the original pod to be considered ready by the deployment.
It is therefore recommended to use a very basic RabbitMQ health check for the readiness probe: rabbitmq-diagnostics ping.
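With the Cluster Operator, such a probe can be applied through the override parameter mentioned earlier; a sketch, assuming the operator's default container name of rabbitmq:

spec:
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
              - name: rabbitmq
                readinessProbe:
                  exec:
                    command: ["rabbitmq-diagnostics", "ping"]
                  initialDelaySeconds: 20
                  periodSeconds: 60
                  timeoutSeconds: 10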
3. View RabbitMQ Logs
In order to make sure RabbitMQ has started correctly, let's view the RabbitMQ pod logs. In this case, it would be:
kubectl logs niyez-dev-rabbitmq-server-0 -n rabbitmq-system
You should see an output similar to the one below:
WARNING: 'docker-entrypoint.sh' generated/modified the RabbitMQ configuration file, which will no longer happen in a future release! (https://github.com/docker-library/rabbitmq/pull/424)
Generated end result, for reference:
------------------------------------
loopback_users.guest = false
total_memory_available_override_value = 524288000
listeners.tcp.default = 5672
management.tcp.port = 15672
------------------------------------
Configuring logger redirection
01:05:25.248 [warning] cluster_formation.randomized_startup_delay_range.min and cluster_formation.randomized_startup_delay_range.max are deprecated
2021-07-21 01:05:51.252 [debug] <0.291.0> Lager installed handler error_logger_lager_h into error_logger
2021-07-21 01:05:51.253 [debug] <0.294.0> Lager installed handler lager_forwarder_backend into error_logger_lager_event
2021-07-21 01:05:51.257 [debug] <0.297.0> Lager installed handler lager_forwarder_backend into rabbit_log_lager_event
2021-07-21 01:05:51.350 [debug] <0.300.0> Lager installed handler lager_forwarder_backend into rabbit_log_channel_lager_event
2021-07-21 01:05:51.357 [debug] <0.303.0> Lager installed handler lager_forwarder_backend into rabbit_log_connection_lager_event
2021-07-21 01:05:51.452 [debug] <0.306.0> Lager installed handler lager_forwarder_backend into rabbit_log_feature_flags_lager_event
2021-07-21 01:05:51.549 [debug] <0.309.0> Lager installed handler lager_forwarder_backend into rabbit_log_federation_lager_event
2021-07-21 01:05:51.555 [debug] <0.312.0> Lager installed handler lager_forwarder_backend into rabbit_log_ldap_lager_event
2021-07-21 01:05:51.649 [debug] <0.315.0> Lager installed handler lager_forwarder_backend into rabbit_log_mirroring_lager_event
2021-07-21 01:05:51.656 [debug] <0.318.0> Lager installed handler lager_forwarder_backend into rabbit_log_prelaunch_lager_event
2021-07-21 01:05:51.749 [debug] <0.287.0> Lager installed handler lager_backend_throttle into lager_event
2021-07-21 01:05:51.751 [debug] <0.321.0> Lager installed handler lager_forwarder_backend into rabbit_log_queue_lager_event
2021-07-21 01:05:51.757 [debug] <0.324.0> Lager installed handler lager_forwarder_backend into rabbit_log_ra_lager_event
2021-07-21 01:05:51.852 [debug] <0.327.0> Lager installed handler lager_forwarder_backend into rabbit_log_shovel_lager_event
2021-07-21 01:05:51.858 [debug] <0.330.0> Lager installed handler lager_forwarder_backend into rabbit_log_upgrade_lager_event
2021-07-21 01:05:52.656 [info] <0.44.0> Application lager started on node 'rabbit@niyez-dev-rabbitmq-server-0.niyez-dev-rabbitmq-nodes.rabbitmq-system'
2021-07-21 01:05:55.850 [info] <0.44.0> Application mnesia started on node 'rabbit@niyez-dev-rabbitmq-server-0.niyez-dev-rabbitmq-nodes.rabbitmq-system'
2021-07-21 01:05:55.851 [info] <0.273.0>
Starting RabbitMQ 3.8.18 on Erlang 24.0.3 [jit]
Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
Licensed under the MPL 2.0. Website: https://rabbitmq.com
  ##  ##      RabbitMQ 3.8.18
  ##  ##
  ##########  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

  Erlang:      24.0.3 [jit]
  TLS Library: OpenSSL - OpenSSL 1.1.1k  25 Mar 2021

  Doc guides:  https://rabbitmq.com/documentation.html
  Support:     https://rabbitmq.com/contact.html
  Tutorials:   https://rabbitmq.com/getstarted.html
  Monitoring:  https://rabbitmq.com/monitoring.html
...
4. Access The Management UI
Next, let's access the Management UI.
username="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode)"
echo "username: $username"
password="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user -o jsonpath='{.data.password}' | base64 --decode)"
echo "password: $password"
Open a new terminal on your local system and run the commands below:
gcloud beta compute ssh pzukprsvqjb01 --tunnel-through-iap --zone europe-west4-a --project niyez -- -L 15672:localhost:15672
kubectl port-forward svc/niyez-dev-rabbitmq 15672:15672 -n rabbitmq-system
The gcloud command above was used to do SSH port forwarding to a jumpbox connected to the k8s cluster so that the RabbitMQ UI can be accessed from my local system.
Now we can open localhost:15672 in the browser and see the Management UI. The credentials are as printed in the commands above.
Alternatively, you can run a curl command to verify access:
curl -u$username:$password localhost:15672/api/overview
{"management_version":"3.8.18","rates_mode":"basic", ...}
Using the kubectl rabbitmq plugin, the Management UI can be accessed using:
kubectl rabbitmq manage niyez-dev-rabbitmq
5. Attach a Workload to the Cluster (Connect An Application To The Cluster)
The next step would be to connect an application to the RabbitMQ Cluster in order to use its messaging capabilities. The perf-test application is frequently used within the RabbitMQ community for load testing RabbitMQ Clusters.
Here, we will be using the niyez-dev-rabbitmq service to find the connection address, and the niyez-dev-rabbitmq-default-user to find connection credentials.
username="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode)"
password="$(kubectl get secret -n rabbitmq-system niyez-dev-rabbitmq-default-user -o jsonpath='{.data.password}' | base64 --decode)"
service="$(kubectl get service -n rabbitmq-system niyez-dev-rabbitmq -o jsonpath='{.spec.clusterIP}')"
kubectl run perf-test --image=pivotalrabbitmq/perf-test --namespace=rabbitmq-system -- --uri amqp://$username:$password@$service
# pod/perf-test created
These steps are automated in the kubectl rabbitmq plugin which may simply be run as:
kubectl rabbitmq perf-test niyez-dev-rabbitmq -n rabbitmq-system
We can now view the perf-test logs by running the command below:
kubectl logs --follow perf-test -n rabbitmq-system
You should see an output similar to the one below:
id: test-175031-307, starting consumer #0
id: test-175031-307, starting consumer #0, channel #0
id: test-175031-307, starting producer #0
id: test-175031-307, starting producer #0, channel #0
id: test-175031-307, time: 1.028s, sent: 3493 msg/s, received: 715 msg/s, min/median/75th/95th/99th consumer latency: 33544/212123/292764/359305/365207 μs
id: test-175031-307, time: 2.028s, sent: 7043 msg/s, received: 1317 msg/s, min/median/75th/95th/99th consumer latency: 499952/878971/1175757/1376007/1380011 μs
id: test-175031-307, time: 3.113s, sent: 1181 msg/s, received: 1264 msg/s, min/median/75th/95th/99th consumer latency: 1397108/1776888/2076450/2276947/2280045 μs
id: test-175031-307, time: 4.114s, sent: 1336 msg/s, received: 1308 msg/s, min/median/75th/95th/99th consumer latency: 2139700/2555824/2741999/2934838/2935774 μs
id: test-175031-307, time: 5.114s, sent: 1283 msg/s, received: 1507 msg/s, min/median/75th/95th/99th consumer latency: 3035346/3480274/3613550/3804079/3809466 μs
id: test-175031-307, time: 6.114s, sent: 2565 msg/s, received: 1503 msg/s, min/median/75th/95th/99th consumer latency: 3694689/4092083/4292815/4432469/4433152 μs
id: test-175031-307, time: 7.115s, sent: 0 msg/s, received: 1291 msg/s, min/median/75th/95th/99th consumer latency: 4522260/4949936/5150542/5344124/5346090 μs
id: test-175031-307, time: 8.115s, sent: 689 msg/s, received: 1206 msg/s, min/median/75th/95th/99th consumer latency: 5450762/5770413/5969120/6260902/6266098 μs
id: test-175031-307, time: 9.119s, sent: 861 msg/s, received: 537 msg/s, min/median/75th/95th/99th consumer latency: 6372845/6571241/6759990/7065929/7066133 μs
id: test-175031-307, time: 10.121s, sent: 862 msg/s, received: 511 msg/s, min/median/75th/95th/99th consumer latency: 6681663/6978422/7384080/7479825/7577313 μs
...
As can be seen, perf-test is able to produce and consume about 1,300 messages per second.
This can also be observed in the RabbitMQ Management UI.
Installing kubectl rabbitmq plugin
The kubectl rabbitmq plugin provides commands for managing RabbitMQ clusters and can be installed using krew:
kubectl krew install rabbitmq
You may see the warning below after installing the rabbitmq plugin; you can proceed with your cluster testing and uninstall the plugin afterwards. Otherwise, you can stick to the manual steps above.
WARNING: You installed plugin "rabbitmq" from the krew-index plugin repository.
These plugins are not audited for security by the Krew maintainers.
Run them at your own risk.