High Availability Kubernetes Cluster with Ceph Storage Deployment
Reza Bojnordi
Site Reliability Engineer @ BCW Group | Solutions Architect, Google Cloud, OpenStack, and Ceph Storage
Introduction
This article covers the deployment of a high-availability Kubernetes cluster with load balancing, control plane redundancy, worker nodes, and Ceph storage integration. The setup ensures fault tolerance, scalability, and seamless data management.
Cluster Architecture
The architecture consists of:
Node Details
Load balancer nodes: run Keepalived and HAProxy and share the virtual IP 172.16.16.100
Control plane nodes: kmaster1 (172.16.16.101), kmaster2 (172.16.16.102), kmaster3 (172.16.16.103)
Worker nodes: joined to the cluster after initialization (e.g. worker1)
Ceph storage nodes: ceph1 (172.16.16.210), ceph2 (172.16.16.211), ceph3 (172.16.16.212)
Virtual IP (VIP)
The VIP 172.16.16.100 is managed by Keepalived and fronts the Kubernetes API server on port 6443 through HAProxy, so nodes and clients always reach a healthy control plane node.
Deployment Steps
1. Setting Up Load Balancer Nodes
2. Deploying Kubernetes Cluster
3. Deploying Ceph Storage
Detailed Installation Steps
1. Configuring Load Balancer Nodes
Install Keepalived & HAProxy
apt update && apt install -y keepalived haproxy
Configure Keepalived
Create a health check script /etc/keepalived/check_apiserver.sh:
cat > /etc/keepalived/check_apiserver.sh <<'EOF'
#!/bin/sh
errorExit() {
echo "*** $@" 1>&2
exit 1
}
curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q 172.16.16.100; then
curl --silent --max-time 2 --insecure https://172.16.16.100:6443/ -o /dev/null || errorExit "Error GET https://172.16.16.100:6443/"
fi
EOF
chmod +x /etc/keepalived/check_apiserver.sh
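You can run the script by hand as a quick sanity check; it is expected to fail until an API server is actually answering on port 6443:
sh /etc/keepalived/check_apiserver.sh && echo "check passed" || echo "check failed (expected until the control plane is up)"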
Create Keepalived configuration /etc/keepalived/keepalived.conf:
cat > /etc/keepalived/keepalived.conf <<EOF
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
timeout 10
fall 5
rise 2
weight -2
}
vrrp_instance VI_1 {
state BACKUP
interface eth1
virtual_router_id 1
priority 100
advert_int 5
authentication {
auth_type PASS
auth_pass mysecret
}
virtual_ipaddress {
172.16.16.100
}
track_script {
check_apiserver
}
}
EOF
Enable Keepalived:
systemctl enable --now keepalived
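Once both load balancer nodes run Keepalived, the VIP should appear on eth1 of whichever node holds the MASTER role; a quick check (eth1 and the VIP come from the configuration above):
ip addr show eth1 | grep 172.16.16.100
systemctl status keepalived --no-pager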
Configure HAProxy
Update /etc/haproxy/haproxy.cfg:
cat >> /etc/haproxy/haproxy.cfg <<EOF
frontend kubernetes-frontend
bind *:6443
mode tcp
option tcplog
default_backend kubernetes-backend
backend kubernetes-backend
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
server kmaster1 172.16.16.101:6443 check fall 3 rise 2
server kmaster2 172.16.16.102:6443 check fall 3 rise 2
server kmaster3 172.16.16.103:6443 check fall 3 rise 2
EOF
Enable and restart HAProxy:
systemctl enable haproxy && systemctl restart haproxy
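To confirm HAProxy is accepting connections on the API port (ss comes from iproute2; netstat works as well):
ss -tlnp | grep 6443
systemctl status haproxy --no-pager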
2. Deploying Kubernetes Cluster
Run the following preparation steps on every Kubernetes node (control plane and workers).
Disable swap: comment out the swap entry in /etc/fstab, then turn swap off for the running system:
sudo nano /etc/fstab
sudo swapoff -a
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
Load the required kernel modules and make them persistent across reboots:
sudo tee /etc/modules-load.d/kubernetes.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
Apply the required sysctl settings:
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
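A quick check that the modules are loaded and the sysctl values took effect:
lsmod | grep -E 'overlay|br_netfilter'
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward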
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/containerd.gpg
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install containerd.io -y
sudo containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
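Before moving on, it is worth confirming that containerd is running with the new configuration:
sudo systemctl status containerd --no-pager
sudo ctr version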
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/k8s.gpg
echo "deb [signed-by=/etc/apt/keyrings/k8s.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install kubelet kubeadm kubectl -y
sudo apt-mark hold kubelet kubeadm kubectl
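You can verify the pinned versions before initializing the cluster:
kubeadm version -o short
kubelet --version
kubectl version --client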
Initialize the Control Plane
On the first control plane node, initialize the cluster with the control plane endpoint set to the load-balanced VIP so that the API server is reached through HAProxy:
sudo kubeadm init --control-plane-endpoint="172.16.16.100:6443" --upload-certs --pod-network-cidr=192.168.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Print the join command to add more nodes to the cluster:
kubeadm token create --print-join-command
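The printed command joins worker nodes. The remaining control plane nodes use the same join command plus --control-plane and a certificate key; if the key from kubeadm init has expired, you can regenerate it with the first command below. The join line is only a sketch with placeholder values, not real output:
sudo kubeadm init phase upload-certs --upload-certs
kubeadm join 172.16.16.100:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --control-plane --certificate-key <certificate-key>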
kubectl get nodes
kubectl get pods --all-namespaces --watch
kubectl cluster-info
kubectl get nodes
kubectl get nodes -o wide
kubectl get pods --namespace kube-system
kubectl get pods --namespace kube-system -o wide
kubectl get pods --all-namespaces --watch
kubectl get pods pod1 --output=yaml
kubectl create deployment nginx --image=nginx
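As an optional sanity check of the new cluster, you can expose and scale the test deployment (the nginx name refers to the deployment created above):
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl scale deployment nginx --replicas=3
kubectl get svc nginx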
kubectl get all --all-namespaces | more
kubectl api-resources | more
kubectl api-resources | grep pod
#explain
kubectl explain pod | more
kubectl explain pod.spec | more
kubectl explain pod.spec.containers | more
kubectl describe nodes worker1 | more
kubectl get -h | more
sudo apt install bash-completion
echo "source <(kubectl completion bash)" >> ~/.bashrc
source ~/.bashrc
kubectl g[tab]
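Optionally, you can also alias kubectl to k and keep completion working for the alias (the __start_kubectl hook is provided by kubectl's bash completion script sourced above):
echo "alias k=kubectl" >> ~/.bashrc
echo "complete -o default -F __start_kubectl k" >> ~/.bashrc
source ~/.bashrc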
Deploy Ceph Using Cephadm
root@ceph1:~# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
root@ceph1:~# ls
cephadm
root@ceph1:~# chmod +x cephadm
Add repo
root@ceph1:~# ./cephadm add-repo --release pacific
root@ceph1:~# ./cephadm install
root@ceph1:~# which cephadm
/usr/sbin/cephadm
Bootstrap a new cluster
The first step in creating a new Ceph cluster is running the cephadm bootstrap command on the cluster's first host. Bootstrapping creates the cluster's first monitor daemon, and that daemon needs an IP address, so you must pass the IP address of the first host to the command.
At the end of its output, the bootstrap command prints the username and password for the Ceph dashboard.
root@ceph1:~# cephadm bootstrap --mon-ip 172.16.16.210 --allow-fqdn-hostname
You can see that your first node is ready:
root@ceph1:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f948bbd65858 quay.io/ceph/ceph-grafana:8.3.5 "/bin/sh -c 'grafana…" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-grafana-ceph1
da9940de0261 quay.io/prometheus/alertmanager:v0.23.0 "/bin/alertmanager -…" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-alertmanager-ceph1
5d64f507e598 quay.io/prometheus/prometheus:v2.33.4 "/bin/prometheus --c…" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-prometheus-ceph1
189ba16afeef quay.io/prometheus/node-exporter:v1.3.1 "/bin/node_exporter …" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-node-exporter-ceph1
6dc14e163713 quay.io/ceph/ceph "/usr/bin/ceph-crash…" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-crash-ceph1
8f2887215bf4 quay.io/ceph/ceph "/usr/bin/ceph-mon -…" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-mon-ceph1
d555bddb6bcc quay.io/ceph/ceph "/usr/bin/ceph-mgr -…" 6 minutes ago Up 6 minutes ceph-e0ed5b04-2d51-11ed-99fd-4124623d1806-mgr-ceph1-mmsoeo
You can install the ceph-common package, which contains all of the Ceph commands, including ceph, rbd, mount.ceph (for mounting CephFS file systems), etc.:
root@ceph1:~# cephadm install ceph-common
Now you can run the ceph commands natively:
root@ceph1:~# ceph health
HEALTH_OK
Adding additional hosts to the cluster
Note: before adding a new node to the cluster, I set the mon service to unmanaged; otherwise cephadm would automatically deploy a mon on ceph2. (For quorum, a single mon is enough at this stage.)
root@ceph1:~# ceph orch apply mon --unmanaged
To add each new host to the cluster, perform two steps:
root@ceph1:~# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
root@ceph1:~# ceph orch host add ceph2 172.16.16.211 --labels _admin
Wait a little while and you will see two mgr daemons in the cluster:
root@ceph1:~# ceph -s
cluster:
id: e0ed5b04-2d51-11ed-99fd-4124623d1806
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum ceph1 (age 20m)
mgr: ceph1.mmsoeo(active, since 20m), standbys: ceph2.ewfwsb
Adding OSDs
This is a lab, so I am using loop devices to create multiple OSDs that mimic real disks. (Don't use loop devices in production.)
Create the LVM disks on both the ceph1 and ceph2 nodes:
$ fallocate -l 200G 200GB-SSD-0.img
$ fallocate -l 200G 200GB-SSD-1.img
$ losetup -fP 200GB-SSD-0.img
$ losetup -fP 200GB-SSD-1.img
$ pvcreate /dev/loop0
$ pvcreate /dev/loop1
$ vgcreate ceph-ssd-vg /dev/loop0 /dev/loop1
$ lvcreate --size 199G --name ceph-ssd-lv-0 ceph-ssd-vg
$ lvcreate --size 199G --name ceph-ssd-lv-1 ceph-ssd-vg
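Before handing the volumes to Ceph, you can confirm the logical volumes exist on each node (plain LVM tooling, nothing Ceph-specific):
$ lvs ceph-ssd-vg
$ lsblk /dev/loop0 /dev/loop1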
We have two 199 GB LVM disks per node, so we can add a total of four OSDs to Ceph.
root@ceph1:~# ceph orch daemon add osd ceph1:/dev/ceph-ssd-vg/ceph-ssd-lv-0
root@ceph1:~# ceph orch daemon add osd ceph1:/dev/ceph-ssd-vg/ceph-ssd-lv-1
root@ceph1:~# ceph orch daemon add osd ceph2:/dev/ceph-ssd-vg/ceph-ssd-lv-0
root@ceph1:~# ceph orch daemon add osd ceph2:/dev/ceph-ssd-vg/ceph-ssd-lv-1
Wait a little while, then check the status:
root@ceph1:~# ceph osd stat
4 osds: 4 up (since 2h), 4 in (since 5h); epoch: e103
root@ceph1:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.77716 root default
-3 0.38858 host ceph1
0 ssd 0.19429 osd.0 up 1.00000 1.00000
1 ssd 0.19429 osd.1 up 1.00000 1.00000
-5 0.38858 host ceph2
2 ssd 0.19429 osd.2 up 1.00000 1.00000
3 ssd 0.19429 osd.3 up 1.00000 1.00000
Scale mon daemons
Currently we have only two nodes in the Ceph cluster, which is why there is a single mon. I am going to add a third node to the cluster and run the mon service on all three nodes for better redundancy.
Current status of the cluster:
root@ceph1:~# ceph orch host ls
HOST ADDR LABELS STATUS
ceph1 172.16.16.210 _admin
ceph2 172.16.16.211 _admin
2 hosts in cluster
Add a new node, ceph3, to the cluster
Prerequisite: install docker-ce on the new host (ceph3). After that, copy the Ceph SSH key to the new host.
root@ceph1:~# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph3
Tell Ceph that the new node is part of the cluster:
root@ceph1:~# ceph orch host add ceph3 172.16.16.212 --labels _admin
A few minutes later:
root@ceph1:~# ceph orch host ls
HOST ADDR LABELS STATUS
ceph1 172.16.16.210 _admin
ceph2 172.16.16.211 _admin
ceph3 172.16.16.212 _admin
3 hosts in cluster
Now let's tell cephadm to add two more mon daemons, on ceph2 and ceph3:
root@ceph1:~# ceph orch daemon add mon ceph2:172.16.16.211
root@ceph1:~# ceph orch daemon add mon ceph3:172.16.16.212
Now enable automatic placement of the mon daemons:
root@ceph1:~# ceph orch apply mon --placement="ceph1,ceph2,ceph3" --dry-run
root@ceph1:~# ceph orch apply mon --placement="ceph1,ceph2,ceph3"
After a few minutes you will see three mon daemons:
root@ceph1:~# ceph mon stat
e12: 3 mons at {ceph1=[v2:172.16.16.210:3300/0,v1:172.16.16.210:6789/0],ceph2=[v2:172.16.16.211:3300/0,v1:172.16.16.211:6789/0],ceph3=[v2:172.16.16.212:3300/0,v1:172.16.16.212:6789/0]}, election epoch 48, leader 0 ceph1, quorum 0,1,2 ceph1,ceph3,ceph2
In more detail:
root@ceph1:~# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.ceph1 ceph1 *:9093,9094 running (13d) 5m ago 2w 50.9M - ba2b418f427c da9940de0261
crash.ceph1 ceph1 running (13d) 5m ago 2w 8584k - 17.2.3 0912465dcea5 6dc14e163713
crash.ceph2 ceph2 running (13d) 110s ago 2w 7568k - 17.2.3 0912465dcea5 046663e7d825
crash.ceph3 ceph3 running (28m) 44s ago 28m 9092k - 17.2.3 0912465dcea5 068bcd9a1b0c
grafana.ceph1 ceph1 *:3000 running (13d) 5m ago 2w 130M - 8.3.5 dad864ee21e9 f948bbd65858
mgr.ceph1.mmsoeo ceph1 *:8443,9283 running (13d) 5m ago 2w 688M - 17.2.3 0912465dcea5 d555bddb6bcc
mgr.ceph2.ewfwsb ceph2 *:8443,9283 running (13d) 110s ago 2w 436M - 17.2.3 0912465dcea5 17c25c7ac9d7
mon.ceph1 ceph1 running (13d) 5m ago 2w 464M 2048M 17.2.3 0912465dcea5 8f2887215bf4
mon.ceph2 ceph2 running (12m) 110s ago 12m 55.1M 2048M 17.2.3 0912465dcea5 f93695536d9e
mon.ceph3 ceph3 running (21m) 44s ago 21m 84.8M 2048M 17.2.3 0912465dcea5 2532ddaed999
node-exporter.ceph1 ceph1 *:9100 running (13d) 5m ago 2w 67.7M - 1dbe0e931976 189ba16afeef
node-exporter.ceph2 ceph2 *:9100 running (13d) 110s ago 2w 67.5M - 1dbe0e931976 f87b1ec6f349
node-exporter.ceph3 ceph3 *:9100 running (28m) 44s ago 28m 46.3M - 1dbe0e931976 283c6d21ea9c
osd.0 ceph1 running (13d) 5m ago 2w 1390M 17.2G 17.2.3 0912465dcea5 3456c126e322
osd.1 ceph1 running (13d) 5m ago 2w 1373M 17.2G 17.2.3 0912465dcea5 7c1fa2662443
osd.2 ceph2 running (13d) 110s ago 2w 1534M 18.7G 17.2.3 0912465dcea5 336e424bbbb2
osd.3 ceph2 running (13d) 110s ago 2w 1506M 18.7G 17.2.3 0912465dcea5 874a811e1f3b
prometheus.ceph1 ceph1 *:9095 running (28m) 5m ago 2w 122M - 514e6a882f6e 93972e9bdfa9
Ceph Maintenance Options
To perform any kind of maintenance on OSD nodes, you can set the following flags:
ceph osd set noout
ceph osd set norebalance
ceph osd set norecover
To exit maintenance mode, unset the flags:
ceph osd unset noout
ceph osd unset norebalance
ceph osd unset norecover
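Although full Kubernetes integration is out of scope for this post, once the cluster is healthy you would typically prepare an RBD pool and a client for a CSI driver such as ceph-csi. A minimal sketch, assuming a pool named kubernetes and a client named client.kubernetes (both names are arbitrary choices, not anything created above):
ceph osd pool create kubernetes
rbd pool init kubernetes
ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'
The key printed by the last command is what the Kubernetes side (for example, the ceph-csi secrets) would consume.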
In the next post I will cover how to integrate Ceph with a kolla-ansible deployment. Enjoy!