ROSA, AWS, EBS and why would you do that?
This article starts with Rhys Powell saying that you really shouldn't do this but, in an unusual twist, Philipp Bergsmann says why not?! More on that later...
Our usual role as Black Belts is to work with customers to help them understand things, to help them position managed OpenShift or to get technical difficulties removed. Occasionally we like to work with some of our hyperscaler partners to get hands-on with the tech or to get a refresher; as we all know, the tech world is always changing, so constant learning is a good thing.
A recent occurrence of this was when the EMEA Black Belt team had a "state of storage review" with some of our friends at AWS. We were very lucky to have Tom Tasker and Rod Wilson come in and give us a talk. Storage is a big thing in AWS and both of them were able to dig into a lot of topics, especially things our customers often ask about, meaning we are better informed to help them all moving forward. They did an excellent job of presenting a huge amount of information, but one part on EBS storage brought about some conversation, an idea, a test and then this blog post.
It's often forgotten that you can change parts of an EBS volume after it has been created, things such as the volume type and the IOPS. This gets considered more often with EC2 instances, where getting the right performance versus cost for the workload is routine. Yet in our OpenShift world we forget this, as storage is abstracted away. Yes, we define our storage classes, yes, they hook into the storage in the cloud, and then they are forgotten about.
It was at this point that the discussion started. Points were raised that this wouldn't work or would break things, with counterpoints that it wouldn't do a thing, as the cluster won't know and won't care because it's all abstracted away.
Would changing the volume type on the AWS side cause any issues for the persistent volume provisioned through the cluster?
We start off with a simple cluster. In this instance it was a ROSA HCP cluster running in eu-west-2, chosen for the speed of standing it up and because London is the nearest region. Nothing else was changed; the default, out-of-the-box storage classes were kept: gp3-csi (the default) and gp2-csi.
CSI, or the Container Storage Interface, is the abstraction that allows Kubernetes to use the block and file storage systems that exist in the underlying infrastructure. In this case we were using the AWS EBS CSI driver, which configures the EBS volume as it is requested. Those configurations are, generally, defined in the storage class. In normal running, a Persistent Volume Claim (PVC) is made by an application; if there is an existing Persistent Volume (PV) that matches, it gets reused, and if not, a PV gets created using the storage class settings as supplied.
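As a rough illustration of how those settings hang together, a storage class for the AWS EBS CSI driver looks something like the following. This is a sketch rather than the exact class ROSA ships, and the name and parameter values are just examples.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

The parameters are only read at provisioning time; they describe what the EBS volume should look like when it is first created.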
We have our cluster, we know how the CSI works, we now need some code.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rhys-claim
  namespace: volume-fun
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2-csi
  volumeMode: Filesystem
The project (namespace) called volume-fun has already been created and, as mentioned earlier, the gp2-csi class is something that comes with ROSA out of the box. We can now create the PVC:
oc create -f test_pvc.yaml
The claim has been created and is now in a Pending state, waiting for its first consumer to be created.
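You can check this from the CLI; the STATUS column stays Pending and a describe should show it is waiting for the first consumer:

oc get pvc rhys-claim -n volume-fun
oc describe pvc rhys-claim -n volume-fun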
Right now the cluster is aware that it will need a volume, should a container request it, but it won't create one until that first request is made. We now need a container that will make the request. For this test we also need to be doing something on that mounted claim, so we can confirm it is working and being used, and then see what happens when we change things.
Not wanting to make things too complex, the decision was to use dd to write a decent-sized file of random data. The container will mount the volume and we test the IOPS by writing this file. Changing IOPS was chosen as it is the easiest and most significant change between the differing storage options. To see what's happening we can use the CloudWatch metrics for the volume.
The Dockerfile code
FROM registry.access.redhat.com/ubi9-minimal:9.4-949.1717074713
# Directory the persistent volume will be mounted over at runtime
RUN mkdir /data1
# Write 100 x 512M blocks of random data (~50GiB) to the mounted volume, then exit
CMD ["dd", "if=/dev/urandom", "of=/data1/file.out", "bs=512M", "count=100"]
Build the container and push it to a container registry
podman build -t quay.io/rhpowell_mobb/volume_fun .
podman push quay.io/rhpowell_mobb/volume_fun
Deploy the container to the cluster; this will then cause the claim to kick off the provisioning process.
apiVersion: v1
kind: Pod
metadata:
  name: iowrite
  labels:
    app: iowrite
  namespace: volume-fun
spec:
  containers:
    - name: iowrite
      image: "quay.io/rhpowell_mobb/volume_fun:latest"
      volumeMounts:
        - mountPath: /data1
          name: data1
  volumes:
    - name: data1
      persistentVolumeClaim:
        claimName: rhys-claim
As you can see, the manifest creates a pod containing the iowrite container that was just built and pushed to Quay. This container has a volume mount called data1, matching the directory used in the dd command inside the container. That volume mount is backed by the volume that matches the PVC claim name.
Deploy the container
oc apply -f pv-pod.yaml
This will give us a warning as, in the interests of brevity, we have been a little lazy and not set a number of security settings, but the container will run.
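For reference, the kind of settings the warning is complaining about would be a securityContext on the container, roughly along these lines (a sketch, not what was deployed for this test; writing as a non-root user may also need an fsGroup or image tweaks):

      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault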
We also now see that the volume has been created and was dynamically provisioned from the rhys-claim PVC.
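To find the matching volume on the AWS side you need its EBS volume ID, which the CSI driver records as the volume handle on the PV. Something like this one-liner should return it (shown as a sketch):

oc get pv $(oc get pvc rhys-claim -n volume-fun -o jsonpath='{.spec.volumeName}') -o jsonpath='{.spec.csi.volumeHandle}'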
We now need to look for this volume in the CloudWatch metrics. We take the read ops and the write ops, reduce the period right down, and add a math expression that sums m1 and m2 to give us the total IOPS. This isn't strictly necessary, as we only want to confirm the change and see if things go wrong, and the app is all writes and barely any reads, but it's good to have everything covered just in case. As our job only writes 512M blocks 100 times, the container stops once it completes. The cluster automatically restarts the pod once it dies, so we should see a flat line with a dip every time the process completes, the container dies and then gets restarted.
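If you prefer the CLI to the console graph, roughly the same numbers can be pulled per metric and summed yourself; the volume ID, time window and region below are placeholders:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeWriteOps \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time 2024-06-01T10:00:00Z --end-time 2024-06-01T11:00:00Z \
  --period 60 --statistics Sum \
  --region eu-west-2

Dividing each Sum by the period gives the average IOPS for that interval, and adding the VolumeReadOps figure gives the same total as the m1 + m2 math expression in the console.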
We now need to play with the volume! This is as simple as selecting the volume in the AWS console and then hitting Modify. The choice here was to go to the extreme: from gp2 running at 300 IOPS we are making the leap to io2 and throwing 30,000 IOPS at it. After changing the settings we just hit Modify; no restarting, no disconnecting, nothing other than Modify, and AWS takes care of it all in the background with no interruption to the storage while it is running.
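The console makes this a couple of clicks, but the same change can be made from the CLI if you prefer, and a second command lets you watch the modification progress; the volume ID is a placeholder:

aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type io2 --iops 30000 --region eu-west-2
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0 --region eu-west-2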
We watch as it goes through the modification process until it hits 100%. We then take a look at the metrics and see that the IOPS have jumped up and that the task is taking far less time to complete than it did previously.
So all is good... Just to make sure it wasn't a fluke, the log files of the container and the CSI driver were checked and no errors were shown; the pod was even deleted and recreated and nothing surfaced that caused an issue.
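For completeness, the checks were along these lines; dd prints its transfer statistics to stderr so they land in the pod logs, and the namespace, deployment and container names below are the usual ones for the OpenShift EBS CSI driver and may differ by version:

oc logs iowrite -n volume-fun
oc logs -n openshift-cluster-csi-drivers deployment/aws-ebs-csi-driver-controller -c csi-driver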
What did we prove?
The CSI driver doesn't look at what the actual volume type is after creation. Be warned, this has only been tested on AWS.
Is this good? Well, as Philipp points out, his argument for giving it a try is that it is an opportunity to test theories about the causes of bottlenecks, or that gp3 might even be over-provisioned and you can drop down to good old magnetic storage. If you have huge volumes, this could considerably optimise your costs when applied across a number of environments. While Rhys says no, you shouldn't, he really understands where Philipp is coming from, as it is a great opportunity to test; but it goes against one of the key principles that has been pushed for years, no manual changes, so if you do it, make sure you plan to bring the configuration back in line right away.
Conclusions
The abstraction works, and the ability to change disk types could be useful. We will leave it up to you to decide whether you should or shouldn't do it. Finally, all learning is fun as it helps with deeper understanding, and if this has even slightly sparked an interest in running managed OpenShift, reach out to Philipp Bergsmann or Rhys Powell, as they do mostly do serious stuff!
If you want the code to play with this yourself, it can be found here.
Thank you for the collaboration!