Persistent Storage in AWS Cloud-Native Microservices with AWS FSx for Lustre
Guillermo Wrba
Author of "Designing and Building Solid Microservice Ecosystems", independent consultant and solutions architect, evangelist of new technologies, distributed computing and microservices.
One characteristic of any modular microservice design is that microservices are stateless by nature: no session, user or context data should be stored across different microservice invocations. This avoids tight coupling between the microservice implementation and the microservice consumer, or between the microservice implementation and the microservice backend.
In some data processing scenarios - especially batch-oriented or real-time data processing pipelines with a data warehouse component - data must be ingested and stored somewhere before an actual REST-API-exposed microservice can be called to pull that data in various ways and satisfy the different data-consumption use cases. These solutions usually follow the "shared database" microservice design pattern, for which sharing state across microservices is a must, and which is typically found in data-warehouse-oriented solutions.
Large data processing pipelines with a data warehouse may rely not only on a central distributed database, but usually also on a big distributed filesystem to store medium-to-large data chunks and make them available to the different data consumers, both on the data ingestion side and on the data query side. If microservices have been implemented to consume that data - in the form of data wrappers - then those microservices should also be able to see the same data and consume it as required. Containers can solve this problem through the concept of persistent volumes.
A real-world use case implementation
There are several examples showcasing how we can leverage the persistent volume (PV) and persistent volume claim (PVC) features that ship with Kubernetes, which can consistently provide centralized storage for microservices.
For the sake of clarity, I'm going to discuss a simple use case I worked on when re-architecting a monolithic data pipeline for metrics ingestion - including telemetry data coming from multiple remote devices, in the form of sensors and other sensing-capable devices - into a full-fledged microservice solution.
The diagram below highlights the high-level architecture of the proposed solution, which uses a series of data-wrapper APIs to extract and analyze the data via data access layers implemented on top of REST-API, gRPC and GraphQL. Data ingested through the pipeline gets uploaded, processed, transformed into secondary Parquet-format files, and stored on an AWS FSx filesystem.
By leveraging K8S PVCs and PVs, the AWS FSx filesystem was first presented to the Kubernetes cluster and then mounted as part of each individual microservice's deployment, in this example using the AWS FSx for Lustre distributed filesystem. There are many flavors of distributed filesystem, but in this particular scenario we chose Lustre for multiple reasons, the main one being that it has proven robust and solid in terms of performance, availability, scalability and security, without compromising the cost/benefit relationship. At this point I consider it important to highlight the difference between a REAL distributed filesystem and a simple shared filesystem: a real distributed filesystem distributes not only the load but also the storage, spreading individual chunks of data across multiple cluster nodes, which maximizes availability in terms of both service and storage. Those characteristics are not necessarily offered by a shared filesystem, which is intended to just share a disk across N consumers and is most of the time centrally served by a single storage appliance. The AWS FSx for Lustre solution has proven to fulfill the expectations in terms of performance and running costs for a medium-sized data pipeline like the one we were dealing with here, so after evaluating multiple alternatives we finally decided to go for it.
Distributed Filesystem Interfacing
There are several steps associated with provisioning and presenting -- "interfacing" -- the distributed filesystem to the EKS cluster, so that it becomes visible and can be mounted from within the individual microservices' containers, giving them persistent shared state across instances. The main steps are: creating the FSx for Lustre filesystem, creating its security group, defining a StorageClass, deploying a PersistentVolume and PersistentVolumeClaim, enabling the required IAM permissions, validating the setup, and finally mounting the volume from the microservice's deployment.
Provisioning and Presenting a real file system
In this section I summarize the exact steps required - from the infrastructure point of view - to create the above objects, so that we can finally have the filesystem up and running and our common data store becomes visible from the microservice side.
Of course, depending on the distributed filesystem type, the cloud provider of choice and the particular flavor of managed Kubernetes in use (EKS, AKS, GKE), these steps may change radically; however, the basic concepts of presenting and interfacing the filesystem are similar. I'm not going to cover every detail here, since that's not the intention, so if you are interested in specifics, do not hesitate to reach out to me directly :)
In any case, the recommendation is to rely on IaC tools such as Terraform or Ansible to create sustainable scripts that can build the whole infrastructure from scratch, and to avoid at any cost performing manual - and repetitive - actions, especially in a multi-cloud approach that may span multiple cloud providers.
The instructions below are given as directions assuming your EKS cluster is running on top of Amazon Linux machines. If you're using a different operating system, these instructions may apply only partially or not at all. In order to run the steps below, I recommend first of all setting up a bastion server with at least the basic tooling preconditions met.
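As a hedged, minimal example of what those preconditions typically boil down to (the cluster name is a placeholder, while the adminaccess profile and us-west-2 region match the commands used later in this article):
# Verify the AWS CLI profile works and point kubectl at the target EKS cluster
aws sts get-caller-identity --profile adminaccess
aws eks update-kubeconfig --name <your-eks-cluster> --region us-west-2 --profile adminaccess
kubectl get nodes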
1. Create the FSx for Lustre filesystem
Specific values must be provided for each field of the aws fsx create-file-system command below; this is just a sample, so customize it as required. The filesystem size here is 2.4 TB (2400 GiB) with 500 MB/s of throughput per TiB of storage, and the storage type is SSD (solid state). The filesystem version is set to 2.12. SubnetId must be set to an available subnet within the VPC on top of which the filesystem will be deployed. The security group ID must be set to the ID of a security group (sg-fsx-fs) that must already have been created (we will cover it in the next step). Note that we are capturing the DNSName, MountName and FileSystemId from the AWS CLI output; we will need those in further steps.
aws fsx create-file-system \
--file-system-type LUSTRE \
--storage-capacity 2400 \
--storage-type SSD \
--file-system-type-version 2.12 \
--subnet-ids subnet-fsx-fs \
--security-group-ids sg-fsx-fs \
--lustre-configuration "DeploymentType=PERSISTENT_2,PerUnitStorageThroughput=500,DataCompressionType=NONE,LogConfiguration={Level=WARN_ERROR}" \
--profile adminaccess \
--region us-west-2 \
--tags Key=Name,Value=gl-community-srv_data-fs > /tmp/fs.spec 2>&1
export DNSNAME=$(grep DNSName /tmp/fs.spec | awk ' { print $2 } ' | cut -f2 -d\" )
echo $DNSNAME
export MOUNTNAME=$(grep -i mountname /tmp/fs.spec | awk ' { print $2 } ' | cut -f2 -d\" )
echo $MOUNTNAME
export FSID=$(grep -i filesystemid /tmp/fs.spec | awk ' { print $2 } ' | cut -f2 -d\" )
echo $FSID
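As a side note, since the AWS CLI output captured in /tmp/fs.spec is JSON, a jq-based sketch (assuming jq is installed on the bastion) is a less brittle alternative to the grep/awk parsing above, and it also lets you wait until the filesystem reaches the AVAILABLE lifecycle state before moving on:
# Hedged alternative to the grep/awk parsing above, assuming jq is available
export FSID=$(jq -r '.FileSystem.FileSystemId' /tmp/fs.spec)
export DNSNAME=$(jq -r '.FileSystem.DNSName' /tmp/fs.spec)
export MOUNTNAME=$(jq -r '.FileSystem.LustreConfiguration.MountName' /tmp/fs.spec)
# Poll until the filesystem is AVAILABLE before creating the PV/PVC
until [ "$(aws fsx describe-file-systems --file-system-ids "$FSID" --query 'FileSystems[0].Lifecycle' --output text --profile adminaccess --region us-west-2)" = "AVAILABLE" ]; do
  echo "Waiting for $FSID to become AVAILABLE..."
  sleep 30
done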
2. Create the FSx Lustre security group [ sg-fsx-fs ]
As documented in the official AWS FSx for Lustre documentation for Linux, a security group must be created specifically for this filesystem to operate properly. If this security group is not set up as specified there, you'll encounter issues with traffic being blocked when trying to mount or access the filesystem. This security group (named sg-fsx-fs) must allow the inbound Lustre traffic, as sketched below.
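A hedged AWS CLI sketch for creating that security group follows. The VPC ID is a placeholder, and the port numbers (TCP 988 and 1018-1023) are the standard Lustre/LNET ports documented for FSx for Lustre; adjust the source of the ingress rules to whatever security group your EKS worker nodes actually use.
# Create the security group referenced by the create-file-system command above
SG_ID=$(aws ec2 create-security-group \
  --group-name sg-fsx-fs \
  --description "FSx for Lustre traffic" \
  --vpc-id <your-vpc-id> \
  --query 'GroupId' --output text \
  --profile adminaccess --region us-west-2)
# Allow Lustre traffic (TCP 988 and 1018-1023) from members of this same group;
# make sure your EKS worker nodes are covered by an allowed source as well
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 988 --source-group "$SG_ID" \
  --profile adminaccess --region us-west-2
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 1018-1023 --source-group "$SG_ID" \
  --profile adminaccess --region us-west-2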
3. Create a K8S StorageClass.
To create it, run the command below, replacing the subnetId and securityGroupIds values with the corresponding IDs.
cat <<EOD | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-fsx-fs
  securityGroupIds: sg-fsx-fs
EOD
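One assumption behind the fsx.csi.aws.com provisioner (and behind the csi driver referenced in the PV in the next step) is that the AWS FSx CSI driver is already installed on the EKS cluster. If it isn't, a typical Helm-based installation looks roughly like the sketch below; please verify the chart coordinates against the kubernetes-sigs/aws-fsx-csi-driver documentation, as they may change over time.
# Hedged sketch: installing the AWS FSx CSI driver via Helm
helm repo add aws-fsx-csi-driver https://kubernetes-sigs.github.io/aws-fsx-csi-driver
helm repo update
helm upgrade --install aws-fsx-csi-driver aws-fsx-csi-driver/aws-fsx-csi-driver --namespace kube-system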
4. Deploy the PersistentVolume (PV) and the PersistentVolumeClaim (PVC), by first creating a file named "claim-persistent.yaml" with the below content
apiVersion: v1
kind: PersistentVolume
metadata:
  name: voltaiq-fsx-volume
spec:
  capacity:
    storage: 2400Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  mountOptions:
    - flock
  persistentVolumeReclaimPolicy: Recycle
  csi:
    driver: fsx.csi.aws.com
    volumeHandle: ${FSID}
    volumeAttributes:
      dnsname: ${DNSNAME}
      mountname: ${MOUNTNAME}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
  namespace: voltaiq-vce-api
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 20G
  volumeName: voltaiq-fsx-volume
Then, proceed to run the commands below.
cat $(dirname $0)/claim-persistent.yaml | envsubst | kubectl delete --ignore-not-found -f -
cat $(dirname $0)/claim-persistent.yaml | envsubst | kubectl apply -f -
The above will first delete the PV/PVC pair (if it already exists), and then create it again.
5. Enabling service-linked IAM roles
A service-linked role is a unique type of IAM role that is linked directly to Amazon FSx. Service-linked roles are predefined by Amazon FSx and include all the permissions that the service requires to call other AWS services on your behalf. You must first create a policy.json file with the content below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy"
      ],
      "Resource": "arn:aws:iam::*:role/aws-service-role/fsx.amazonaws.com/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "fsx:*"
      ],
      "Resource": ["*"]
    }
  ]
}
Then you should be able to create the policy by executing the command below.
aws iam create-policy --policy-name fsx-iam-policy --policy-document file://policy.json --description "FSX Lustre IAM Policy"
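Note that, as a hedged follow-up, the policy by itself grants nothing until it is attached somewhere. One common approach is attaching it to the IAM role used by the EKS worker nodes (or to the CSI driver's IRSA role if you use IAM Roles for Service Accounts); the role name and account ID below are placeholders.
aws iam attach-role-policy \
  --role-name <your-eks-node-instance-role> \
  --policy-arn arn:aws:iam::<your-account-id>:policy/fsx-iam-policy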
6. Validating your PV and PVC
At this point, the PV and PVC should both have been provisioned and be available. To validate whether the creation was successful, run the command below.
kubectl get pvc --all-namespaces
If everything went OK, you should expect output like the one below.
NAMESPACE         NAME        STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
voltaiq-vce-api   fsx-claim   Bound    voltaiq-fsx-volume   1200Gi     RWX                           34d
Note that the "STATUS" column should read as "Bound", which means that the physical layer has been bound successfully and associated logically with the newly provided Persistent Volume Claim. If for any reason you get a different value, then i would recommend to double check the above steps.
You should also check the persistent volume, by issuing the command below.
kubectl get pv --all-namespaces
NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
voltaiq-fsx-volume   1200Gi     RWX            Recycle          Bound    voltaiq-vce-api/fsx-claim                           34d
Again, a "Bound" value in the STATUS column represents success. Anything different from that means the provisioning failed for some reason.
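If the STATUS shows something like Pending or Failed instead, a useful first step is to inspect the events Kubernetes records on the objects themselves; the names below match the ones used in this article.
kubectl describe pvc fsx-claim -n voltaiq-vce-api
kubectl describe pv voltaiq-fsx-volume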
Presenting the filesystem to a Microservice
That has been a lot so far, but we're not finished yet - we are pretty close, though. Once we confirm that both the PV and PVC have been presented correctly to the EKS cluster, we can move to the final and crucial step of getting the filesystem mounted on our microservices, so that the data becomes visible for read-write operations, in a distributed and synchronized way, across multiple instances of the same microservice. Presenting a filesystem to a microservice is something that needs to be specified in the microservice's deployment file.
When we deploy a microservice, we need to provision at least four different objects so that K8S fully understands how the service will be deployed, which image to pick, how to scale it and how to handle incoming traffic; all of that is specified in the form of a deployment, an ingress, a service, and an HPA configuration section. In order to mount an existing volume, we need to add specific metadata inside the deployment section through the "volumes" and "volumeMounts" fields, as shown in the example below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: public-api-deployment
  namespace: voltaiq-vce-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: public-api-app
  template:
    metadata:
      labels:
        app: public-api-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: public-api-app
      volumes:
      - name: voltaiq-data-fs
        persistentVolumeClaim:
          claimName: fsx-claim
      - name: public-regional-settings
        configMap:
          name: public-regional-settings
      containers:
      - image: 568405397948.dkr.ecr.us-west-2.amazonaws.com/volta-vce:public-api-v4
        imagePullPolicy: IfNotPresent
        name: public-api-image
        resources:
          requests:
            memory: "2048Mi"
            cpu: "1024m"
          limits:
            memory: "2048Mi"
            cpu: "1024m"
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: voltaiq-data-fs
          mountPath: /srv/data
        - name: public-regional-settings
          mountPath: /app/global_config/regional_settings.py
          subPath: regional_settings.public.py
        env:
        - name: LOCAL_DEV
          value: "__true__"
Note that the section below simply associates the already existing PersistentVolumeClaim (PVC) created under the name "fsx-claim" with the logical name "voltaiq-data-fs":
volumes:
- name: voltaiq-data-fs
  persistentVolumeClaim:
    claimName: fsx-claim
Later, we reference the same logical name to indicate that we want to mount the volume at the /srv/data path:
volumeMounts:
- name: voltaiq-data-fs
  mountPath: /srv/data
In this example I decided to exclude the ingress, HPA and service definitions to avoid making the article too long, but the idea is that once you have your deployment object (as discussed above), your HPA, your service and your ingress ready, you should be able to deploy the microservice with the usual command:
kubectl apply -f my_microservice.yaml
And hence get everything created from scratch. Once the deployment gets instantiated, it will automatically try to mount the FSx Lustre filesystem into the containers. You can validate that the filesystem has been mounted correctly by issuing commands similar to the ones below.
kubectl exec -it <your pod name> -- ls -lt /srv/data
kubectl exec -it <your pod name> -- df -k /srv/data
If everything went OK, the file listing will show the files shared on the distributed filesystem from within your container. Note that when issuing the "df -k" command, it returns something similar to "172.31.26.39@tcp:/khc6fbm" as the filesystem source, which represents a network mount point. This tells us that the filesystem has been properly mounted on the container and is available.
Stay tuned! In my next installment, I'll try to present some interesting material on gRPC and API ingress controllers for Kubernetes, and how to properly route traffic for gRPC-exposed microservices.