Kubernetes as the Common Substrate
In the previous article, we discussed how outages are becoming more frequent and disrupting our daily lives. A service outage is a matter of when, not if, regardless of whether it is deployed on-premises or on the public cloud. We looked at how Kubernetes can be a common substrate to promote a standardized approach and a unified platform across different environments, to reduce the complexity of keeping our services highly resilient.
In this article, we will dive deeper into Kubernetes and how it helps to achieve workload resiliency on private or hybrid cloud, covering Kubernetes-native workloads, design patterns for application resiliency, and a demo with Red Hat OpenShift Container Platform.
What is a Kubernetes-Native Workload?
The CNCF definition of Cloud Native reads:
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
Kubernetes is an open-source container orchestrator. It automates the lifecycle of containerized applications on modern infrastructure, functioning as a “data center operating system” that manages applications across a distributed system. Kubernetes provides the automation and observability needed to efficiently manage many applications at the same time. Its declarative, API-driven infrastructure allows cloud-native development teams to operate independently and increases their productivity.
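As a minimal sketch of this declarative model (names and image are hypothetical), you describe the desired state and Kubernetes continuously reconciles the cluster toward it:

```yaml
# Declare the desired state: 3 replicas of a web container.
# Kubernetes restarts or reschedules Pods until reality matches this spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
```

If a node fails, the controller notices the replica count has drifted from the declared state and schedules replacement Pods elsewhere, with no imperative intervention.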
Kubernetes-native is a specialization of cloud-native, tailored specifically for Kubernetes and its declarative APIs, with the following advantages:
Design Patterns for Application Resiliency
There are different patterns for stateful application availability on Kubernetes. Running an application active/active across multiple Kubernetes clusters is more complex and costly, but it is also the most resilient option. Some applications can only run active/passive for various reasons, such as a single-master/writer design or storage limitations.
The red dots in the image represent various aspects related to different application recovery methods:
Depending on the SLO, here are some methods to meet the goals:
1. Backup & restore: A backup is a snapshot in time, so some data loss is inevitable and the RPO (Recovery Point Objective) will never be zero; however, this is the simplest method to implement.
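The article does not name a backup tool; as one illustration, Velero (a common open-source choice for Kubernetes backup) can take scheduled backups of a namespace. The namespace and schedule below are hypothetical:

```yaml
# Hypothetical Velero Schedule: nightly backup of the "coolstore"
# namespace, retained for 7 days (ttl 168h). A restore recovers only to
# the last snapshot point, which is why the RPO is never zero.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: coolstore-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"   # every day at 02:00
  template:
    includedNamespaces:
      - coolstore
    ttl: 168h0m0s
```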
2. Automated cluster rebuild with GitOps: It is preferable to automate cluster creation and configuration management. Whether you are setting up the first cluster, the Nth cluster, or recovering a failed cluster, you should have automation in place. The application can then be restored by GitOps, assuming its data is kept safe outside the cluster.
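As a sketch of the GitOps step (repository URL and paths are hypothetical), an Argo CD Application can point a freshly rebuilt cluster at the Git repository that declares the workload, and reconciliation restores the application without manual steps:

```yaml
# Hypothetical Argo CD Application: everything under manifests/ in the
# Git repo is synced into the coolstore namespace. selfHeal keeps the
# cluster converged on what Git declares.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coolstore
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/coolstore-gitops  # hypothetical repo
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: coolstore
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```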
3. Active/Passive: Many traditional (i.e. not cloud-native) applications on Kubernetes cannot run fully active-active. They depend on their storage provider and orchestration tools for failover and failback. The orchestration adds complexity, and the RPO may not be zero unless data replication is synchronous.
4. Active/Active: This is a highly resilient deployment where the workload runs across multiple active clusters, ideally spread across multiple availability zones, data centers, or cloud providers. In such a design there is no single master or writer; all instances can be active at the same time. This allows the workload to scale horizontally and be deployed on another Kubernetes cluster easily. However, factors such as latency, network throughput, and even data sovereignty can affect this design.
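Within a single cluster, the same "spread the replicas" idea is expressed with topology spread constraints. This Pod template fragment (labels hypothetical) keeps replicas balanced across availability zones:

```yaml
# Pod spec fragment: never let one zone hold more than one extra replica
# (maxSkew: 1) of the coolstore workload; refuse to schedule otherwise.
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: coolstore
```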
It is important to consider the data resilience pattern too. Storage provider integration with Kubernetes has vastly improved over the years, and more and more storage providers can span from on-premises to the cloud to provide a seamless experience.
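One example of this improved integration is the standard CSI snapshot API: with a snapshot-capable storage provider, a point-in-time copy of a PersistentVolumeClaim can be taken through the Kubernetes API regardless of the backend. The names below are hypothetical:

```yaml
# Hypothetical CSI VolumeSnapshot of a database PVC. The snapshot class
# maps to whatever CSI driver the storage provider supplies.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: coolstore-db-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: coolstore-db-data
```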
Demo with Red Hat OpenShift Container Platform
OpenShift is an enterprise application platform that provides a common substrate for organizations to run on any certified infrastructure platform of their choice. It is not only based on Kubernetes but also ships the tooling needed to achieve application resilience on-premises or in the cloud, such as:
In this demonstration, we aim to showcase active-active workloads across multiple clusters, including failover and workload relocation capabilities. Coolstore is a hypothetical shopping website built with microservices and various cloud-native technologies, deployed on 3 active clusters running active-active workloads.
Coolstore is expected to handle the following scenarios:
Here are the components used:
While we aim to demonstrate an active-active deployment for a highly resilient application, it is important to note that an active/passive setup with storage orchestration is also possible on-premises or on the public cloud with OpenShift.
The Git repository is here: https://github.com/rhsgsa/hybrid-cloud-coolstore
Demo video is coming soon!
Lastly, head over to Red Hat Validated Patterns for the GitOps pattern and other related patterns that you can apply to your environment.
Conclusion
Kubernetes provides inherent resilience for containers within a cluster: when a container (or Pod) fails, Kubernetes automatically restarts it. However, container resilience alone is not sufficient. What if the entire cluster becomes unavailable? What if the cluster runs out of capacity? To address this, you need multiple clusters for high availability: if one cluster fails, your solution can still run in another. Ideally, deploy the clusters across different data centers or regions to enhance resilience. A single complex topology stretched across multiple data centers or cloud providers can itself introduce availability issues; multiple smaller clusters are easier to manage, scale, and upgrade.
In summary, the OpenShift application platform allows organizations to build and simplify their application deployments using cloud-native concepts and technologies. OpenShift's flexibility, combined with the right architecture and deployment practices, helps ensure resilient and efficient application management.
Credits
I had the opportunity to collaborate with a great team of Solutions Architects (Kin Wai Koo, Kah Hoe Lai, Darrick Leong, Juin Hau Ong, Juncheng Anthony Lin, Stephen Bylo, Prempal Singh) to conceptualize and create the demo after multiple sprints, whiteboarding sessions, and late nights.
My appreciation goes to Joko Yuliantoro of F5 and Yogi Rampuria of Yugabyte for their assistance in incorporating their services with OpenShift.
Finally, thanks to our management team, Guna Chellappan, Eric Vong, and Kelvin Loh, for their input and suggestions.