An Intolerable Misconfiguration Experience with the Cloud: Learnings from the UniSuper-Google Cloud Incident
Murali Dulam
Founder @ ITasCode Pvt Ltd | Solution Architecture | DevOps, Kubernetes, Cloud Consultant
The recent failure of UniSuper's online services raised concerns not only for the company's leadership but for the entire financial services industry. For more than a week, over 100,000 users could not access their accounts after UniSuper's Google Cloud private cloud environment was mistakenly deleted.
The event underscores the importance of Infrastructure-as-Code (IaC), the practice of codifying cloud infrastructure so that it can be reviewed, tested, and rebuilt.
The UniSuper-Google Cloud incident boiled down to a confluence of missteps:
Important lessons can be learned from the UniSuper-Google Cloud event by both cloud providers and users:
Keep backups with more than one cloud provider, or store them on-premises. Every region should have a failover plan and geographically dispersed backups for protection.
Use automated testing in the cloud provisioning process to catch mistakes before they take effect.
Upskill your IT team on your cloud platform to reduce configuration errors.
Design a recovery plan with fallback options, and use IaC tools such as Terraform as a precaution.
Establish a mechanism that requires full review and approval for every change, however minor, to the core cloud setup.
Prepare for outages in advance: keep clear lines of communication open for customers, have outage procedures in place, and define team roles clearly.
Announce scheduled and unscheduled downtime to users, with specific information on when the outage starts and when service is expected to return.
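To make the advice on automated testing and change approval more concrete, here is a minimal sketch of a pre-apply check. It assumes a Terraform workflow in which the plan is exported with `terraform show -json`, whose output lists each planned change under `resource_changes` with a `change.actions` array. The resource addresses, the sample plan, and the function names are hypothetical and exist only for illustration; this is not a description of how UniSuper or Google Cloud operate.

```python
import json

# Hypothetical excerpt shaped like the output of `terraform show -json <planfile>`.
# The resource addresses are illustrative only.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "google_compute_network.private", "change": {"actions": ["delete"]}},
    {"address": "google_storage_bucket.backups", "change": {"actions": ["update"]}},
    {"address": "google_compute_instance.app", "change": {"actions": ["delete", "create"]}}
  ]
}
""")


def destructive_changes(plan):
    """Return addresses of resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as a delete combined with a create.
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged


def unapproved_deletions(plan, approved):
    """Return destructive changes that nobody has explicitly signed off on."""
    return [addr for addr in destructive_changes(plan) if addr not in approved]


if __name__ == "__main__":
    # Assume a reviewer approved replacing the app instance, and nothing else.
    blocked = unapproved_deletions(SAMPLE_PLAN, approved={"google_compute_instance.app"})
    if blocked:
        print("Blocked, needs approval:", ", ".join(blocked))
    else:
        print("No unapproved destructive changes.")
```

Run in a pipeline before `terraform apply`, a check like this would stop the deployment whenever a deletion appears that was not reviewed. It complements human approval and tested backups; it does not replace them.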
#Cloud #Disasterrecovery #infraprovision #automatedtesting #infrastructureascode
Enterprise Security Architect, vTISO/vCISO - Effectively prevent IT failures, security breaches and data theft with insight.
9 months ago
UniSuper's ability to recover in the end shows that at least some architectural decisions were made correctly: it had an independent backup strategy that seems to have worked as intended. Georedundancy, by the way, is absolutely no replacement for a well-designed and tested DR regimen.