An Intolerable Misconfiguration Experience with the Cloud: Learnings from the UniSuper-Google Cloud Incident

The recent failure of UniSuper’s online services raised questions not only about the competence of the company’s leadership but about the entire financial services industry. For about a week, more than 100,000 members could not access their accounts after UniSuper’s private cloud environment on Google Cloud was mistakenly deleted.

The event underscores the importance of Infrastructure-as-Code (IaC), the practice of defining cloud infrastructure in code.

The UniSuper-Google Cloud incident boiled down to a confluence of missteps:

  • Rare Google Cloud Issue: The root cause appears to lie on Google Cloud's side.
  • Misconfiguration During Provisioning: The setup of UniSuper's private cloud left an unintentional configuration error that was not corrected in time.
  • Unknown Software Bug Triggered: This configuration error in turn exposed a previously unknown software bug at Google that surfaced while the fund's IT systems were being set up.
  • UniSuper's System Impacted: The failure affected only one customer, UniSuper, but it went as far as deleting the stored data.

Important lessons can be learned from the UniSuper-Google Cloud event by both cloud providers and users:

  • Multi-layered Redundancy:

Keep backups with more than one cloud provider, or store them on-premises. Every critical system should have a failover plan and geographically dispersed backups for protection.

  • Automated Testing:

Build automated tests into the cloud provisioning process so that configuration mistakes are caught before they take effect.

  • Staff Training:

Train your IT staff on your cloud platform to reduce configuration errors.

  • More Robust Infrastructure as Code (IaC):

Define your infrastructure with IaC tools such as Terraform, and plan fallback options as a precaution.

  • Tighter Change Control:

Establish a process that requires review and approval for every change, however minor, to the core cloud setup.

  • Incident Response Plan:

Before an incident occurs, set up clear communication channels for customers, documented procedures for handling outages, and clearly defined team roles.

  • Transparency and Communication:

Announce scheduled and unscheduled downtime to users, with specific information on the timing of the outage.
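
To make the automated-testing and change-control lessons a little more concrete, here is a minimal Python sketch of a check that could run before any provisioning change is applied. It is an illustration under stated assumptions, not how UniSuper or Google Cloud actually work: the field names (`name`, `provider`, `region`, `backup_location`) and the shape of the planned-changes list are hypothetical and do not follow any real provider's schema.

```python
# Hypothetical field names for illustration; not a real provider schema.
REQUIRED_FIELDS = ("name", "region", "backup_location")


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat missing, None, and blank values alike, so that nothing
        # silently falls back to a default.
        if str(config.get(field) or "").strip() == "":
            problems.append(f"{field}: must be set explicitly")
    # Multi-layered redundancy: backups should not live with the primary provider.
    backup = config.get("backup_location")
    if backup and backup == config.get("provider"):
        problems.append("backup_location: must differ from the primary provider")
    return problems


def unapproved_deletions(planned_changes: list[dict], approved: set[str]) -> list[str]:
    """List resources that a plan would delete without a recorded approval."""
    return [
        change["resource"]
        for change in planned_changes
        if change["action"] == "delete" and change["resource"] not in approved
    ]


if __name__ == "__main__":
    # Example input with a blank region and backups at the primary provider.
    config = {
        "name": "prod-private-cloud",
        "provider": "cloud-a",
        "region": "",
        "backup_location": "cloud-a",
    }
    for problem in validate_config(config):
        print("CONFIG:", problem)

    # Example plan: one deletion that nobody has approved.
    plan = [
        {"resource": "prod-private-cloud", "action": "delete"},
        {"resource": "prod-firewall", "action": "update"},
    ]
    for resource in unapproved_deletions(plan, approved=set()):
        print("BLOCKED:", resource)
```

Run as a step in a deployment pipeline, a check like this would stop a change whenever either function returns a non-empty list. Treating blank values as errors, rather than letting them fall back to defaults, is a deliberate design choice here, since a silent default is exactly the kind of mistake that is hard to spot in review.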


#Cloud #Disasterrecovery #infraprovision #automatedtesting #infrastructureascode


Steffen Müller

Enterprise Security Architect, vTISO/vCISO - Effectively prevent IT failures, security breaches and data theft with insight.

9 months

The fact that UniSuper was finally able to recover shows that at least some architectural decisions were made correctly: it had an independent backup strategy that seems to have worked as desired. Georedundancy, by the way, is absolutely no replacement for a well-designed and tested DR regimen.

