Regression and Rebuild Testing Your Infrastructure As Code

Ahhh, regression testing: a practice borrowed from software development, often thought about but never implemented! What if I told you it should be one of the first tests you implement as a Cloud Engineer writing Terraform?

I'm not a fan of "this applies in every case" articles, so let's break it down into some of the benefits.

Terraform and IaC challenges

Circular dependencies

If, like me, you use Terraform, you'll know it's pretty good at sorting out what needs to go first in order to successfully apply your Infrastructure as Code (IaC). For example:

data "aws_iam_policy_document" "assume_role_policy_taskrole" {
? statement {
? ? sid ? ?= ""
? ? effect = "Allow"


? ? actions = [
? ? ? "sts:AssumeRole",
? ? ]


? ? principals {
? ? ? type ? ? ? ?= "Service"
? ? ? identifiers = ["ecs.amazonaws.com", "ecs-tasks.amazonaws.com"]
? ? }
? }
}

resource "aws_iam_role" "task_role" {
? name ? ? ? ? ? ? ? = "task-role"
? description ? ? ? ?= "The role the container running on ECS will use"
? assume_role_policy = data.aws_iam_policy_document.assume_role_policy_taskrole.json


? tags = merge(
? ? var.common_tags,
? ? {
? ? ? "Name" = task-role")
? ? },
? )
}
        

Now Terraform will know it needs to process the data source before the aws_iam_role; it's clever like that! As we add more and more resources we tend to build these things like Tetris: write, plan, apply, repeat (I should put that on a t-shirt). Even within resources we add further options and references to other resources which have already been applied. For example:

[Image: DynamoDB and ECS services talking to each other]

Here we have a DynamoDB backend for an ECS Service. Typically an Engineer might deploy ECS and DynamoDB as resources first and then add the IAM roles that both resources will use to talk to each other.

In the above example, since the DynamoDB table and the ECS service were created first as part of the development process, Terraform will more than happily add IAM policies which reference each other. Then, if we terraform destroy the infrastructure it passes, but a subsequent terraform apply fails. Why? Circular dependencies! Terraform is unable to apply because neither policy can be deployed without the other existing first.
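
As a minimal sketch of how this sneaks past Terraform (the table name and policy below are hypothetical), notice that each side finds the other via a data source rather than a resource reference, so Terraform's dependency graph never sees the loop:

data "aws_dynamodb_table" "backend" {
  # This lookup worked when it was written, because the table already
  # existed from earlier development applies.
  name = "project-backend" # fails on a clean apply: the table doesn't exist yet
}

resource "aws_iam_role_policy" "task_to_dynamo" {
  name = "task-to-dynamo"
  role = aws_iam_role.task_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"]
      Resource = data.aws_dynamodb_table.backend.arn
    }]
  })
}

If the DynamoDB side similarly looks up the task role by name to grant its own access, a clean apply has no valid ordering: each data source needs the other side's resource to exist first.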

This risk only increases when we move to using multiple state files. For example:

[Image: Terraform backend states (Project references outputs from Auth)]

Imagine you are working on "Project", which reads in some outputs from the Auth Terraform state; as you develop, auth/terraform.tfstate is likely already deployed out.

Much like in the case above, your Terraform code will no doubt work, reading from auth/terraform.tfstate, until you head towards production! That is where you may learn that you have created a dependency on Auth being deployed first.
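
For context, the read in Project typically looks something like this (the bucket name and the output name are hypothetical):

# In Project: pull shared values out of the Auth state.
data "terraform_remote_state" "auth" {
  backend = "s3"

  config = {
    bucket = "my-terraform-states" # hypothetical state bucket
    key    = "auth/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Works fine in development, because Auth is already applied...
resource "aws_ssm_parameter" "user_pool_id" {
  name  = "/project/user-pool-id"
  type  = "String"
  value = data.terraform_remote_state.auth.outputs.user_pool_id
}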

It gets worse! As you're developing, what if you happened to make a Route53 entry in Project that you need to work with in Auth? It seems fine that Auth could reference it; after all, the Terraform code applies when you run it. Yet as you get to production you realise you're stuck in a circular dependency you didn't know about! Auth cannot be deployed because it needs the Route53 resource created in Project, yet Project can't be deployed because it depends on Auth!
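
Sketched in the same hypothetical style, the other half of the loop sitting over in Auth looks completely innocent on its own:

# In Auth: read the Route53 record created over in Project.
data "terraform_remote_state" "project" {
  backend = "s3"

  config = {
    bucket = "my-terraform-states" # same hypothetical state bucket
    key    = "project/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_ssm_parameter" "auth_callback_url" {
  name  = "/auth/callback-url"
  type  = "String"
  value = "https://${data.terraform_remote_state.project.outputs.app_fqdn}/callback" # hypothetical output
}

# Each state now requires the other to be applied first, a cycle that no
# single terraform plan will ever warn you about.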

[Image: Terraform backend states (Project references outputs from Auth), but crucially Auth references a Route 53 output from Project]

Quotas

Now, quotas are a funny one. If you develop anything like me, you'll have Terraform running workspaces, which give us the ability to deploy entirely separate blocks of infrastructure with no reference to other workspaces. Neat, right? Well, it does have one drawback, particularly with AWS, in that sometimes you hit quotas from running multiple copies of a service concurrently. For example:

In Cognito, AWS by default only enables 4 custom domains per account, which means at most 4 Terraform workspaces with their own custom domain!

Having one extra Terraform workspace has proven time and time again to hit the quota and identify issues we may have expanding our infrastructure out!
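
As a rough sketch (the resource names and domain pattern are hypothetical), each workspace stamps out its own copy of the domain, so the fifth workspace is the one that first trips the default quota of 4:

resource "aws_cognito_user_pool_domain" "auth" {
  # One custom domain per workspace; the quota counts these per account.
  domain          = "auth-${terraform.workspace}.example.com"
  certificate_arn = aws_acm_certificate.auth.arn # assumed to exist elsewhere
  user_pool_id    = aws_cognito_user_pool.main.id
}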

Disaster Recovery Testing

"The cloud would never go down" Ive said it, you've said it! I've even had conversations with Disaster Recovery Managers who want me to prove that Eu-West-1 will never lose all 3 Availability Zones in 1 go, apparently it's considered antagonistic to respond with "If we have lost London we are having bigger issues"!

Anyway, one of the things I have seen a lot of companies struggle to do is big-bang their infrastructure. Everything is a carefully managed apply onto existing states to upgrade the existing infrastructure; what this doesn't prove is that, if we lost everything, we could rebuild it from the Terraform. In theory yes, but how many people have taken a new account and tried to deploy their entire IaC setup? I'd hazard a guess that "not many, and not often" is the answer. Why would you? AWS would never lose a region, right?

Regression and Rebuild To The Rescue

How?

I'll give a more technical explanation at the end of the article, but the short version is: we rebuild our entire infrastructure every night, and on demand in our pull requests.

Why?

By big-bang, greenfield deploying our entire IaC from nothing, we can identify any circular dependencies: as the code goes from nothing to everything, any state references are checked and any cross-resource IAM permissions are validated. Once the IaC is out we can then run our acceptance tests to make sure everything worked as we anticipated. If all has passed we can comfortably say that:

  • We have no circular dependencies
  • We know all the prerequisites required to deploy out our IaC (e.g. Route53 records, SSM Parameters)
  • We know how to deploy our IaC from scratch
  • No recent changes have regressed
  • We have headroom for at least one more full copy of the IaC before hitting some quotas

This gives us the confidence to progress our changes towards Production.

What next?

Well, I've always had this challenge: deploying Terraform to Production is often more an upgrade than a new deployment, and we don't always test for that. I'm going to work on a process of deploying a copy of Production (with representative data) and then applying the latest version of our Terraform over the top as an upgrade, to test that path. This should help tackle those "TERRAFORM WANTS TO DESTROY WHAT!" moments.


Okay The Technical Bit

So, I've used a multitude of CI/CD applications in my time; as such I wrote the below in Bash, since it is pretty much always available in every application.

The below functions assume a few things:

  • AWS credentials are present, as per normal Terraform usage
  • Contexts: the ability to run multiple Terraform applies in order, based on directory structure (it goes through them one at a time, e.g. "auth project1 project2", allowing you to enforce your state dependencies)
  • Workspace: set a workspace for this to run in
  • Backend config: I use backend config files for a number of reasons
  • Destroy: we run the reverse of the apply order so we clean up after ourselves

Feel free to modify the code and remove elements you don't need or like; what matters is the concept of what it's doing. I run this in GitHub Actions so I can also run it on pull requests if I want to.



#!/bin/bash

set -e

# Reverse a space-separated list, so destroys run in the opposite order to applies
reverse() {
  tac <(echo "$@" | tr ' ' '\n') | tr '\n' ' '
}


function regression_apply {
  local workspace="$1"
  local backendconfig="$2"
  # Space-separated list of Terraform directories, applied in this order
  local contexts="terraform"
  local gitroot
  gitroot=$(git rev-parse --show-toplevel)
  export TF_WORKSPACE="$workspace"
  echo "git root set as ${gitroot}"
  echo "using workspace ${workspace}"
  echo "using backend config ${backendconfig}"
  echo -e "____________________\nTerraform Applying\n____________________"
  for context in $contexts
  do
    echo "working on ${context}"
    cd "${gitroot}/${context}/"
    ## Init Terraform against the chosen backend
    terraform init -backend-config="./backend_config/${backendconfig}" -input=false
    ## Create a plan and export it
    terraform plan -input=false -out="tfplan_${context}" -var-file=./development.tfvars
    ## Apply the saved plan
    terraform apply "tfplan_${context}"
    ## Revert directory for the next loop
    cd "${gitroot}"
  done
  echo -e "____________________\nTerraform Applied\n____________________"
}


function regression_destroy {
  local workspace="$1"
  local backendconfig="$2"
  # Must match the list in regression_apply; it is walked in reverse
  local contexts="terraform"
  local gitroot
  gitroot=$(git rev-parse --show-toplevel)
  export TF_WORKSPACE="$workspace"
  echo "git root set as ${gitroot}"
  echo "using workspace ${workspace}"
  echo "using backend config ${backendconfig}"
  echo -e "____________________\nTerraform Destroying\n____________________"
  for context in $(reverse $contexts)
  do
    echo "working on ${context}"
    cd "${gitroot}/${context}/"
    ## Init Terraform against the chosen backend
    terraform init -backend-config="./backend_config/${backendconfig}" -input=false
    ## Create a destroy plan and export it
    terraform plan -destroy -input=false -out="tfplan_destroy_${context}" -var-file=./development.tfvars
    ## Apply the saved destroy plan
    terraform apply "tfplan_destroy_${context}"
    ## Revert directory for the next loop
    cd "${gitroot}"
  done
  echo -e "____________________\nTerraform Destroyed\n____________________"
}
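
A nightly job then just chains the two: call regression_apply "regression" "development.hcl", run the acceptance tests against the fresh deployment, and finish with regression_destroy "regression" "development.hcl" (the workspace and backend-config names there are made up; use whatever matches your repo).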
