CI/CD Pipelines, how do they work?

Part 2 - preventing prod fires

In another post I referenced the concepts of DevOps and SRE. This process sits in the overlap between Ops and Dev.

For this process we need a stack. For the sake of argument I'll use the toolchain that features throughout this post: GitLab CI/CD, containers, Terraform (checked with Checkov), Kubernetes, and AWS.

(This is my toolchain. There are many like it, but this one is mine.)

Build - Stage 1

At the start, for the sake of argument, we have a full-stack developer. In this context we'll take full-stack to include front-end, back-end, and infra. Our developer has written the application on their laptop and in their dev account, where they have all the necessary tools installed for 90% of the work. They have written a PHP front-end and a MySQL back-end, and now they need to deploy this to the cloud. The developer has also written Terraform IaC (because full-stack), and this needs to be applied as well.

In this repo we have set up the process to run the builds and tests in a branch where the developer has push access. This would typically be done in the dev or staging environments.

The developer pushes their branch to the prod repo, and the pipeline, defined in e.g. .gitlab-ci.yml, kicks off the entire process. They then watch the pipeline status page and log output as the pipeline does its thing (a rough sketch of such a pipeline definition follows this list):

  • It builds the container
  • If there is an error, the build process fails and you can view the output in the web interface
  • The developer then reads the error, fixes whatever needs to be fixed, and repeats the process until the build stage passes
  • It pushes the container to the preferred storage
  • This concludes the build process stage
  • !! All of this happens in the MR and without approvals being required yet - devs need to be able to do what they need to do with as little hindrance as possible. This means removing bottlenecks and enabling the devs to actually do their own stuff in a sandbox without fear of inadvertently breaking something.
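
A minimal sketch of what this could look like in .gitlab-ci.yml. The job name, image tags, and the use of GitLab's built-in container registry are illustrative assumptions, not a prescription:

```yaml
# Illustrative .gitlab-ci.yml build stage (names and registry choice are assumptions)
stages:
  - build
  - test
  - deploy

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    # Build the application image and tag it with the commit SHA
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    # Push the container to the preferred storage (GitLab's registry here)
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  rules:
    # Run on merge request pipelines so devs can iterate without approvals
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```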


Test - Stage 2

After the build stage comes the validation or test stage:

  • The container is pulled from the registry it was pushed to
  • The container(s) run(s) in a staging env
  • End-to-end testing is applied against the app and the container
  • Load testing is performed
  • Terraform enters the chat
  • The developer has a decent working knowledge of AWS or other cloud services, and can write Terraform to deploy these services without too much hassle
  • Before "terraform plan" is run, the Terraform config also needs to pass some security and consistency checks
  • Checkov (checkov.io) is one of many tools that check Terraform config validity against the requirements you give it, such as: check that all 8 cost allocation tags are present, check for hard-coded creds, limit EC2 instance sizes to a specific level
  • The "terraform plan" stage will test 90% of the functionality, barring global things like S3 bucket names and IAM policies. These will only fail or succeed during the actual deploy: because S3 bucket names are global, you can't test the setup 100% without using different names, and IAM policies are only validated at deploy time
  • All these stages need to pass before moving on. The same rule as the first stage applies: at any point during testing, if a test fails, the build fails and is stopped, the logs are read, the errors are fixed, and the process starts over again (a sketch of what the Terraform-related jobs could look like follows this list)
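
A sketch of what the Terraform-related test jobs could look like. The directory layout, image tags, and the assumption that backend credentials come from CI/CD variables are all mine, not from the original pipeline:

```yaml
# Illustrative validation jobs; paths and versions are assumptions
validate-terraform:
  stage: test
  image:
    name: hashicorp/terraform:1.7
    entrypoint: [""]        # override the image's terraform entrypoint for CI scripts
  script:
    - cd terraform
    # Backend/provider credentials are assumed to be supplied via CI/CD variables
    - terraform init -input=false
    - terraform validate
    - terraform plan -input=false -out=plan.tfplan

checkov-scan:
  stage: test
  image:
    name: bridgecrew/checkov:latest
    entrypoint: [""]
  script:
    # Checks the Terraform config against policies: tags present, no creds, instance sizes, etc.
    - checkov -d terraform/
```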


I mentioned DevOps earlier

This entire process is a collaboration between devs and ops. Traditionally, the sysadmins/cloud engineers would be looking after the hardware, disk space, memory usage, and other internal system metrics, and acting accordingly. The platform team will likely be looking after running layer 7 on top of whatever sits underneath. The SREs are basically the unit test for layer 8 - humans assisting other humans. When errors pop up during the build, test, or deploy stages, SRE and developer work together, in a call or async, with each having some exposure to the other's side. Dev-background SREs fare a little worse in this scenario, as deep infra knowledge might be lacking.

This is what DevOps is - devs and ops collaborating and overlapping. SREs have the responsibility on their shoulders to actually implement and facilitate the entire process and collab.

Deploy - Stage 3

When deploying, as mentioned earlier, we'll use a canary release.

Now before this can happen we'll need to have in place an already-running prod application, ideally spread across many containers, ideally orchestrated by Kubernetes, and ideally with some sort of configurable load balancer in front of it.
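
As a rough sketch of the "already running in prod" part, a minimal Kubernetes Deployment for the current version might look like this. The names, labels, image, and replica count are placeholders:

```yaml
# Hypothetical Deployment for the currently running (stable) version of the app
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 10                  # spread across many containers
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.0.0   # placeholder image
          ports:
            - containerPort: 80
```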

We will also need a fully set-up and automated process for catching errors, and the rest of the observability stack including logs, metrics, standby, and support.

We'll need custom logs and metrics being sent out from the application in accordance with what the app should do, how it should perform, and whether it is performing correctly.
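
What "catching errors automatically" could look like in practice, sketched as a Prometheus alerting rule. The metric name http_requests_total, the track label, and the 5% threshold are assumptions for illustration:

```yaml
# Hypothetical alert: page if the canary's 5xx rate exceeds 5% for 5 minutes
groups:
  - name: canary-alerts
    rules:
      - alert: CanaryHighErrorRate
        expr: |
          sum(rate(http_requests_total{track="canary", status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{track="canary"}[5m])) > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Canary error rate above 5% for 5 minutes"
```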

The process starts by deploying the containerised application into wherever it's going to run and configuring the load balancer to direct 1% of traffic to the new container.
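
One way to express that 1% split, sketched with an Istio VirtualService routing between a stable and a canary Service. The host, gateway, and service names are assumptions, and any load balancer that supports weighted routing would do just as well:

```yaml
# Hypothetical weighted routing: 99% stable, 1% canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.example.com
  gateways:
    - myapp-gateway           # assumed to exist already
  http:
    - route:
        - destination:
            host: app-stable  # Service in front of the existing containers
          weight: 99
        - destination:
            host: app-canary  # Service in front of the new container
          weight: 1
```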

Depending on your requirements, this could be automated with appropriate alerts, or it could be a human-watched process.

If successful, you start to roll out the new container and drain the old containers as the load is shifted.
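
If you want the shift-and-drain to be automated rather than human-watched, a tool like Argo Rollouts (not something the original stack prescribes) can step the weight up and pause between steps. A rough sketch, with arbitrary step sizes and pause durations:

```yaml
# Hypothetical automated canary: weights are approximated by replica counts unless
# a traffic router (e.g. the VirtualService above) is wired in via trafficRouting
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.1.0   # the new canary image (placeholder)
  strategy:
    canary:
      steps:
        - setWeight: 1        # start with a small slice of traffic
        - pause: {duration: 15m}
        - setWeight: 25
        - pause: {duration: 15m}
        - setWeight: 100      # old pods are drained as the weight shifts
```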


Congrats! You've just DevOps'd :D




