Canary Deployments Made Easy: A CI/CD Journey with GitHub Actions and Argo CD Rollouts

Traditional software deployment methods are full of problems.

They're often slow, error-prone, and heavy on manual work. That delays releases, increases the likelihood of human error, and makes it hard to keep things consistent across environments (development, testing, and production).

This article will explore how a well-structured CI/CD pipeline powered by Argo Rollouts, GitHub Actions, and Kubernetes (specifically AKS for container orchestration) can revolutionize your DevOps practices. By automating and optimizing your deployment process, you can achieve faster software releases, minimize errors, and boost the overall efficiency of your development pipeline.

Although this article focuses on Python, the fundamental CI/CD principles we discuss are relevant to various programming languages and microservice architectures.

The key is understanding the core philosophy behind CI/CD design and then customizing your pipeline to suit your specific project needs. So, whether you're working with Jest/Mocha for Node.js, xUnit/SpecFlow for .NET, or need a compilation step for languages like Java, Go, or C++, the core principles remain the same, even if the specific tools differ.

Project Overview

Let's design a scenario together.

The Python Flask application

We have a basic Python Flask application on a Kubernetes cluster (AKS). Users come to our application to get their daily cat dose. While analyzing feedback gathered during user testing, we discovered that eight out of ten users found static cat images boring and would prefer dynamic images or GIFs. As a result, the product owner created a Jira ticket, and a Python developer on the team was assigned to implement the feature of displaying cat images dynamically.

While real-life issues may not always be as simple as the one highlighted here, the aim is to practice the philosophy behind designing an automated, secure pipeline, regardless of the intricacy of the issues, languages, or tools.

First Step: From Local Development

For a better user experience, the Python developer implements a new feature to display cat images dynamically and opens a pull request. As DevOps engineers, our part is to design an automated, secure pull request pipeline (in fact, we will create three pipelines: one for opened/updated pull requests, one for closed pull requests, and one that triggers automatically when we merge to main) and bump the version successfully, adopting GitOps principles.

Prerequisites

  • Development Cluster: A cluster for PR reviews.
  • Production Cluster: Runs the stable version of our application.
  • Image Registry: To store our images privately.

We have deployed a production cluster and a test cluster on AKS, using Terraform for the prod cluster. On it, we installed Argo CD, Argo Rollouts, the NGINX Ingress Controller, and Cert-Manager to encrypt traffic to our application (adding an A record for frontend.devtechops.dev pointing to the Azure Load Balancer). We also set repository secrets, such as the username, PAT, SSH private key, kubeconfig, and Cat API key, consumed by custom scripts and Taskfiles.
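To make the TLS part concrete, here is a minimal sketch of the kind of Cert-Manager and Ingress configuration this setup implies. The issuer name, contact email, and backend service name are assumptions for illustration, not the exact manifests used in this project.

```yaml
# ClusterIssuer for Let's Encrypt (issuer name and email are placeholders)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@devtechops.dev        # hypothetical contact email
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Ingress that terminates TLS for frontend.devtechops.dev
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts: [frontend.devtechops.dev]
      secretName: frontend-tls
  rules:
    - host: frontend.devtechops.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend         # hypothetical service name
                port:
                  number: 80
```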

Tools Preference

Let's take a look at some tools we adopted and the alternatives.

  • Pipelines: GitHub Actions. However, you can migrate to tools like Dagger, Azure DevOps, and GitLab CI.
  • Temporary Cluster: For this blog, we used another AKS cluster. Alternatives such as vCluster or KubeVirt are ideal for temporary environments like PR reviews.
  • GitOps: Argo CD, alternative options include Flux.
  • Production Cluster: AKS. You can also go with EKS or GKE.

  • Image Registry: GitHub Container Registry. Alternatives include Docker Hub and the GitLab Container Registry.

Ready? Let's get to action!


High-level design of the pipelines

A Rough Look at the Three Pipelines

We should have a system that we can trust and monitor. To achieve this, we have crafted three pipelines. Let's take a closer look at them:

  • pr-open.yaml: Triggers whenever someone creates or updates a pull request.
  • pr-closed.yaml: Triggers when we close/merge the pull request. It destroys the temporary cluster.
  • main.yaml: Runs when we merge the pull request (a push to main), updating the release manifest file with the new image tag and promoting the new version of our app to production with a canary deployment strategy.

For each pipeline, Slack notifies us about the result.
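A minimal sketch of how the three workflows above could be wired up. The job contents are omitted; only the triggers are shown, and the file layout is an assumption about how the repo is organized.

```yaml
# .github/workflows/pr-open.yaml — runs on opened or updated pull requests
on:
  pull_request:
    types: [opened, synchronize, reopened]
---
# .github/workflows/pr-closed.yaml — runs when a pull request is closed or merged
on:
  pull_request:
    types: [closed]
---
# .github/workflows/main.yaml — runs when code lands on main
on:
  push:
    branches: [main]
```

Note that merging a pull request fires both the `closed` pull-request event and a push to main, which is why pr-closed.yaml and main.yaml run together on a merge.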

Pull Request Pipeline

In the PR open pipeline, we perform several checks related to the functionality of the code:

  • Ensure the implemented feature works as intended.
  • Check for potential errors.
  • Ensure code coverage is at least 80% (can be adjusted based on organization standards).
  • Verify that sufficient tests cover the new functionality, contributing to code coverage.
  • Confirm that the new functionality integrates well with the rest of the application.

If the new feature passes these tests, we scan our infrastructure, images, and Kubernetes manifest files. We containerize our application, authenticate to the release repository, and update the manifest files with the fresh image tag.

Finally, we create a temporary Kubernetes preview environment with AKS for the pull request (and each future pull request) for review.
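Here is a condensed sketch of what the test-and-scan portion of pr-open.yaml could look like. The Python version, coverage target path, and image naming are assumptions; the actual pipeline may differ.

```yaml
jobs:
  test-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Run unit tests with a coverage gate
        run: |
          pip install -r requirements.txt pytest pytest-cov
          pytest --cov=app --cov-fail-under=80   # fail the build below 80% coverage

      - name: Build the candidate image
        run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .

      - name: Scan the image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
          exit-code: '1'            # fail the job on findings
          severity: CRITICAL,HIGH
```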

All three pipelines

Pull Request Pipeline Further Considerations

We can add more tests on the running cluster, such as:

  • Simulating real-world user workflows with E2E tests using tools like Cypress or Selenium.
  • Load/performance tests to check how the new functionality affects resource consumption (CPU, memory, network) on the cluster, using tools like JMeter, k6, or Locust to simulate real-world traffic.
  • Performing Chaos Engineering tests to introduce failures such as node restarts or network disruptions and monitoring how our application and cluster react.
  • Integrating tools like Burp Suite or OWASP ZAP for Dynamic Application Security Testing (DAST) if the application is complex, has many dependencies, and offers varied functionality.

However, DAST might be considered overkill for a small application like ours with a limited attack surface.
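If DAST is warranted, the OWASP ZAP baseline action can run against the preview environment. A minimal sketch; the preview URL and version pin are illustrative assumptions.

```yaml
- name: OWASP ZAP baseline scan
  uses: zaproxy/action-baseline@v0.12.0
  with:
    target: 'https://pr-42.preview.devtechops.dev'   # hypothetical preview URL
```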

Deploying the New Version to the Development Cluster

Once security scans, unit, and integration tests provide a good level of confidence in the functionality and stability of the new code, we deploy our app to a development cluster that lives as long as the pull request (PR) is open. This environment is ephemeral and only available for testing and reviewing.

At this step, you can run whichever tests you want to run.

The only test we will perform at this step is scanning our running cluster using Kubescan and uploading the results as HTML artifacts.
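A sketch of the scan-and-upload step. The exact Kubescan invocation depends on the tool version, so the command below is a placeholder; the artifact upload uses the standard actions/upload-artifact action, and the secret name is hypothetical.

```yaml
- name: Scan the running cluster
  run: |
    # placeholder: replace with the actual Kubescan invocation for your setup
    ./kube-scan --kubeconfig "$KUBECONFIG" --output results.html
  env:
    KUBECONFIG: ${{ secrets.DEV_KUBECONFIG }}   # hypothetical secret name

- name: Upload the report
  uses: actions/upload-artifact@v4
  with:
    name: cluster-scan-report
    path: results.html
```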

Now, the team members can review the application.

PR branch to be merged after successful checks

Once we merge the pull request, the `pr-closed.yaml` and main.yaml pipelines trigger. `pr-closed.yaml` deletes the development cluster and notifies the admin that the PR is merged.
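A sketch of pr-closed.yaml, under the assumption that the preview environment is a dedicated AKS cluster named after the PR; the cluster name, resource group, and secret names are hypothetical.

```yaml
on:
  pull_request:
    types: [closed]

jobs:
  teardown:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}   # hypothetical secret name

      - name: Delete the preview AKS cluster
        run: |
          az aks delete \
            --name "pr-${{ github.event.pull_request.number }}" \
            --resource-group preview-rg \
            --yes --no-wait

      - name: Notify Slack
        run: |
          curl -X POST -H 'Content-type: application/json' \
            --data '{"text":"PR #${{ github.event.pull_request.number }} closed; preview cluster deleted."}' \
            "${{ secrets.SLACK_WEBHOOK_URL }}"
```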

On the other hand, since new code has been pushed to main, main.yaml also triggers and promotes the new version to production.

Let's take a closer look at the main workflow.

Main Pipeline: Continuous Deployment

Once we merge the pull request, the main pipeline bumps up the version and deploys the new version to production.

In this step, we containerize our application, push it to the registry, and commit the updated tags to our release manifests. We don't interact with the production cluster directly; instead, Argo CD is in control and ensures that the desired state (our release repo with newly updated tags) matches the application running in production.
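A sketch of the "update the release manifests" step, assuming the release repo is managed with Kustomize; the repository name, overlay path, and secret names are assumptions.

```yaml
- name: Check out the release repository
  uses: actions/checkout@v4
  with:
    repository: my-org/cats-release            # hypothetical release repo
    token: ${{ secrets.RELEASE_REPO_PAT }}

- name: Bump the image tag
  run: |
    cd overlays/production                     # hypothetical kustomize overlay path
    kustomize edit set image cats-app=ghcr.io/${{ github.repository }}:${{ github.sha }}

- name: Commit and push
  run: |
    git config user.name "ci-bot"
    git config user.email "ci-bot@users.noreply.github.com"
    git commit -am "chore: release ${{ github.sha }}"
    git push
```

From here, Argo CD detects the drift between the release repo and the cluster and reconciles production to the new tag.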

Slack notifications

Instead of directly promoting to prod, we use a canary deployment strategy to release code safely. First, we send 20% of traffic to a small canary group and then pause the process to monitor the group. If everything works well, we manually promote the new version to prod.

Rollout paused; manually promoted

Finally, the traffic increases gradually until no pod is left from the earlier version of the application, which means that the weight reaches 100% and the rollout is completed successfully.

weight 100%

If problems arise, Argo also allows easy rollback to a previous version in the history.
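A minimal Argo Rollouts manifest matching this flow: shift 20% of traffic, pause indefinitely for manual promotion, then step up to 100%. The app name, image, and the intermediate step weights are assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: cats-app                        # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cats-app
  template:
    metadata:
      labels:
        app: cats-app
    spec:
      containers:
        - name: cats-app
          image: ghcr.io/my-org/cats-app:v2   # hypothetical image
          ports:
            - containerPort: 5000
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {}                     # wait here until promoted manually
        - setWeight: 60
        - pause: {duration: 30s}
        - setWeight: 100
```

With this in place, manual promotion maps to `kubectl argo rollouts promote cats-app`, and rolling back maps to `kubectl argo rollouts undo cats-app`.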

Best Practices for Secure CI/CD Pipelines: A GitOps Approach

Here are some best practices we should adopt while designing the CI/CD pipeline:

  • Clear PR Descriptions: Always explain changes in PR descriptions, including details on implemented security measures.
  • Local Development Checks: Encourage developers to run linters, unit tests, and security scans on their local machines throughout development (see the pre-commit sketch after this list).
  • Coding Practices & Static Analysis: Adopt good coding practices and secure coding principles. Integrate static analysis tools to improve code quality and security posture, and identify vulnerabilities early.
  • Security is Ongoing: Security is an essential part of the entire development lifecycle, not just an afterthought. Even simple applications deserve a high level of security attention.
  • GitOps Principles: Infrastructure and application configurations are stored as code within our Git repository, enabling declarative (not manual) deployments and automatic rollbacks with Argo.
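One common way to enforce local checks is a pre-commit configuration. A minimal sketch for a Python project; the hooks and version pins shown are illustrative choices, not the tooling mandated by this project.

```yaml
# .pre-commit-config.yaml — runs a formatter, linter, and security scanner on each commit
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0          # illustrative pin
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8
    hooks:
      - id: bandit
```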

Future Optimizations

Our pipeline paves the way for a more agile and efficient software delivery process, but additional avenues exist for continuous improvement.

  • Enhanced Testing: Depending on your project needs, consider adding more robust tests such as regression, performance, API, and E2E testing.
  • Monitoring & Logging: Implement a comprehensive monitoring and logging stack for proactive issue identification and alerting using tools like ELK, Prometheus, Grafana, DataDog, New Relic, and Robusta.
  • Cost Optimization: Explore cost-optimization strategies to ensure efficient resource utilization, using tools like the Cluster Autoscaler, Azure Container Insights, HPA, multi-tenancy options, and Kubecost, and by analyzing and monitoring key pod, control plane, node, and application metrics.
  • Code Caching & Dependency Pre-installation: Utilize code caching and dependency pre-installation techniques to speed up the build process (see the sketch after this list).
  • Parallel Builds: For larger projects, consider implementing parallel builds to improve pipeline execution time.
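For example, pip dependencies can be cached between runs via the setup-python action. A minimal sketch:

```yaml
- uses: actions/setup-python@v5
  with:
    python-version: '3.12'
    cache: 'pip'            # restores the pip cache keyed on requirements files

- run: pip install -r requirements.txt   # hits the cache when requirements are unchanged
```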

Conclusion

From local feature to production, securely: here's a breakdown of what we accomplished with a secure, GitOps-based CI/CD pipeline for our application:

  • We implemented a fully automated CI/CD pipeline leveraging GitOps principles, prioritizing security throughout the process.
  • Our only manual step was promoting the application to production during the canary deployment.
  • All infrastructure and application configurations are stored as code within our Git repository.

As DevOps continues to evolve, leveraging these tools positions you for success in the ever-changing development landscape.

Take control of your DevOps workflow and feel the power of automation. Adopt the philosophy, not the tools; tools come and go.

Happy coding!


Interested in the nitty-gritty of the security tools?

Check out my comprehensive blog post on Kuberada.

Take a look at the brief overview of the Case Study on my website.
