Building Pull Request based ephemeral Preview environments on Kubernetes
Air traffic control tower - Preview planes before they land or take-off


The CTO of a company calls you. His company has just migrated from Heroku to EKS on AWS. He's happy with the migration but wants you to build Heroku's "Ephemeral Preview Apps" on Kubernetes.

You know you can use ArgoCD for this, but you're in for some surprises and complications!

He wants to build ephemeral preview apps for both frontend and backend repos.

  1. Frontend repos are simple Single Page Apps using Vue.js
  2. Backend repos are Python+Django and use PostgreSQL, Redis, and MongoDB.

He lists a few more requirements that complicate things a bit.

He wants you to handle:

  • dependency management for services
  • database migrations & seed data for backend services
  • automated deletion of envs to save costs
  • integration with Jira and GitHub Deployments
  • and much more

All this while using existing tools as much as possible.

Your Documentation-driven approach

You sign up for this work and start a doc that lists all the requirements and identifies the unknowns. You've built preview environments before. However, dependency management, database migrations, seed data, and similar concerns are contextual, so they often need custom solutions. Couple that with the constraint of using existing tools, and you have some interesting engineering work!

You list the existing tools and processes:

  • Kustomize
  • GitHub Actions for CI
  • ArgoCD
  • Versioned DB scripts
  • AWS Secrets Manager for secrets, etc.

The next step is to run a few POCs to turn the "known unknowns" into "knowns". You know that preview envs for the frontend repos are easy to create. For the backend apps, you'll need answers to a few questions.

  1. Do we create PostgreSQL, Redis, MongoDB for each PR, or can these be shared?
  2. Do we need the ability to point a preview service to another preview service? Or does it always point to staging env?

You'll need to design the system based on answers to these and other questions.

So you do the grunt work: you write down all the questions, discuss the trade-offs with the CTO and other engineering leads, and finally arrive at a solution that handles all these cases. Getting there takes a few POCs and some trial and error, but that's part of the process.

Ephemeral PR-based Preview environment workflow

Here's how you design the workflow.

Workflow for Ephemeral Preview Environments on Kubernetes

  1. A developer creates a PR with the "Preview" label
  2. The CI workflow starts
  3. ArgoCD watches the PR
  4. ArgoCD creates the application deployment in K8s
  5. The preview env's public endpoint is made available to devs and QA
  6. ArgoCD deletes the resources when the PR is merged

This flow works well for both frontend and backend repos.
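One way to get steps 3, 4, and 6 from stock ArgoCD is its ApplicationSet controller with the Pull Request generator. That mechanism is an assumption here, since the workflow above doesn't name one. The minimal sketch below builds such a manifest as a Python dict; the `preview` label, the `github-token` secret, the overlay path, and the image name are hypothetical placeholders. When a PR is closed or loses the label, the generator stops returning it and the controller deletes the generated Application.

```python
import json


def preview_applicationset(owner: str, repo: str, app_name: str,
                           repo_url: str, overlay_path: str,
                           image: str) -> dict:
    """Build an Argo CD ApplicationSet that creates one Application per open
    PR carrying the (hypothetical) `preview` label."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": app_name + "-previews", "namespace": "argocd"},
        "spec": {
            "generators": [{
                "pullRequest": {
                    "github": {
                        "owner": owner,
                        "repo": repo,
                        # Only PRs with this label get a preview environment.
                        "labels": ["preview"],
                        # Secret holding a GitHub API token (hypothetical name).
                        "tokenRef": {"secretName": "github-token", "key": "token"},
                    },
                    # Poll GitHub for opened/updated/closed PRs every 5 minutes.
                    "requeueAfterSeconds": 300,
                }
            }],
            "template": {
                # {{number}} and {{head_sha}} are filled in by the generator.
                "metadata": {"name": app_name + "-pr-{{number}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        # Deploy the exact commit at the head of the PR.
                        "targetRevision": "{{head_sha}}",
                        # Kustomize overlay for previews (assumed repo layout).
                        "path": overlay_path,
                        # Use the image CI built for this commit (assumed tag scheme).
                        "kustomize": {"images": [image + ":{{head_sha}}"]},
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        # One namespace per PR isolates previews from each other.
                        "namespace": app_name + "-pr-{{number}}",
                    },
                    "syncPolicy": {
                        "automated": {"prune": True, "selfHeal": True},
                        "syncOptions": ["CreateNamespace=true"],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = preview_applicationset(
        owner="acme", repo="web", app_name="web",
        repo_url="https://github.com/acme/web.git",
        overlay_path="deploy/overlays/preview",
        image="ghcr.io/acme/web",
    )
    # kubectl accepts JSON, so this can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from code is only for illustration; in practice you'd likely commit the equivalent YAML next to your Kustomize overlays. Whether deleting a generated Application also removes its Kubernetes resources depends on Argo CD's finalizer settings, so it's worth checking that for your version.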

Challenges

Here are three main challenges you handle along the way:

  1. Seed data management
  2. Dependency management for services
  3. Keeping costs low for the Preview environments

Let's expand on the challenges further.

  1. Seed data management

You create a custom PostgreSQL image preloaded with seed data. The seed data is version-controlled in Git, so devs can easily update the PostgreSQL image when new data needs to be loaded.
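A minimal sketch of how the seed image could be versioned, assuming the seed SQL lives in a hypothetical `db/seed/` directory. It tags the image with a content hash of the seed files, so CI rebuilds the image only when the seed data actually changes. It relies on the official `postgres` image running scripts from `/docker-entrypoint-initdb.d` on first start, which means the data is loaded when the container initialises an empty data directory rather than being pre-baked into the image; that is a simplification, not necessarily how the original image was built.

```python
import hashlib
from pathlib import Path


def seed_image_tag(seed_dir: Path) -> str:
    """Return a short content hash of every file under seed_dir.

    Using it as the image tag means the seed image is rebuilt and re-pushed
    only when the version-controlled seed data actually changes."""
    digest = hashlib.sha256()
    for path in sorted(p for p in seed_dir.rglob("*") if p.is_file()):
        # Hash the relative path too, so renaming files (which can change
        # the order the init scripts run in) also changes the tag.
        digest.update(path.relative_to(seed_dir).as_posix().encode())
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()[:12]


def seed_dockerfile(postgres_version: str = "16") -> str:
    """Dockerfile for a PostgreSQL image that ships the seed scripts.

    The official postgres image executes init scripts (including *.sql and
    *.sh) from /docker-entrypoint-initdb.d, in name order, when it starts
    with an empty data directory. Assumes the build context contains seed/."""
    return (
        f"FROM postgres:{postgres_version}\n"
        "COPY seed/ /docker-entrypoint-initdb.d/\n"
    )


if __name__ == "__main__":
    # Hypothetical repo layout: seed SQL lives in db/seed/.
    print(seed_image_tag(Path("db/seed")))
    print(seed_dockerfile(), end="")
```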

  2. Dependency management for services

You run the database containers in each PR's own preview namespace, so they are isolated from other PRs. By default, if service A depends on service B, service A's PR points to service B's staging env, but devs can easily override this with a config change.
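The default-to-staging rule with per-PR overrides can be sketched as a small resolver. This is a hypothetical helper, not the author's code: the Service names (`postgres`, `redis`, `mongo`), the `STAGING_URLS` table, and the environment variable names are all assumptions. Datastores always resolve to in-cluster DNS names inside the PR's own namespace, while upstream services fall back to staging unless the PR's config overrides them.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical staging endpoints for upstream services this app depends on.
STAGING_URLS = {
    "service-b": "https://service-b.staging.example.com",
}


def preview_env_vars(namespace: str,
                     overrides: Mapping[str, str] | None = None) -> dict[str, str]:
    """Build connection settings for one PR's preview deployment.

    Datastores always resolve to Services inside the PR's own namespace;
    upstream services default to staging unless the PR config overrides
    them (for example, to point at another PR's preview)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(STAGING_URLS)
    if unknown:
        raise ValueError(f"unknown dependencies in overrides: {sorted(unknown)}")

    suffix = f"{namespace}.svc.cluster.local"
    env = {
        # Per-PR datastores, reachable via in-cluster DNS.
        "POSTGRES_HOST": f"postgres.{suffix}",
        "REDIS_HOST": f"redis.{suffix}",
        "MONGO_HOST": f"mongo.{suffix}",
    }
    for service, staging_url in STAGING_URLS.items():
        key = service.upper().replace("-", "_") + "_URL"  # service-b -> SERVICE_B_URL
        env[key] = overrides.get(service, staging_url)
    return env
```

Rejecting unknown override keys is a deliberate choice in this sketch: a typo in a PR's config fails loudly instead of silently leaving the preview pointed at staging.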

  3. Keeping costs low for the Preview environments

To keep Preview env costs in check, you suggest running them on Spot instances. You also delete all of a preview environment's resources when it's not actively being used. This concludes your work. The CTO is super happy and wants to work with you further.
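Both cost levers can be sketched in a few lines, under stated assumptions. Scheduling onto Spot capacity uses the `eks.amazonaws.com/capacityType` node label that EKS managed node groups set; idle detection assumes you already record some `last_activity` timestamp per preview (last push, last request, or similar), and the 48-hour threshold is a hypothetical default, not a recommendation from the original setup.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Node label that EKS managed node groups set; Spot capacity nodes get "SPOT".
SPOT_NODE_SELECTOR = {"eks.amazonaws.com/capacityType": "SPOT"}


@dataclass(frozen=True)
class PreviewEnv:
    namespace: str
    # How "activity" is measured (last push, last HTTP request, ...) is an assumption.
    last_activity: datetime


def stale_previews(envs: list[PreviewEnv], now: datetime,
                   max_idle: timedelta = timedelta(hours=48)) -> list[str]:
    """Namespaces of preview envs idle for longer than max_idle (hypothetical default)."""
    return sorted(e.namespace for e in envs if now - e.last_activity > max_idle)


def with_spot_scheduling(pod_spec: dict) -> dict:
    """Return a copy of a pod spec pinned to Spot nodes, keeping existing selectors."""
    spec = dict(pod_spec)
    spec["nodeSelector"] = {**pod_spec.get("nodeSelector", {}), **SPOT_NODE_SELECTOR}
    return spec


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    envs = [
        PreviewEnv("web-pr-12", now - timedelta(days=3)),
        PreviewEnv("web-pr-15", now - timedelta(hours=2)),
    ]
    print(stale_previews(envs, now))  # ['web-pr-12']
```

If previews are deployed by Argo CD with automated sync, deleting a stale namespace directly may simply get it recreated; removing the PR's preview label (or closing the PR) so the controller tears the environment down is the safer cleanup path. If your Spot node group is tainted, pods will also need a matching toleration.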


Are you such a CTO or engineering leader looking to supercharge developer productivity?

If you're looking for a reliable engineering partner for all things Infra, DevOps, Observability, and Reliability, DM me.

We do Pragmatic Software Engineering - on Production. That's it!

I write stories like this one about software engineering.

There's no fixed schedule, as I don't make these up.

If you liked this one, you might love - https://www.dhirubhai.net/pulse/taming-gcp-networking-cloud-costs-chinmay-naik/

Follow me - Chinmay Naik for more such stuff, straight from the production oven!

