Which Kubernetes Service Should I Use For My Startup?


TL;DR: don’t use Kubernetes for your startup

Introduction

Let me start by saying that my actual experience with Kubernetes is relatively limited. I’ve read a couple of books, done some exercises launching an EKS cluster, and done some exercises installing Kubernetes and using kubectl. That’s about it.

However, I think I have a good conceptual understanding of why, for instance, AWS doesn’t really build anything in Kubernetes, and neither does Noom. I think I have a good conceptual and practical understanding of the tradeoffs involved in microservices architectures, autoscaling architectures, and containerization. In this post I explain my thinking around why you probably shouldn’t use Kubernetes for your startup - it’s like driving your kids to school in an 18-wheeler. I’ll even make the case for Monoliths (as long as they are stateless).

A High Level Conceptual Overview of Kubernetes

Feel free to skip this if you already know how Kubernetes works.

Kubernetes has a ton of complexity, but there are only three components we need to know about before we can even discuss it:

  • Master node: this node acts as the registry and brain for the whole fleet. It ensures that the desired state of the cluster (e.g. “I want 3 pods running microservice x and 4 pods running microservice y”) matches the actual state. It does this by running health checks, keeping a registry of what’s running, and spinning up new pods when necessary (see the manifest sketch after this list). And other cool wizardry.
  • Worker node: this is a VM where the actual containers live. This is where your web application actually runs.
  • Pod: roughly, the Kubernetes term for a running container (strictly, a group of one or more containers scheduled together). This is where your application code actually runs.
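
To make the “desired state” idea concrete, here’s a minimal sketch of the manifest you’d hand to the master node to say “I want 3 pods with microservice x”. The names, image, and port are placeholders, not anything from a real system:

```yaml
# Minimal sketch of a Kubernetes Deployment. It declares that 3 replicas (pods)
# of a hypothetical "microservice-x" should always be running; the control
# plane's job is to make reality match this declaration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-x
spec:
  replicas: 3                  # desired state: 3 pods
  selector:
    matchLabels:
      app: microservice-x
  template:
    metadata:
      labels:
        app: microservice-x
    spec:
      containers:
        - name: microservice-x
          image: registry.example.com/microservice-x:1.0   # placeholder image
          ports:
            - containerPort: 8080
```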

The Awesomeness in Kubernetes

One of the great advantages of containers over bare VMs is that containers launch fast. Blazing fast. VMs launch slowly. Therefore, when you get a spike in traffic, hosting your web application on Kubernetes gives you a much better chance of responding to the spike without dropping requests. This is great, isn’t it?

Not so fast.

In order for a container to actually launch fast, your worker nodes need to have excess capacity. If your worker nodes are near capacity, you need to find a way to launch a new worker node yourself (or have a managed Kubernetes service do it for you). Then, and only then, can you benefit from the fast container start time.

This means that if you’re using Kubernetes, you are always running excess compute capacity.

So now I ask you: if you always need to run excess capacity anyway, and you need a way to autoscale your VMs anyway, what if…

Stay with me…

What if you skipped Kubernetes entirely?

What if you ran your application code directly on the VMs?

Since you’re running excess capacity anyway, you can just set more conservative autoscaling targets (say, scale up at 50% CPU utilization instead of 70%). This should make up for the slower VM start time.
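
As a rough sketch of what a “more conservative target” could look like in CloudFormation (assuming an Auto Scaling group named WebAppAutoScalingGroup defined elsewhere in the same template; the names are placeholders):

```yaml
# Target-tracking scaling policy sketch: scale out at ~50% average CPU instead
# of the typical 70%, leaving headroom so slower-booting VMs are already
# launching before the existing fleet saturates.
WebAppScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref WebAppAutoScalingGroup   # placeholder ASG
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ASGAverageCPUUtilization
      TargetValue: 50.0
```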

VM-based Microservices

You can have a microservices architecture without containers. The only difference is that each microservice manages its own autoscaling infrastructure: each one sits behind its own autoscaling group.

Once you set one of these up as infrastructure as code, it’s trivial to parametrize it so that the same CloudFormation/Terraform template can quickly spin up more and more microservices. Many big companies build their systems this way.

Under this system, you just use the accepted EC2 microservice CFN template and it comes out of the gate with your autoscaling group, etc. You run the microservice code directly on VMs. All the VMs run in the VPC for that region. If you size the fleet right, you get multi-availability-zone resilience without even thinking about it.
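
Here’s a hedged sketch of what such a reusable template might look like. The parameter names, AMI, and subnet IDs are purely illustrative, not taken from any real company’s template:

```yaml
# Illustrative skeleton of a reusable "EC2 microservice" CloudFormation template.
# Each new microservice is just another stack created from this file with
# different parameter values.
Parameters:
  ServiceName:
    Type: String
  InstanceType:
    Type: String
    Default: t2.small
  MinSize:
    Type: Number
    Default: 2
  MaxSize:
    Type: Number
    Default: 6

Resources:
  ServiceLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-12345678          # placeholder AMI baked with the service code
        InstanceType: !Ref InstanceType

  ServiceAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: !Ref MinSize
      MaxSize: !Ref MaxSize
      LaunchTemplate:
        LaunchTemplateId: !Ref ServiceLaunchTemplate
        Version: !GetAtt ServiceLaunchTemplate.LatestVersionNumber
      VPCZoneIdentifier:               # placeholder subnets spread across AZs
        - subnet-aaaa1111
        - subnet-bbbb2222
      Tags:
        - Key: Name
          Value: !Ref ServiceName
          PropagateAtLaunch: true
```

Spinning up microservice number N+1 is then just another stack created from the same file with a different ServiceName (and, say, a smaller MinSize for staging).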

The advantages of having all your microservices on a common underlying compute platform

Of course, by now astute readers will have noticed that if you use VM-based microservices, you’re going to be wasting a hell of a lot more capacity. Each microservice runs with its own excess capacity, and since each one needs its own buffer, the waste quickly adds up.

In a large company like Amazon, this actually costs them a lot of money. They have thousands of services all running at 20-50% capacity. If, instead, each org had a Kubernetes setup to share compute across all its services, they would be able to save a lot of money and run on significantly less excess capacity.

Yet they don’t do it. The reason why they don’t do it is well beyond my pay grade but I can hypothesize that they don’t do it because the cost savings are too small compared to the complexity of the project. The payback period of such an endeavor would probably be over 10 years.

The difficulties of microservices

It is my contention that absolutely none of the advantages of microservices apply to a greenfield/young app for a small company. When I launch a stack for a small company, the staging environment uses an autoscaling group with a single t2.small EC2 instance, and the prod environment uses an ASG (from the same CFN template) using two t2.small instances.

BTW the cost of the auto-scaling VM setup I mentioned in the above paragraph comes out to around $30 a month (less if you’re still in free tier and wanna use t2.micro for staging, but those deployments are slow as hell).

Only when I launch a whole new app for that company do I create another service. And even then, I might not. Why?

Because none of the biggest benefits of microservices are meaningful to a company getting fewer than 100 peak TPS (around 8.6 million requests per day if that rate were sustained, more like 2M a day in practice). That's right: if the choice is between Kubernetes and VMs, I'm picking VMs. Kubernetes is useless if there are no microservices, and microservices are useless for most small companies and almost all greenfield projects.

Most startups will get like 100 requests per DAY.

If they’re in the top 1% of success they might get something like 10k daily active users, or around 50k requests per DAY. That’s a peak TPS of about 1-1.5. You can run this off a Raspberry Pi. Literally.

It makes zero sense to optimize for services scaling at different rates and for having different deployment cadences when none of them ever need to scale, and when it’s the same people working on every service. When you’re at AWS-level, yes, it makes a huge difference that your authn/authz service can scale at a different rate than your ec2 control plane services, which can also scale differently from ec2 data plane services. It also makes a huge difference that every team can own their deployment cycle.

That’s why they pay all the costs of microservices. What are these costs? In order of how difficult the challenges are (in my opinion):

1) A huge ol’ mess of code and code packages. Realistically, the chances that your consumer app breaks up neatly into microservices are essentially nil. At Noom, if we broke everything up into microservices, a lot of DB-interaction code would also need to be broken out into its own packages so it could be consumed by several microservices. At this point, when the company is 15 years old and has hundreds of APIs, a few places have emerged where we can easily break out a mini-service. Our biggest monolith can be broken up into up to 3 smaller monoliths without much difficulty, with no code duplication, without much chance of future code duplication, and in a way that would give us many advantages without too many disadvantages. So we’re experimenting with doing it.

Having hundreds of code packages adds significant slowness and difficulty to your developers' daily work. The difficulties are tough to explain if you haven’t experienced them, but they’re there. Rolling out a breaking change to a DB repository code package is essentially impossible. Since nothing can ever be breaking, many DB interaction methods will need to be versioned (you can either version the functions themselves, or write spaghetti code inside the DB repo package saying things like “if version == x then do this, elif version == y do that, else else else”). This significantly increases the complexity of doing something as simple as adding a column to a database-layer model.

2) Without an observability solution (e.g. adopting some service mesh technology), customer-facing operations are an un-debuggable black box and you will never fix bugs. A complex customer operation will enter your system, make 7 requests to other microservices, and you’ll have a hell of a time tracing it and debugging why service X responded with a 404 that resulted in an unhandled exception in your API. Service X returned the 404 deliberately, so there’s no stack trace in its logs. You can’t even tell what the hell happened.

3) A complicated authn/authz model, or the need for a centralized API/federated access layer. Since your client-side app (mobile or frontend) will be hitting a ton of endpoints, you either need to put all of them behind a federated access layer, or you need to adopt a standard like OPA or OpenID so that each request’s authn token also has authz information encoded into it. Both of these are super easy to get wrong, and it’s extremely difficult to find out that you got them wrong until you get hacked. It’s a latent risk that doesn’t affect the company’s day-to-day but could be catastrophic if it materializes.

4) More CI/CD effort. Deploying a monolith or two mini-services is way easier than having to develop a catch-all CI/CD solution for all your code packages to land in pods.

But I Want Containers!

In the context of a monolith or a few services, K8s solves no problems at all.

In the context of microservices, the primary problem K8s solves, which VM-based solutions can’t tackle, is the problem of minimizing infrastructure waste.

You might think that your business is going to be such a massive success that you want an exit plan. Containerizing a large mesh of VM services is prohibitively difficult, and if you go with the VM solution, you’re basically locked into infrastructure waste forever.

And you’d be right. So we should think of a way to make containers work for us without adding a bunch of complexity and operational overhead.

Another fantastic point for containers is that they often provide a great developer experience. They eliminate a whole class of “works on my machine” issues. They push you and your engineers to never do anything stateful. Think about how companies acquire a bunch of tech debt little by little, one “oh, let’s just write to a local file, it’s fine for this use case” at a time.

Also, the actual cost of containerizing a stateless application is quite low, especially if you do it from the get-go.
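
As an illustration of how small that cost can be, here’s a minimal Dockerfile sketch for a generic stateless Python web service; the base image, dependencies, and start command are assumptions, not a prescription for any particular stack:

```dockerfile
# Minimal sketch: containerizing a stateless web service from day one.
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# No volumes and no local state: everything the process needs comes from the
# image, environment variables, and external services (DB, object storage).
EXPOSE 8080
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:application"]
```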

Conclusion: Use Containers, Not Kubernetes

So, running containerized has a bunch of benefits and costs little in terms of time. The question is, is there a way to run containerized services that doesn’t add a bunch of additional complexity like Kubernetes does?

Absolutely. I’m aware of a couple of such options, which means there are probably like 15 options. My recommendation as an AWS fanboy, a past AWS employee, and a future AWS employee: use AWS ECS Fargate. The underlying dynamics of the container control plane and scaling worker nodes are completely abstracted away from you. It’s still container-based, so you’re learning about Docker, launching services configured from Dockerfiles, and possibly even using Docker for local development. It should be reasonably easy to migrate Fargate containers to K8s when that actually makes sense (Fargate is pretty expensive, but you can run effectively with little excess capacity if you invest the time to configure autoscaling).
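
For a feel of what that looks like in practice, here’s a hedged CloudFormation fragment for a Fargate service. The cluster, image, and subnets are placeholders, and a real deployment would also need an execution role, a load balancer, and scaling policies:

```yaml
# Sketch of an ECS Fargate task definition and service: you declare CPU/memory
# and a container image, and AWS runs it without you managing worker nodes.
WebAppTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: web-app
    RequiresCompatibilities: [FARGATE]
    NetworkMode: awsvpc
    Cpu: "256"
    Memory: "512"
    ContainerDefinitions:
      - Name: web-app
        Image: registry.example.com/web-app:1.0   # placeholder image built from your Dockerfile
        PortMappings:
          - ContainerPort: 8080

WebAppService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref EcsCluster                      # placeholder cluster defined elsewhere
    LaunchType: FARGATE
    DesiredCount: 2
    TaskDefinition: !Ref WebAppTaskDefinition
    NetworkConfiguration:
      AwsvpcConfiguration:
        Subnets:                                  # placeholder subnets
          - subnet-aaaa1111
          - subnet-bbbb2222
```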

ECS is what the Noom infra team chose (they are amazing, btw) and it’s worked fantastically well for Noom.

As always, no advice is good in all situations. I think if you’re running a tech team constrained by finite resources in a startup scenario where it’s important to run lean and move fast, Kubernetes should be out of the question in most cases.

Andrew Hamblin

Senior Software Engineer | Full-Stack Development | Agile Methodologies | Delivering 10x performance improvements ahead of schedule

2y

It's much easier to build a modular "monolith" and put it on a single VPS or PaaS and break it out into services as needed to scale than to deal with that complexity when you're trying to get basic functionality right.

Following your LinkedIn profile is kind of a great learning library for young developers. Thanks for the article.

Irwansyah Irwansyah

Software Craftsman Product Engineer

2y

STRONGLY AGREE with you! :)
