How Confluent acquired WarpStream for $220M after just 13 months of operation
In August of 2023, WarpStream shook up the Kafka industry by announcing a novel Kafka-API compatible cloud-native implementation that used no disks.
Instead? It used S3.
The announcement came via a viral Hacker News post titled “Kafka is Dead, Long Live Kafka!”.
Just a year and a month later, on September 9, 2024, Confluent acquired them for $220M (!).
Why did they do that?
WarpStream’s innovative architecture gave them two major advantages that nobody could compete with: drastically lower infrastructure costs and far simpler operations.
The only drawback?
Latency was high.
- p99 write latency: ~400ms
- p99 end-to-end latency (from write to read): ~1 second
Since WarpStream writes directly to S3, and has to buffer writes so S3 PUT costs don't explode, it suffers from higher latency.
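To see why that buffering matters, here's a back-of-envelope sketch in Python. The 100k records/s throughput, the 250ms flush interval, and the $0.005-per-1,000-PUTs retail price are my illustrative assumptions, not figures from WarpStream:

```python
# Back-of-envelope: why WarpStream buffers writes before hitting S3.
# Assumptions (mine, for illustration): 100k records/s per cluster,
# S3 PUT price of $0.005 per 1,000 requests, agents flush every 250ms.

PUT_PRICE = 0.005 / 1_000          # $ per PUT request
SECONDS_PER_MONTH = 86_400 * 30

records_per_second = 100_000

# Naive approach: one PUT per record.
naive_puts = records_per_second * SECONDS_PER_MONTH
naive_cost = naive_puts * PUT_PRICE

# Buffered approach: flush once every 250ms -> 4 PUTs/s, regardless
# of how many records arrived in that window.
flushes_per_second = 4
buffered_puts = flushes_per_second * SECONDS_PER_MONTH
buffered_cost = buffered_puts * PUT_PRICE

print(f"naive:    ${naive_cost:,.0f}/month")     # ~ $1.3M/month
print(f"buffered: ${buffered_cost:,.2f}/month")  # ~ $52/month
```

Four orders of magnitude of request cost disappear, at the price of a few hundred milliseconds of latency per batch.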
This wasn’t a problem, though. They had one key insight:
Kafka workloads are either latency-sensitive or high-volume, and it’s precisely the high-volume workloads that cost a fortune.
Customers were happy to make the trade-off: accept higher latency, but save on costs.
The cost savings were indeed the juicy part:
Here I compare a self-managed Kafka cluster against WarpStream, using retail AWS prices.
WarpStream comes out roughly 4x cheaper: we're talking a ~$500K annual cost versus $2M.
The right architecture can be the difference between millions of dollars a year in infrastructure costs.
Like this in-depth cost analysis so far?
A lot more is to come. Make sure to follow me on all platforms to not miss a beat.
Where do WarpStream's savings come from?
Network costs.
The cross-zone charges you sustain in a regular Kafka deployment are its largest expense at high volume.
They can be 80% of the total cost!
As you can see, even in the optimized deployment here, we still sustain major charges from cross-zone networking.
A less-optimized Kafka deployment can cost you more than twice as much: $5.2M/yr, with a large chunk coming from EBS disks (no tiered storage) and consumer networking.
Anyway. If you optimize Kafka as much as possible, you’ll get to a ~$2.1M annual cost.
Out of that, $1.68M (80%) is network cost you cannot avoid in a vanilla Kafka deployment.
WarpStream, however, drives all of these costs to zero.
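That $1.68M figure falls out of simple arithmetic. Here's a hedged sketch, assuming ~1 GB/s of sustained produce traffic (my assumption), replication factor 3 across 3 AZs, and AWS's retail $0.01/GB cross-AZ charge billed on each side ($0.02/GB combined):

```python
# Rough model of cross-zone traffic in a 3-AZ Kafka cluster.
# Assumptions (mine): 1 GB/s sustained produce throughput, replication
# factor 3 with one replica per AZ, consumers fetching from their local
# zone, and cross-AZ transfer at $0.01/GB out + $0.01/GB in = $0.02/GB.

CROSS_AZ_RATE = 0.02                # $/GB, both directions combined
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

throughput_gb_s = 1.0

# Producers: with partition leaders spread evenly over 3 AZs,
# 2 out of 3 writes land on a leader in another zone.
producer_cross_az = throughput_gb_s * (2 / 3)

# Replication: every write is copied to 2 replicas in the other 2 AZs.
replication_cross_az = throughput_gb_s * 2

annual_cost = (producer_cross_az + replication_cross_az) \
              * CROSS_AZ_RATE * SECONDS_PER_YEAR
print(f"${annual_cost:,.0f}/year")  # ~ $1.68M/year
```

Under these assumptions the model lands almost exactly on the $1.68M/yr figure, and note that none of these flows can be turned off in vanilla Kafka: replication and producer-to-leader traffic are inherent to the design.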
They offered it through a hybrid BYOC (bring your own cloud) model, where the stateless agents run inside the customer’s own cloud account, while WarpStream hosts only the control plane as a managed service.
How?
This is where the operational simplicity comes in.
The secret sauce is the control plane. ??
All the complex logic lives inside the control plane.
It’s essentially a sequencer that leverages DynamoDB and gives each agent the offsets for each partition it wants to write to.
This lets them keep the agents dumb and stateless. Their write flow is roughly:
1. Buffer incoming produce requests in memory.
2. Flush the buffer as a single file to S3.
3. Commit the file’s metadata to the control plane, which assigns the offsets.
Agents get all of their state from the control plane, and the complex offset synchronization happens there too.
Agents can therefore scale up and down effortlessly, like nginx.
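Here's a toy Python model of that split, to make the idea concrete. This is my sketch, not WarpStream's code: the control plane is reduced to a per-partition counter behind a lock (in reality it sits on DynamoDB), and the object store is a plain dict standing in for S3:

```python
import threading

class ControlPlane:
    """Toy sequencer: hands out the next offset range for each partition.
    WarpStream backs this role with DynamoDB; here it's a dict + lock."""
    def __init__(self):
        self._offsets = {}            # partition -> next unassigned offset
        self._lock = threading.Lock()

    def commit(self, partition: int, record_count: int) -> int:
        """Atomically reserve `record_count` offsets; return the base offset."""
        with self._lock:
            base = self._offsets.get(partition, 0)
            self._offsets[partition] = base + record_count
            return base

class Agent:
    """Toy stateless agent: buffers records, flushes them as one 'file'
    to the object store, then commits the batch to the control plane."""
    def __init__(self, control_plane, object_store):
        self.cp = control_plane
        self.store = object_store     # stand-in for S3
        self.buffer = []

    def produce(self, partition: int, record: bytes):
        self.buffer.append((partition, record))

    def flush(self) -> dict:
        """Write one file for the whole buffer, then commit its metadata."""
        self.store[f"batch-{len(self.store)}"] = list(self.buffer)
        assigned = {}
        for partition in {p for p, _ in self.buffer}:
            count = sum(1 for p, _ in self.buffer if p == partition)
            assigned[partition] = self.cp.commit(partition, count)
        self.buffer.clear()
        return assigned               # partition -> base offset of this batch

# Two agents share one control plane: offsets never collide, so any
# agent can be killed or added at any time without rebalancing.
cp, s3 = ControlPlane(), {}
a, b = Agent(cp, s3), Agent(cp, s3)
a.produce(0, b"x"); a.produce(0, b"y")
b.produce(0, b"z")
print(a.flush())  # {0: 0}  (this batch owns offsets 0-1)
print(b.flush())  # {0: 2}
```

Because ordering lives entirely in the sequencer, no agent ever needs to know about partitions, leaders, or its peers, which is exactly what makes them as disposable as a stateless web server.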
Ingenious design. Literally worth hundreds of millions of dollars. All within a year.
Kudos to the team.
Liked this article?
It took me hours to research and write.