How do Dams, Lakes, Spillways and Fish Lifts Help Explain Apache Kafka?

Paul Brebner

Open Source Technology Evangelist at Instaclustr by NetApp

Published: January 8, 2024

When travelling for work or holidays I sometimes come across things (systems, machines, etc.) that illustrate some aspect of distributed systems (e.g. the Great Wall of China last year). During the last holidays I went on a remote kayaking trip on Lake Yarrunga. Here's a photo of our campsite:

Monarch campsite, Lake Yarrunga (Paul Brebner)

It's a very scenic but remote location for kayaking and camping. The large (10 square km) lake was formed by the Tallowa Dam, which was built in the 1970s. The dam is currently at full capacity, around 90,000 megalitres. This gives us our first computing concept: storage capacity! Databases and some streaming systems can store data, but the capacity of any physical system is limited, so there needs to be a way to free up storage eventually. For example, in Apache Kafka, the data stored in topics has a default retention time of 7 days, after which Kafka deletes old log segments once all the records in them are older than 7 days. This is time-based retention. But what happens when your storage hits a space threshold limit?
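To make time-based retention concrete, here is a minimal sketch in Python. It is a toy model, not Kafka's actual code: the `Segment` class and the `apply_time_retention` function are hypothetical names invented for illustration, and the only real Kafka detail assumed is the 7 day default retention period.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000
RETENTION_MS = 7 * DAY_MS  # the 7 day default retention period


@dataclass
class Segment:
    """Hypothetical stand-in for one closed log segment of a partition."""
    name: str
    newest_record_ts_ms: int  # timestamp of the newest record in the segment


def apply_time_retention(segments, now_ms, retention_ms=RETENTION_MS):
    """Split segments into (kept, deleted).

    A segment is deleted only when its newest record is older than the
    retention period, i.e. when every record in it has expired.
    """
    kept, deleted = [], []
    for seg in segments:
        if now_ms - seg.newest_record_ts_ms > retention_ms:
            deleted.append(seg)
        else:
            kept.append(seg)
    return kept, deleted


# Illustrative data: three segments whose newest records are 10, 8 and 3 days old.
now = 30 * DAY_MS
segments = [
    Segment("seg-0", now - 10 * DAY_MS),
    Segment("seg-1", now - 8 * DAY_MS),
    Segment("seg-2", now - 3 * DAY_MS),
]
kept, deleted = apply_time_retention(segments, now)
print("deleted:", [s.name for s in deleted])
print("kept:", [s.name for s in kept])
```

In this toy run the two oldest segments are dropped and the 3 day old one is kept, a bit like the oldest water leaving the lake first.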

Don't go over the spillway! (Paul Brebner)

Kayaking close to the dam was a bit scary and loud, as the lake just disappears over the edge of a "waterfall", the spillway. This spillway is a central overflow type and extends most of the way along the 500m dam wall. It was working at full capacity when I was there and has a capacity of 28,000 cubic metres a second (11 Olympic swimming pools per second! At this rate the lake could be completely drained in under an hour). Apache Kafka has another log retention policy called "Size-Based Retention". This is somewhat tricky to set up, as the size limit applies per partition, so you need to take into account the number of topics, the number of partitions per topic, and the possibility that topics and partitions grow at different rates. It is, however, one of the modifiable Kafka broker settings for Instaclustr's managed Kafka service.
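One way to see why size-based retention needs some planning is to add up the worst-case disk usage across topics. The sketch below is a rough estimate under stated assumptions, not an official sizing formula: the topic names and numbers are made up, `TopicPlan` and `worst_case_bytes` are hypothetical helpers, and it assumes each partition replica can hold up to its size limit plus one active segment (1 GiB here, Kafka's default segment size).

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class TopicPlan:
    """Hypothetical planning record for one topic."""
    name: str
    partitions: int
    replication_factor: int
    retention_bytes: int  # size limit per partition, not per topic


def worst_case_bytes(topic, segment_bytes=GIB):
    """Rough upper bound on cluster-wide disk used by one topic.

    Assumption: every partition replica fills up to its retention limit
    plus one active segment that is not yet eligible for deletion.
    """
    per_replica = topic.retention_bytes + segment_bytes
    return topic.partitions * topic.replication_factor * per_replica


# Made-up topics that grow at different rates and so get different limits.
topics = [
    TopicPlan("orders", partitions=12, replication_factor=3, retention_bytes=10 * GIB),
    TopicPlan("clicks", partitions=48, replication_factor=3, retention_bytes=2 * GIB),
]

total = sum(worst_case_bytes(t) for t in topics)
for t in topics:
    print(f"{t.name}: {worst_case_bytes(t) // GIB} GiB")
print(f"total: {total // GIB} GiB")
```

With these illustrative numbers the topic with the smaller per-partition limit still needs more disk overall, because it has four times as many partitions.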

But the spillway also illustrates the main feature of Kafka, fast-flowing data streaming! Here's a photo I took (after the trip) of the water exiting the spillway on the downstream side of the dam.

The Tallowa Dam is actually part of the Sydney water catchment system, and is the last reservoir in the system. Water is pumped up to the next reservoir, and so on, until it reaches Sydney. The pumping capacity out of this dam is only 60 cubic metres a second, but it is still "streaming", so you can imagine that the pump is a separate Kafka consumer group.

While checking out the downstream side of the dam I noticed this rather strange contraption!

The Tallowa Dam Fish Lift - the bucket is at the top right of the photo (Paul Brebner)

What is it? It's a "Fish Lift" to help fish migrate upstream, i.e. over the dam wall and into the lake. The bucket has a capacity of 2,500 litres per hour (it takes an hour to go up and back once), so it is definitely not an example of "streaming" but a batch process! It is possible to build Kafka applications that combine batch/scheduled and streaming technologies. For example, in my recent "Spinning Your Drones" series I show how to combine Uber's Cadence with Kafka.

Here's a table summarising what we discovered about this dam "system":

Summary of the Dam systems, volumes/rates and type.

At these rates, the dam could be emptied in under an hour by the spillway, but it would take 17 days by pumping, and the fish lift (assuming it could be run in reverse) would take "forever" (more than 4,000 years)!
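The drain times follow directly from the capacity and rates quoted in this post. Here is the arithmetic as a short Python sketch; the only assumptions added are the unit conversions (1 megalitre = 1,000 cubic metres = 1,000,000 litres) and a 365 day year.

```python
# Figures quoted in the post
LAKE_MEGALITRES = 90_000
SPILLWAY_M3_PER_S = 28_000
PUMP_M3_PER_S = 60
FISH_LIFT_LITRES_PER_HOUR = 2_500

# Unit conversions: 1 megalitre = 1,000 cubic metres = 1,000,000 litres
lake_m3 = LAKE_MEGALITRES * 1_000
lake_litres = lake_m3 * 1_000

# Time to drain = volume / rate, converted to a readable unit
spillway_minutes = lake_m3 / SPILLWAY_M3_PER_S / 60
pump_days = lake_m3 / PUMP_M3_PER_S / (24 * 60 * 60)
fish_lift_years = lake_litres / FISH_LIFT_LITRES_PER_HOUR / (24 * 365)

print(f"Spillway:  {spillway_minutes:.0f} minutes")
print(f"Pump:      {pump_days:.1f} days")
print(f"Fish lift: {fish_lift_years:,.0f} years")
```

The same storage drains at wildly different speeds depending on the throughput of the "consumer", which is the point of the comparison.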

What's the theory behind the performance and scalability of systems like dams, spillways, pumps and Apache Kafka? An important concept is "Little's Law" (Users = Throughput x Latency), which says that the average number of users in a system equals the average throughput multiplied by the average latency. I've used this a few times in previous blogs and talks.
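As a quick worked example of Little's Law, the sketch below applies it to a streaming system. The function names and the numbers are hypothetical, chosen only for illustration; "users" here can be read as messages in flight.

```python
def users_in_system(throughput_per_s, latency_ms):
    """Little's Law: Users = Throughput x Latency (latency converted to seconds)."""
    return throughput_per_s * latency_ms / 1000


def max_throughput_per_s(users, latency_ms):
    """Little's Law rearranged: Throughput = Users / Latency."""
    return users * 1000 / latency_ms


# Hypothetical numbers: 10,000 messages/s at 50 ms average end-to-end latency
in_flight = users_in_system(10_000, 50)
print(f"Messages in flight: {in_flight:.0f}")

# Rearranged: 200 concurrent requests at 20 ms each caps the throughput
print(f"Throughput: {max_throughput_per_s(200, 20):.0f} per second")
```

Given any two of the three quantities, the third is fixed, which makes the law handy for sanity-checking benchmark results.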

Surprisingly (for an Australian bush experience) I survived this trip unscathed (i.e. not attacked by wild animals, or sucked over the spillway etc). But after arriving back at the relative safety of the dam I encountered this large Goanna devouring a fish - yum.

An Australian Goanna/Water Monitor (Paul Brebner)

Postscript

This is certainly not the first example of using "water" to illustrate or even model other systems. The MONIAC (Monetary National Income Analogue Computer) is an early example. Also called the Phillips Machine after its creator, Bill Phillips (another Kiwi), it modelled the national economy of the UK in 1949 using tanks, pipes and taps! Here's a demo of the only MONIAC in NZ.
