How do Dams, Lakes, Spillways and Fish Lifts Help Explain Apache Kafka?
A misty morning start to Kayaking on Lake Yarrunga

When travelling for work or holidays I sometimes come across things (systems, machines, etc.) that illustrate some aspect of distributed systems (e.g. the Great Wall of China last year). During the last holidays I went on a remote kayaking trip on Lake Yarrunga - here's a photo of our campsite:

Monarch campsite, Lake Yarrunga (Paul Brebner)

It's a very scenic but remote location for kayaking and camping - the large (10 square km) lake was formed by the Tallowa Dam, which was built in the 1970s. The dam is currently at full capacity, around 90,000 megalitres. This gives us our first computing concept: storage capacity! Databases and some streaming systems can store data, but the capacity of any physical system is limited, so there needs to be a way to free up storage eventually. For example, in Apache Kafka the data stored in topics has a default retention time of 7 days, after which the broker deletes old log segments once the newest record in a segment is older than 7 days. This is time-based retention. But what happens when your storage hits a space threshold limit?


Don't go over the spillway! (Paul Brebner)

Kayaking close to the dam was a bit scary and loud, as the lake just disappears over the edge of a "waterfall", i.e. the spillway. This spillway is a central overflow type and extends most of the way along the 500m dam wall. It was working at full capacity when I was there, and has a capacity of 28,000 cubic metres a second (11 Olympic swimming pools per second! At this rate the lake could be completely drained in an hour). Apache Kafka has another log retention policy called "Size-Based Retention", which discards the oldest log segments once a partition grows past a size limit. It is somewhat tricky to set up, as the limit applies per partition, so you need to take into account the number of topics, the partitions per topic, and the possibility that topics and partitions grow at different rates. It is, however, one of the modifiable Kafka broker settings for Instaclustr's managed Kafka service.
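To make the sizing problem a bit more concrete, here is a minimal Python sketch of one way to split a disk budget into per-partition size limits. Everything in it is an assumption for illustration: the function name, the 25% headroom, the example topics, and the simplification that a single broker holds one copy of every partition (replication and segment granularity are ignored). The resulting numbers are the kind of value you might then consider for Kafka's `retention.bytes` topic setting.

```python
def retention_bytes_per_partition(disk_bytes, topics, headroom=0.25):
    """Suggest a per-partition size limit for each topic (hypothetical helper).

    topics maps a topic name to (partition_count, relative_write_rate).
    The disk budget, minus some headroom, is shared between topics in
    proportion to their write rates, then divided across each topic's
    partitions.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    total_rate = sum(rate for _, rate in topics.values())
    if total_rate <= 0:
        raise ValueError("at least one topic needs a positive write rate")

    usable = disk_bytes * (1 - headroom)
    limits = {}
    for name, (partitions, rate) in topics.items():
        topic_budget = usable * rate / total_rate
        limits[name] = int(topic_budget // partitions)
    return limits


# Made-up example: 1 TB of disk, a fast topic and a slow topic.
example = retention_bytes_per_partition(
    disk_bytes=1_000_000_000_000,
    topics={"fast": (4, 3.0), "slow": (2, 1.0)},
)
print(example)
```

Weighting by write rate means each topic should hit its size limit after roughly the same amount of time, which is one way of coping with topics that grow at different rates.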


But the spillway also illustrates the main feature of Kafka, fast-flowing data streaming! Here's a photo I took (after the trip) of the water exiting the spillway on the downstream side of the dam.


Water from the spillway (Paul Brebner)

The Tallowa Dam is actually part of the Sydney water catchment system, and is the last reservoir in the system - water is pumped up to the next reservoir, and so on, until it reaches Sydney. The pumping capacity out of this dam is only 60 cubic metres a second, but it is still "streaming", so you can imagine the pump as a separate Kafka consumer group.

While checking out the downstream side of the dam I noticed this rather strange contraption!

The Tallowa Dam Fish Lift - the bucket is at the top right of the photo (Paul Brebner)


What is it? It's a "Fish Lift" to help fish migrate upstream - i.e. over the dam wall and into the lake. The bucket has a capacity of 2,500 litres per hour (it takes an hour to go up and back once), so it is definitely not an example of "streaming" but a batch process! It is possible to build Kafka applications that combine batch/scheduled and streaming technologies - for example, in my recent "Spinning Your Drones" series I show how to combine Uber's Cadence with Kafka.

Here's a table summarising what we discovered about this dam "system":

Summary of the Dam systems, volumes/rates and type.

At these rates, the dam could be emptied in under an hour by the spillway, but it would take 17 days by pumping, and the fish lift (assuming it could be run in reverse) would take "forever": more than 4,000 years!
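The arithmetic behind those drain times can be checked with a few lines of Python. This is just a sketch using the figures quoted in this post (90,000 megalitres, 28,000 and 60 cubic metres a second, 2,500 litres per hour); the variable and function names are my own, and it assumes a constant flow with no water coming into the lake.

```python
LAKE_VOLUME_M3 = 90_000 * 1_000  # 90,000 ML, and 1 ML = 1,000 cubic metres

# Flow rates converted to cubic metres per second.
FLOW_M3_PER_S = {
    "spillway": 28_000.0,
    "pump": 60.0,
    "fish lift": 2.5 / 3_600,  # 2,500 litres per hour = 2.5 cubic metres per hour
}


def drain_seconds(volume_m3, flow_m3_per_s):
    """Time to empty a volume at a constant flow, ignoring inflows."""
    return volume_m3 / flow_m3_per_s


drain = {
    name: drain_seconds(LAKE_VOLUME_M3, flow)
    for name, flow in FLOW_M3_PER_S.items()
}

for name, seconds in drain.items():
    print(f"{name:>9}: {seconds / 3_600:,.1f} hours = {seconds / 86_400:,.1f} days")
```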

What's the theory behind the performance and scalability of systems like dams, spillways, pumps and Apache Kafka? An important concept is "Little's Law" (Users = Throughput x Latency). I've used this a few times in previous blogs and talks - e.g. here and here.
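As a small illustration of Little's Law, here is a Python sketch with the formula written both ways round. The function names and the example numbers (1,000 messages a second, 0.25 seconds of latency) are made up for illustration, and the law itself relates long-run averages in a stable system rather than instantaneous values.

```python
def items_in_system(throughput_per_s, latency_s):
    """Little's Law: average items in the system = throughput x latency."""
    return throughput_per_s * latency_s


def throughput_needed(items, latency_s):
    """Little's Law rearranged: throughput = items in the system / latency."""
    return items / latency_s


# Hypothetical numbers: 1,000 messages a second, each taking 0.25 seconds
# end to end, means about 250 messages are "in flight" on average.
in_flight = items_in_system(throughput_per_s=1_000, latency_s=0.25)
print(in_flight)

# Going the other way: keeping 250 messages in flight at 0.25 seconds each
# requires a throughput of 1,000 messages a second.
print(throughput_needed(items=250, latency_s=0.25))
```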

Surprisingly (for an Australian bush experience) I survived this trip unscathed (i.e. not attacked by wild animals, or sucked over the spillway etc). But after arriving back at the relative safety of the dam I encountered this large Goanna devouring a fish - yum.

An Australian Goanna/Water Monitor (Paul Brebner)


Postscript

This is certainly not the first example of using "water" to illustrate or even model other systems. The MONIAC (Monetary National Income Analogue Computer) is an early example. It is also called the Phillips Machine after another Kiwi, Bill Phillips, who modelled the national economy of the UK in 1949 using tanks, pipes and taps! Here's a demo of the only MONIAC in NZ.








