Learn Kafka In Just 5 minutes

Shrey Batra

CEO @ Cosmocloud | Ex-LinkedIn | Angel Investor | MongoDB Champion | Book Author | Patent Holder (Distributed Algorithms)

Published: April 30, 2022

The topic of Apache Kafka is hot these days! But why is that? What is Kafka, where do we use it, and what are its benefits? Let's look at all this in a short 5-minute article. If you like my articles, do subscribe and share my newsletter with your friends!

Yes, Kafka was originally built at LinkedIn and later open-sourced as Apache Kafka.

Apache Kafka - The Elon Musk of Streaming Systems

Yup, you heard it right! Apache Kafka is an open-source, distributed "event streaming" platform (or datastore) used in high-throughput systems. Let's break this down in simple terms -

We have all heard about queues in our algorithms and data structures lectures: a long, array-like pipe where we put data in at one end and it pops out of the other end in the order we pushed it. Simple FIFO - first in, first out.
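
As a quick refresher, here is a minimal sketch of that FIFO behaviour using Python's standard `collections.deque`. It is a plain in-memory queue, not Kafka, and the values 1, 2, 3 are only illustrative.

```python
from collections import deque

# A plain in-memory FIFO queue: items leave in the order they arrived.
queue = deque()
for item in (1, 2, 3):
    queue.append(item)  # push at one end

# Pop from the other end until the queue is empty.
popped = [queue.popleft() for _ in range(3)]

print(popped)      # [1, 2, 3]
print(len(queue))  # 0 - reading removed the items from the queue
```

Note that popping is destructive here: once an item is read, it is gone.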

Don't tell me it's that simple..!

Similarly, let's say we have a Kafka topic - a queue-like kind of database into which we can insert data in order (1, 2, 3...) and then read that data back in the same order (1, 2, 3...). Let's assume a single partition, a very basic configuration, just to start understanding.

Now, how is Kafka different from a simple queue? Kafka persists the data in its "queue-like" structure, which means it stores the data on disk and not just in RAM, so the data is not lost when a process restarts.

Why is this Queue like Datastore important?

Now that we know the (very, very) basic working of Kafka, we can say that a client can read data from Kafka - it's a datastore, right? You might expect to fire some SQL-like query and apply filters, but no. Let's come back to the basics of a queue system - you read the data one record at a time, in FIFO order.

The difference between an ordinary queue and Kafka is that data is not removed after you have read it. Instead, each consumer (or client) reading from Kafka maintains an offset that records how much data it has read and where it should resume reading.
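
To make the offset idea concrete, here is a toy sketch in plain Python. `ToyLog`, `append`, and `read` are hypothetical names invented for this illustration; they are not Kafka's client API, and a real consumer talks to a broker over the network.

```python
class ToyLog:
    """Append-only list standing in for a single-partition topic (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Reading never deletes anything; it only slices from the given offset.
        return self._records[offset:offset + max_records]


log = ToyLog()
for event in ("a", "b", "c"):
    log.append(event)

offset = 0                               # the consumer's own bookmark
batch = log.read(offset, max_records=2)  # ['a', 'b']
offset += len(batch)                     # remember where to resume
rest = log.read(offset)                  # ['c']
replay = log.read(0)                     # ['a', 'b', 'c'] - data is still there
```

Because the consumer owns the offset, it can resume after a crash or rewind to 0 and replay everything.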

Let's drop the "queue-like" terminology and replace it with a WAL - Write-Ahead Log - meaning we write each event to an append-only file.
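
A rough sketch of such an append-only file, assuming one JSON event per line. The file name `topic.log` and the helpers `append_event` and `read_from` are made up for this example; Kafka's actual on-disk format is different and more involved.

```python
import json
import os
import tempfile


def append_event(path, event):
    # Open in append mode: existing lines are never modified.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_from(path, offset):
    # Here the offset is simply a line number in the file.
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return [json.loads(line) for line in lines[offset:]]


log_path = os.path.join(tempfile.mkdtemp(), "topic.log")
for i in (1, 2, 3):
    append_event(log_path, {"id": i})

events = read_from(log_path, offset=1)
print(events)  # [{'id': 2}, {'id': 3}]
```

Since the events live in a file, they are still there after the writing process exits.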

Multiple Clients on same Kafka Topic

Now, as the data is persistently stored in our Kafka topic (the queue-like structure), multiple consumers (or clients) can read from this topic, each maintaining its own offset. This is one of the most powerful features of Kafka -

Using Kafka, you can deliver a single event to multiple services, as each service can read the same Kafka topic independently.
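
Here is a small sketch of that fan-out, modelled with a shared Python list and one offset per service. The service names `billing` and `email` and the `poll` helper are hypothetical, chosen only for illustration.

```python
topic = ["order-1", "order-2", "order-3"]  # shared append-only log

# Each service keeps its own read position in the same topic.
offsets = {"billing": 0, "email": 0}


def poll(service, max_records):
    start = offsets[service]
    batch = topic[start:start + max_records]
    offsets[service] = start + len(batch)  # advance only this service's offset
    return batch


billing_batch = poll("billing", 3)  # billing reads everything in one go
email_batch = poll("email", 1)      # email is slower and reads one event
email_next = poll("email", 5)       # later it catches up on the rest
```

Both services end up seeing all three events, each at its own pace, and neither affects what the other can read.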

Kafka as a Distributed System

Put simply, how do you define a distributed system, and why build one? For our purposes, a distributed system is one where we break our datastore / system into smaller, similar-looking "chunks" or partitions, which helps us scale the system for huge volumes of data or requests.

If you look at our basic Kafka design, we maintain a single WAL store, which could simply be an append-only file (writing each event/message on a new line of the file). Now, let's say your producer - the service pushing events into the system - produces 50 events every second, whereas your consumer - the service that reads the events and does the computation - reads only 10 events per second.

As you can see, over time your consumer will lag further and further behind, and there will be a huge delay between when an event is pushed to Kafka and when it is read. So, what if we could have multiple consumers reading from the Kafka queue to scale our system? But how would one consumer know that another has already read a message, so that it can read the next one? Remember the FIFO logic!
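
The arithmetic behind that lag can be sketched in a few lines. The rates come from the example above; the function name `lag_after` and the assumption that extra consumers scale throughput perfectly are simplifications for illustration.

```python
PRODUCE_RATE = 50  # events written per second
CONSUME_RATE = 10  # events one consumer can process per second


def lag_after(seconds, consumers=1):
    """Unread events after `seconds`, assuming steady rates and ideal scaling."""
    produced = PRODUCE_RATE * seconds
    consumed = min(produced, CONSUME_RATE * consumers * seconds)
    return produced - consumed


print(lag_after(1))                # 40 events behind after one second
print(lag_after(60))               # 2400 events behind after one minute
print(lag_after(60, consumers=5))  # 0 - five consumers keep up
```

With a single consumer the backlog grows by 40 events every second, so it never recovers on its own.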

To solve this, we break our single WAL (or queue-like) system into multiple WALs (or queues) and distribute our events across these new partitions (smaller queues). Each partition now behaves like its own FIFO queue or WAL, and we can have as many consumers reading in parallel as there are partitions (at most one consumer per partition within a group of consumers that share the work, called a consumer group). This lets us break our original Kafka topic (50 messages per second) into 5 partitions (each receiving 10 msg/sec) read by 5 consumers (each reading 10 msg/sec from its own partition).
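
A toy sketch of that split, using plain Python lists as partitions. The round-robin assignment by `event_id % NUM_PARTITIONS` is an assumption made to keep the example simple; real producers typically pick a partition by hashing a message key or by another partitioning strategy.

```python
NUM_PARTITIONS = 5

# Each partition is its own small append-only log.
partitions = [[] for _ in range(NUM_PARTITIONS)]

# One second of traffic: 50 events spread round-robin over the partitions.
for event_id in range(50):
    partitions[event_id % NUM_PARTITIONS].append(event_id)

# One consumer per partition, each reading only its own log.
sizes = [len(p) for p in partitions]
print(sizes)              # [10, 10, 10, 10, 10]
print(partitions[0][:3])  # [0, 5, 10] - order is kept within a partition
```

Each consumer now has 10 events per second to handle, which matches its capacity. Ordering is only guaranteed inside a single partition, not across the whole topic.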

And that was Kafka in 5 minutes, my friends!

Conclusion

Yes, there are many more technical details to this, but the initial explanation could not be simpler than this! I hope you liked my article; follow my newsletter to get notified about new articles. Do like this post!

You can now also download my Eazy Develop app, where you can read my articles as well as hand-picked articles from various awesome authors and tech blogs from companies like Google, LinkedIn, Meta, etc.
