Kafka Eco System

Arabinda Mohapatra

Pyspark, SnowFlake,AWS, Stored Procedure, Hadoop,Python,SQL,Airflow,Kakfa,IceBerg,DeltaLake,HIVE,BFSI,Telecom

发布日期: 2024年7月30日

+ 关注

??????????????:

????A stream of messages belonging to a particular category is called a Topic.

????Its is a logical feed name where to which records are published(Similar to Table in DB )

????Unique identification of table is called name of the topic - can not be duplicated

????A topic is a storage mechanism for a sequence of events

????Events are immutable

????keep events in the same order as they occur in time. So, each new event is always added to the end of the Message.?

2. ??????????????????????:

????Topics are split into partition

????All the messages within a partition are ordered and immutable

????All the messages within the partition has a unique ID associated is called OFFSET.

????Kafka uses topic partitioning to improve scalability.

????Kafka guarantees the order of the events within the same topic partition.?However, by default, it does not guarantee the order of events across all partitions.

3.??????????????????:

????Replicas are backs of partition

????Replicas are never read or write data

????They are used to prevent data loss (Fault - Tolerance)

4.??????????????????:

????Producers publish messages by appending to the end of a topic partition.

????Each message will be stored in the broker disk and will receive an offset (unique identifier). This offset is unique at the partition level, each partition has its owns offsets. That is one more reason that makes Kafka so special, it stores the messages in the disk (like a database, and in fact, Kafka is a database too) to be able to recover them later if necessary. Different from a messaging system, that the message is deleted after being consumed;

????The producers use the offset to read the messages, they read from the oldest to the newest. In case of consumer failure, when it recovers, will start reading from the last offset;

????By default, if a message contains a key (i.e. the key is NOT?null), the hashed value of the key is used to decide in which partition the message is stored.

????Producers publish messages by appending to the end of a topic partition. By default, if a message contains a key (i.e. the key is NOT null), the hashed value of the key is used to decide in which partition the message is stored.

????All messages with the same key will be stored in the same topic partition. This behavior is essential to ensure that messages for the same key are consumed and processed in order from the same topic partition.

????Producers write the messages to the topic level(All the partitions of that topic) or specific partition of the topic using Producing API's

????If the key is null, the producer behaves differently according to the Kafka version:

up to Kafka 2.3: a round-robin partitioner is used to balance the messages across all partitions
Kafka 2.4 or newer: a sticky partitioner is used which leads to larger batches and reduced latency and is particularly beneficial for very high throughput scenarios

Benefits of the Sticky Partitioner

Larger Batches: The sticky partitioner sends records to a single partition until the batch is full or a certain time has passed. This results in larger batches, which are more efficient to process1.

Reduced Latency: By sticking to a single partition for a period, the sticky partitioner reduces the overhead associated with switching partitions frequently. This leads to lower latency in message delivery1.

Improved Throughput: Larger batches and reduced latency together contribute to higher throughput, making the sticky partitioner particularly beneficial for applications with very high data volumes1.

5.Consumer:

????Consumers are applications which read/consume data from the topics within a cluster using consuming API's

????Consumers can read either on the topic level (All the partitions of the topic) or specific partition of the topics

????Each message published to a topic is delivered to a consumer that is subscribed to that topic.

????A consumer can read data from any position of the partition, and internally the position is stored as a pointer called offset. In most of the cases, a consumer advances its offset linearly, but it could be in any order, or starting from any given position.

????Each consumer belongs to a consumer group. A consumer group may consist of multiple consumer instances.

????This is the reason why a consumer group can be both, fault tolerant and scalable.

???? If one of several consumer instances in a group dies, the topic partitions are reassigned to other consumer instances such that the remaining ones continue to process messages form all partitions.

???? If a consumer group contains more than one consumer instance, each consumer will only receive messages from a subset of the partitions. When a consumer group only contains one consumer instance, this consumer is responsible for processing all messages of all topic partitions.

????Message consumption can be parallelized in a consumer group by adding more consumer instances to the group, up to the number of a topic’s partitions.

???? if a topic has 8 partitions, a consumer group can support up to 8 consumer instances which all consume in parallel, each from 1 topic partition.

????Adding more consumers in a consumer group than the number of partitions for a topic, then they will stay in an idle state, without getting any message

???? Consumer will pull the message by consumer.poll()

????max.pull.size is 15- in one instance consumer can pull up to 15 no of message at a instance

????consumer.commit() will help in committing

6.Kafka Broker:

????That Kafka broker is a program that runs on the Java Virtual Machine (Java version 11+)

????A Kafka broker is used to manage storing the data records/messages on the topic. It can be understood as the mediator between the two

????The Kafka broker is responsible for transferring the conversation that the publisher is pushing in the Kafka log commit and the subscriber shall be consuming these messages.

????Enabling the delivery of the data records/ message to process to the right consumer.

7. Kafka Cluster

????An ensemble of Kafka brokers working together is called a Kafka cluster. Some clusters may contain just one broker or others may contain three or potentially hundreds of brokers. Companies like Netflix and Uber run hundreds or thousands of Kafka brokers to handle their data.

????A broker in a cluster is identified by a unique numeric ID. In the figure below, the Kafka cluster is made up of three Kafka brokers.

????To configure the number of the partitions in each broker, we need to configure something called Replication Factor when creating a topic. Let’s say that we have three brokers in our cluster, a topic with three partitions and a Replication Factor of three, in that case, each broker will be responsible for one partition of the topic.

????As you can see in the above image, Topic_1 has three partitions, each broker is responsible for a partition of the topic, so, the Replication Factor of the Topic_1 is three.

????It’s very important that the number of the partitions match the number of the brokers, in this way, each broker will be responsible for a single partition of the topic

Replication Factor:

https://www.educba.com/kafka-replication/

In this scenario, we have a replication factor of 2, which means each partition has two copies for redundancy.

Let’s consider what happens if Broker 3 fails. Since Broker 3 was the leader for partition 3, the connection to that partition is lost. However, Kafka handles this by automatically selecting one of the in-sync replicas (in this case, there is only one remaining replica) to become the new leader for partition 3.
When Broker 3 comes back online, it can try to rejoin as the leader. Kafka maintains the in-sync replica (ISR) list for each partition by monitoring the latency of each replica.
When a producer sends a message, the leader broker writes it and replicates it to all in-sync replicas. A message is only considered committed once it has been successfully replicated to all in-sync replicas.
Producers Sent Message without Key :
Just like in the messaging world, Producers in Kafka are the ones who produce and send the messages to the topics.
As said before, the messages are sent in a round-robin way. Ex: Message 01 goes to partition 0 of Topic 1, and message 02 to partition 1 of the same topic. It means that we can’t guarantee that messages produced by the same producer will always be delivered to the same topic. We need to specify a key when sending the message, Kafka will generate a hash based on that key and will know what partition to deliver that message.
That hash takes into consideration the number of the partitions of the topic, that’s why that number cannot be changed when the topic is already created
In Kafka, acknowledgements (ACKs) are crucial for ensuring data reliability. Depending on your use case, you can choose from three types of ACK settings:
acks=0: The producer does not wait for any acknowledgement from the broker. While this setting provides the lowest latency, it also risks data loss as there's no guarantee that the message was received or replicated.
acks=1: The producer gets an acknowledgement after the leader broker receives the message. However, if the leader fails before the followers replicate the record, there's potential for partial data loss.
acks=all: The producer receives an acknowledgement only after all in-sync replicas have received the message. This setting provides the highest durability, ensuring no data loss, though it may introduce some latency.

https://static.javatpoint.com/tutorial/kafka/images/apache-kafka-producer5.png

Three Variations of Offset

LogEnd Offset: Offset of the last message to a log/partition
Current Offset: Pointer to the last record that kafka has already sent to the consumer in the most recent poll,will never commit offset
Committed Offset: Making an offset as consumed is called committing an offset

Kafka Consumer Group
A consumer group is a logical entity in Kafka ecosystem that mainly provides parallel processing/scalable message consumption to consumer clients
Each consumer instance must be associated with some consumer group
Make sure that no duplication among consumers who all are part of the same consumer group
The ideal is to have the same number of consumers in a group that we have as partitions in a topic, in this way, every consumer read from only one.
When adding consumers to a group, you need to be careful, if the number of consumers is greater than the number of partitions, some consumers will not read from any topic and will stay idle.

https://docs.cloudera.com/cdp-private-cloud-base/latest/kafka-developing-applications/topics/kafka-develop-groups-fetching.html

Consumer Group Rebalancing

Process of redistribution of partitions to the consumers within a consumer group .
Rebalancing of a consumer group happens in below cases:
A consumer is joining the consumer group
A consumer is leaving the consumer group
If the partitions are added to the topic's which these consumers are intrested in
If partitions goes offline state

Group Coordinator

Brokers in the Kafka cluster are assigned as group co-ordinator for a subset of consumer groups
The group coordinator maintains a list of consumer groups
The group coordinator initiates a rebalance process call
The Group Coordinator communicates the new assignment of partitions to all consumers.
Until the rebalancing process is not finished, consumers within the consumer group (whose balance is happening), will be blocked from reading any messages

Group Leader

The first consumer to join the consumer group is elected as the Group Leader
The Group Leader has the list of Active members and the selected assignment strategy
Group Leader sends the new assignment of partitions to the Group Coordinator
It executes the rebalance process

What Happens When a Consumer Joins A consumer Group:

When a consumer starts, it sends find coordinator requests to obtain the group coordinator which is responsible for its Group
The consumer initiates the rebalance protocol by sending a join Group request
Next, All the members of that consumer group send a sync group request to the coordinator.
Each consumer periodically sends a heartbeat to the group coordinator to keep the session alive

Zookeeper:

How Kafka and ZooKeeper Work Together

#1 Controller Election.

Zookeeeper is used by Kafka to coordinate clusters, maintain metadata, track broker availability ,leader selection, manage consumer groups and as well as dynamic configuration changes
To choose the Controller, each broker attempts to create an "ephemeral node" (a znode that will exist until the session that created it ends) called controller.
Maintain the leader/follower relationship among all the partitions of topics
If a node went down ,Zookeeper ensures that other replicas will take up the leader

领英推荐

Introduction to Apache Kafka

Brij kishore Pandey 9 个月前

What makes Kafka so fast and efficient?

Vivek Bansal 10 个月前

Server-Sent Events Using Spring WebFlux and Reactive…

Egen 1 年前

#2 Cluster Membership

A group of brokers will have a unique identification ID through out the Kafka cluster
A group Znode is generated when the broker connects to the respective zookeeper instances and each broker then constructs ephemeral zone inside of this group znode.
Maintain the functioning brokers in the cluster

#3 Topic Configuration.

Each kafka topic has its own configuration.
The replication factor, limit message size ,unclean leader election, flush rate and message retention all are determined by these settings.
Maintains the Location of relicas ,No of partition of each topic

#4 Access Control Lists (ACLs).

Apache Kafka includes an authorizer that supports Access Control List via Zookeeper
Maintains the ACLS for all the topics ,will include who is allowed to write/read from each topic ,list of the consumer group, members of the group, most recent offset each consumer group received from the partition

#5 Quotas. ?

ZooKeeper accesses how much data each client is allowed to read/write.
The controller is responsible for managing the state of the Kafka cluster, including broker registration, leader election for partitions, and handling broker failures2

Apache Kafka Examples

Website activity tracking
Web Shop(Ecommerence)
Kafka as a Message queue

Kafka best practices:

Understand the data rate of your partitions to ensure you have the correct retention space.
Choose an appropriate number of partitions:
Use key-based partitioning when necessary
Consider data skew and load balancing
Plan for scalability
Error Handling: when errors are non-retriable such as RecordTooLargeException, TopicAuthorizationException, producer will return the error to application, e.g. via callback
Leave topic compression.type to its default value producer, meaning retain the original compression codec set by the producer.Specify compression.type in the producer
Idempotence:To enable idempotent producer, set enable.idempotence to true.
Optimise Batch Size batch.size and linger.ms
Select a meaningful message key. Keys determine the partition to which a message is sent, which can be important for maintaining message order within a partition.
Set an appropriate replication factor
Ensure that your Kafka producer’s version is compatible with your Kafka broker version to avoid compatibility issues
Avoid frequent partition changes
Monitor and tune as needed batch-size-avg ,records-per-request-avg, record-queue-time-avg ,record-send-rate, record-size-avg, compression-rate-avg, request-rate,requests-in-flight,request-latency-avg
If your consumers are running versions of Kafka older than 0.10, upgrade them.
Tune your consumer socket buffers for high-speed ingest.(socket.receive.buffer.bytes)
Design high-throughput consumers to implement back-pressure when warranted
When running consumers on a JVM, be wary of the impact that garbage collection can have on your consumers.(Off heap memory will came in to the picture to handle this situation)

in Kafka, auto commit refers to the automatic saving of consumer offsets, which helps in tracking the progress of messages that have been read from a topic. The consumer's offset indicates the next message to be read from a partition.

Here’s a detailed breakdown of how auto commit works:

1. Offset in Kafka

An offset is a numerical value that represents the position of a message within a partition of a Kafka topic.
Kafka consumers read messages from partitions, and the offset tracks where the consumer left off. This is important for ensuring no message is missed or re-read unnecessarily.

2. What is Auto Commit?

Auto commit is a feature that allows the Kafka consumer to commit its current offset automatically after a certain interval of time, without explicit action from the consumer code.
This is controlled by the configuration setting enable.auto.commit. When set to true, the consumer will automatically commit the offset after reading messages.

3. How Auto Commit Works

If enable.auto.commit is set to true (default setting), the consumer will commit the current offset of the messages it has processed after a regular interval. This interval is controlled by the auto.commit.interval.ms setting (default: 5000ms or 5 seconds).
Example: If a consumer reads messages from partition 0 and the last message it read has offset 100, Kafka will store that offset so the consumer knows to start reading from offset 101 next time.

4. Advantages of Auto Commit

Ease of Use: You don’t have to manage offsets manually, which simplifies the code for less complex use cases.
Simplicity: Useful when you want to consume messages and you are not concerned about fine control of message acknowledgment.

5. Risks of Using Auto Commit

Potential Data Loss or Duplication: Auto commit might commit the offset before the message is fully processed. If a consumer crashes after committing the offset but before processing the message, that message will be lost.Conversely, if the consumer crashes before committing the offset but after processing the message, it could re-read and process the same message after restart, leading to duplicate processing.
No control over when to commit: Sometimes, you might need to commit the offset only after successfully processing a message (for example, after writing the processed data to a database). Auto commit lacks this level of control.

6. Manual Offset Management as an Alternative

Instead of auto commit, you can turn off this feature by setting enable.auto.commit = false and manually commit offsets at appropriate points in your code.
With manual commit, you have two methods:commitSync(): This blocks the consumer until the offset is committed and ensures that the commit was successful.commitAsync(): This does not block and is non-blocking but might risk losing the commit if the consumer crashes before the callback completes.

7. When to Use Auto Commit

Auto commit is ideal for simple consumers where message loss or duplication is not critical, and the consumer's logic is stateless or idempotent.
For stateful or more complex processing, or where at-least-once or exactly-once processing is needed, manual offset control should be preferred.

8. Configuration Settings

enable.auto.commit = true: Enables auto commit.
auto.commit.interval.ms = <milliseconds>: Sets the frequency with which offsets are committed automatically. The default is 5000ms (5 seconds).

Example Scenario

If a Kafka consumer is processing records and takes 7 seconds to process a batch of messages:

With auto commit, if auto.commit.interval.ms is 5 seconds, Kafka will commit the offsets after 5 seconds, possibly before the consumer finishes processing. If the consumer crashes in the 6th second, it may lose messages.

### Low Offset and High Offset in Kafka

In Kafka, offsets are important to understand how consumers read data from a topic. Each message in a Kafka partition is assigned an offset, which is a unique identifier for that message in that partition.

1. Low Offset:

- The low offset represents the earliest offset available in a partition. This might be the earliest message still retained by Kafka for that partition, depending on the retention policy.

- For example, if you have a Kafka retention policy to store messages for 7 days, the low offset is the first message that’s available after that retention window.

2. High Offset:

- The high offset is the next offset that will be assigned to a new message in a partition. It doesn't represent an actual message but the number following the last message's offset.

- For example, if the highest message offset in a partition is 100, the high offset will be 101, the next message offset.

### Late Data Arrival in Kafka

Late data arrival refers to the scenario where data arrives late in a Kafka stream after the expected window for processing, causing potential issues in stream processing systems.

#### Reasons for Late Data Arrival:

1. Network Latency: Delays in the network between the producer and broker.

2. Producer Delays: Producers may experience delays due to backpressure, processing issues, or resource limitations.

3. Backlogs in Kafka: Kafka topics may have backlogs due to slow consumers or spikes in message production that consumers can’t keep up with.

4. Consumer Issues: Consumers may experience lags due to processing bottlenecks or high message processing times.

5. Data Skew: Some partitions might receive more data than others, causing uneven load distribution and delays in processing.

### Handling Late Data in Kafka

1. Using Windowing and Watermarks (for Streaming Systems like Kafka Streams or Flink):

- Windowing allows for defining time-based windows (like 5-minute windows) to process events. If an event arrives late, it can still be included if the window is still open.

- Watermarks: A watermark is a threshold that helps streaming systems to decide when to close a window. Events with timestamps less than the watermark are considered late. Systems can process these late events based on the configuration.

- If late-arriving events are frequent, the watermark can be relaxed, allowing more time for late data before finalizing a window.

2. Retention Period:

- Increase the retention period for the topic to retain old data long enough so that late-arriving data can still be processed. This can prevent the loss of late messages due to retention policies.

- Kafka's configuration: log.retention.hours or log.retention.bytes controls how long data is stored.

3. Grace Period:

- In stream processing frameworks, a grace period can be used to accept late-arriving data within a certain threshold of the original event time.

- For example, if you process data in 10-minute windows but expect occasional late data, you can define a grace period of 5 minutes to allow that late data to be processed within the window.

4. DLQs (Dead Letter Queues):

- If data consistently arrives late or is unprocessable, it can be sent to a Dead Letter Queue (DLQ) for later examination. This allows the main stream to continue without being bogged down by problematic data.

### Handling Stuck Queues (Backpressure in Kafka)

Stuck queues or backlogs in Kafka occur when consumers cannot keep up with the rate of data produced to a topic. This can lead to data being stuck in the queue, causing a delay in processing.

#### Reasons for Backpressure:

1. Slow Consumers: Consumers might not be able to process messages fast enough due to slow message processing or insufficient resources.

2. Message Rate Spikes: A sudden surge in the number of messages produced can overwhelm consumers.

3. Uneven Partition Distribution: If some partitions receive more data than others, consumers assigned to those partitions might lag behind.

#### Ways to Handle Backpressure:

1. Increase Consumer Parallelism:

- Add more consumers to your consumer group. Kafka partitions the topic and assigns different partitions to consumers. By adding more consumers, you can divide the load and improve throughput.

- Use multiple instances of consumers to increase parallelism.

2. Optimize Consumer Processing:

- Reduce the time it takes for a consumer to process each message. This can be done by improving the code logic, using faster I/O operations, or optimizing resource usage.

- Batch processing: Consumers can read multiple messages at once (in batches) instead of processing them one by one.

- Adjust Kafka's fetch.min.bytes and fetch.max.wait.ms settings to ensure consumers are fetching larger batches, reducing the number of requests.

3. Tuning Kafka Settings:

- Adjust producer and consumer configurations to handle spikes better.

- For the producer, increase the linger.ms value to accumulate more messages before sending a batch, which can reduce pressure on Kafka brokers.

- For the consumer, ensure that max.poll.records is set high enough to process larger message batches at a time.

4. Scaling Kafka Brokers:

- If the backlog is due to Kafka brokers being unable to handle the producer's load, you can scale Kafka by adding more brokers, thereby distributing the load across a larger cluster.

5. Increase Partitioning:

- Increasing the number of partitions in a Kafka topic can help distribute the load more evenly across consumers, enabling faster processing.

6. Rebalance Consumers:

- If consumers are unevenly distributed across partitions (for example, one consumer handles more data than another), you can trigger a rebalance to redistribute partitions more evenly among consumers in the group.

### Handling Late Delays and Slow Producers

#### Reasons for Producer Delays:

1. Network Issues: Slow network connectivity between the producer and Kafka brokers.

2. Producer Backpressure: If a producer is overwhelmed by the rate of data generation or facing resource constraints (CPU, memory), it may delay producing messages to Kafka.

3. Broker Overload: If Kafka brokers are overloaded, they may slow down acknowledgment responses, causing producers to experience delay.

#### Solutions:

1. Increase Broker Capacity: Add more brokers to the Kafka cluster or increase broker capacity (e.g., CPU, memory, disk I/O) to handle higher traffic.

2. Tune Producer Settings:

- Batch Size: Increase the batch size (`batch.size`) to reduce the number of producer requests.

- Retries: Increase the retry count (`retries`) in the producer to handle intermittent broker unavailability or timeouts.

- Acks: Tune the producer acknowledgment (`acks`) setting to 1 or 0 if you can tolerate some data loss, which reduces the time spent waiting for broker responses.

### Conclusion

Handling late data and backpressure in Kafka involves fine-tuning consumer/producer settings, improving the scalability of the Kafka cluster, and using advanced features like windowing, grace periods, and dead-letter queues to ensure reliable and timely data processing. Balancing these factors ensures that Kafka streams run smoothly, even under varying data loads and unexpected latencies.

Will write further

Refer:

https://docs.cloudera.com/cdp-private-cloud-base/latest/kafka-developing-applications/topics/kafka-develop-rebalance.html

Rahul Suryavanshi

6 个月

Great Article, thanks for sharing this in-depth knowledge about Kafka.

Pradeep Sekar

Junior Developer Advocate @ Streambased

7 个月

Hey Arabinda Mohapatra great article on explaining about the kafka ecosystem being a kafka newbie myself it was very useful. I want to add something to this article and i would like to know your opinion. With the introduction of KIP-405, Kafka now supports unlimited tier storage, allowing it to handle ingestion, storage, and processing all within a single platform. This update means Kafka can serve as both your streaming and analytics solution, effectively simplifying the traditional data engineering workflow by eliminating the need for separate data storage, processing, and integration layers. Streambased takes this a step further by enabling you to perform batch analytics using SQL directly from Kafka. With seamless JDBC driver connectivity, Streambased allows you to integrate your data directly with your favorite BI tools. This not only reduces complexity but also accelerates your data pipeline, making real-time insights more accessible than ever.

1 次回应

查看更多评论

要查看或添加评论，请登录

Arabinda Mohapatra的更多文章

A Deep Dive into Caching Strategies in Snowflake

2025年3月22日

A Deep Dive into Caching Strategies in Snowflake

What is Caching? Caching is a technique used to store the results of previously executed queries or frequently accessed…
A Deep Dive into Snowflake External Tables: AUTO_REFRESH and PATTERN Explained

2025年3月16日

A Deep Dive into Snowflake External Tables: AUTO_REFRESH and PATTERN Explained

An external table is a Snowflake feature that allows you to query data stored in an external stage as if the data were…
Apache Iceberg

2025年3月16日

Apache Iceberg

Apache Iceberg Apache Iceberg is an open-source table format designed to handle large-scale analytic datasets…
Deep Dive into Snowflake: Analyzing Storage and Credit Consumption

2025年2月24日

Deep Dive into Snowflake: Analyzing Storage and Credit Consumption

1. Table Storage Metrics select TABLE_SCHEMA,TABLE_CATALOG AS"DB",TABLE_SCHEMA, TABLE_NAME,sum(ACTIVE_BYTES) +…

1 条评论
Continuous Data Ingestion Using Snowpipe in Snowflake for Amazon S3

2025年2月23日

Continuous Data Ingestion Using Snowpipe in Snowflake for Amazon S3

USE WAREHOUSE LRN; USE DATABASE LRN_DB; USE SCHEMA LEARNING; ---Create a Table in snowflake as per the source data…

1 条评论
Data Loading with Snowflake's COPY INTO Command-Table

2025年2月18日

Data Loading with Snowflake's COPY INTO Command-Table

Snowflake's COPY INTO command is a powerful tool for data professionals, streamlining the process of loading data from…
SNOW-SQL in SNOWFLAKE

2025年2月17日

SNOW-SQL in SNOWFLAKE

SnowSQL is a command-line tool designed by Snowflake to interact with Snowflake databases. It allows users to execute…
Stages in Snowflake

2025年2月9日

Stages in Snowflake

Stages in Snowflake play a crucial role in data loading and unloading processes. They serve as intermediary storage…
Snowflake Tips

2025年2月8日

Snowflake Tips

??Tip 1: Use the USE statement to switch between warehouses Instead of specifying the warehouse name in every query…
SnowFlake

2025年2月8日

SnowFlake

??What is a Virtual Warehouse in Snowflake? ??A Virtual Warehouse in Snowflake is a cluster of compute resources that…

See all articles

Kafka Eco System

Arabinda Mohapatra

Pyspark, SnowFlake,AWS, Stored Procedure, Hadoop,Python,SQL,Airflow,Kakfa,IceBerg,DeltaLake,HIVE,BFSI,Telecom

How Kafka and ZooKeeper Work Together

领英推荐

1. Offset in Kafka

2. What is Auto Commit?

3. How Auto Commit Works

4. Advantages of Auto Commit

5. Risks of Using Auto Commit

6. Manual Offset Management as an Alternative

7. When to Use Auto Commit

8. Configuration Settings

Example Scenario

Arabinda Mohapatra的更多文章

社区洞察

其他会员也浏览了

Kafka Basics

Kafka Simplified

002 – March 2023

Kafka with KRaft (Kafka Raft)

Kafka Acks Explained

Addressing Kafka Partition Imbalance: Strategies for Ensuring Even Distribution Across Brokers

LinkedIn Handle 7 Trillion Messages Daily With Apache Kafka

Kafka clusters: real-life challenges and how to avoid them

Top 10 operational challenges in managing Kafka

Advanced Concepts in Apache Kafka

How Kafka and ZooKeeper Work Together

领英推荐

1. Offset in Kafka

2. What is Auto Commit?

3. How Auto Commit Works

4. Advantages of Auto Commit

5. Risks of Using Auto Commit

6. Manual Offset Management as an Alternative

7. When to Use Auto Commit

8. Configuration Settings

Example Scenario

Arabinda Mohapatra的更多文章

A Deep Dive into Caching Strategies in Snowflake

A Deep Dive into Snowflake External Tables: AUTO_REFRESH and PATTERN Explained

Apache Iceberg

Deep Dive into Snowflake: Analyzing Storage and Credit Consumption

Continuous Data Ingestion Using Snowpipe in Snowflake for Amazon S3

Data Loading with Snowflake's COPY INTO Command-Table

SNOW-SQL in SNOWFLAKE

Stages in Snowflake

Snowflake Tips

SnowFlake

社区洞察

其他会员也浏览了

Kafka Basics

Kafka Simplified

002 – March 2023

Kafka with KRaft (Kafka Raft)

Kafka Acks Explained

Addressing Kafka Partition Imbalance: Strategies for Ensuring Even Distribution Across Brokers

LinkedIn Handle 7 Trillion Messages Daily With Apache Kafka

Kafka clusters: real-life challenges and how to avoid them

Top 10 operational challenges in managing Kafka

Advanced Concepts in Apache Kafka