Kafka Producer Acks
Rob Golder
Director & Co Founder at Lydtech Consulting - Consultant Lead Engineer & Architect
Introduction
Kafka Producers can be configured to specify how many replicas must acknowledge the write of a message to a topic partition before the message is considered successfully written.
It is important to understand how this setting behaves and the trade-offs it involves, as it affects both durability and performance.
Acks Configuration
The Producer can be configured to wait for 0, 1, or all replicas to acknowledge a message write, using the acks configuration parameter.
Setting acks to 0 makes the write fire-and-forget: the Producer does not wait for any acknowledgement, so it does not know whether the write succeeded or failed.
Setting acks to 1 means the Producer waits only for the partition's lead replica to acknowledge the write.
Setting acks to all means the Producer only receives acknowledgement of a successful write once all the current in-sync replicas have received the message. The partition itself only accepts such writes if at least the minimum required number of replicas are in sync, as configured by the min.insync.replicas setting. If too few replicas are in sync, the write is rejected with an error, and the Producer can retry it if configured to do so.
This results in a trade-off between performance and durability: requiring fewer replicas to acknowledge each message improves performance, at the cost of a greater risk of message loss in failure scenarios.
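The three settings can be sketched as producer configuration, together with a simplified model of the broker-side rule for min.insync.replicas. This is a minimal sketch, assuming the confluent-kafka Python client (whose configuration keys follow librdkafka, where acks accepts 0, 1 or all); the broker address is a placeholder. The helpers producer_config and broker_accepts_write are hypothetical illustrations, not part of any Kafka API, and the replica count passed to broker_accepts_write includes the leader itself.

```python
from typing import Union

AcksValue = Union[int, str]
VALID_ACKS = (0, 1, "all")


def producer_config(acks: AcksValue, bootstrap_servers: str = "localhost:9092") -> dict:
    """Build a confluent-kafka style producer config (hypothetical helper).

    The default broker address is a placeholder assumption.
    """
    if acks not in VALID_ACKS:
        raise ValueError(f"acks must be one of {VALID_ACKS}, got {acks!r}")
    # librdkafka accepts acks as a string: "0", "1" or "all"
    return {"bootstrap.servers": bootstrap_servers, "acks": str(acks)}


def broker_accepts_write(acks: AcksValue, in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Simplified model of whether the partition leader accepts a write.

    in_sync_replicas counts the leader plus its in-sync followers.
    min.insync.replicas is only enforced for acks=all; with acks=0 or 1
    the leader accepts the write on its own.
    """
    if acks == "all":
        return in_sync_replicas >= min_insync_replicas
    return True
```

As a worked example, with a replication factor of 3 and min.insync.replicas set to 2, acks=all writes are still accepted while one replica is out of sync (2 in sync), but rejected once only the leader remains (1 in sync), whereas acks=1 writes would still be accepted in that case.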
Acks Behaviour
To demonstrate how behaviour differs depending on the Producer acks setting, consider a flow in which a service consumes a message and the Producer publishes a resulting outbound event. Once the outbound event has been successfully published, the original message that triggered the flow is marked as consumed by updating the consumer offsets.
Figure 1: consume, produce & update offsets
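The ordering in this flow can be sketched as a small handler. This is a hypothetical, simplified illustration rather than code from a Kafka client: publish stands in for a producer send that returns True once the write is considered successful under the configured acks setting, and commit_offset stands in for updating the consumer offsets. The point it shows is that the offset is only committed after the publish is treated as successful, so what "successful" means depends entirely on acks.

```python
from typing import Callable


def handle_message(
    message: str,
    publish: Callable[[str], bool],
    commit_offset: Callable[[str], None],
) -> bool:
    """Consume -> produce -> commit, in that order (hypothetical handler).

    Returns True if the inbound message was marked as consumed.
    """
    outbound_event = f"processed:{message}"  # placeholder transformation
    if publish(outbound_event):
        # Only mark the inbound message as consumed once the outbound
        # event is considered successfully written.
        commit_offset(message)
        return True
    # Leave the offset uncommitted so that, after a restart or rebalance,
    # the inbound message can be consumed and processed again.
    return False
```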
When the Producer acks is configured as 0, the Producer does not seek any acknowledgement of a successful write. This is the most performant option, but it carries the highest risk of message loss. If the lead replica of the topic partition dies before the message is replicated, the message is still considered successfully sent, so the consumer offsets are updated to mark the original message as consumed, despite the outbound event being lost.
Figure 2: acks = 0, resulting in message loss
If acks is configured as 1, only the partition's lead replica needs to acknowledge the message for the write to be considered successful. This greatly reduces the chance of message loss for a minimal performance cost. The risk is that the lead replica could die after acknowledging the write but before the message has been replicated to the other broker nodes. As the write was acknowledged, the Producer would not re-publish the message, and the message would be lost.
Figure 3: acks = 1, resulting in message loss
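The two loss scenarios above can be made concrete with a toy simulation. This is not Kafka code: Replica and publish_then_leader_fails are hypothetical names, and the model assumes a three-replica partition where the leader appends the message locally and then crashes before either follower has fetched it, after which an in-sync follower becomes leader. The all setting is included only for contrast.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Replica:
    """A toy replica holding an in-memory log of messages."""
    log: List[str] = field(default_factory=list)


def publish_then_leader_fails(acks: Union[int, str], message: str) -> Tuple[bool, bool]:
    """Toy model: the leader crashes after its local write, before replication.

    Returns (producer_saw_success, message_survives_failover).
    """
    leader = Replica()
    followers = [Replica(), Replica()]  # in sync, but have not fetched yet

    leader.log.append(message)  # the leader appends the message locally

    if acks == 0:
        producer_saw_success = True   # nothing is awaited, so the send "succeeds"
    elif acks == 1:
        producer_saw_success = True   # the leader acked straight after its local write
    else:  # acks == "all"
        producer_saw_success = False  # followers never caught up, so no ack was sent

    # The leader crashes here and an in-sync follower takes over as leader.
    new_leader = followers[0]
    message_survives = message in new_leader.log
    return producer_saw_success, message_survives
```

For acks of 0 and 1 the Producer reports success even though the new leader never received the message, so the inbound offset would be committed and the event lost. With all, no success is reported, which leaves the Producer free to retry rather than commit.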
With the same failure scenario as above but with acks configured as all, the lead replica will not have acknowledged receipt of the message to the Producer, as it must first replicate the message to the in-sync replicas before doing so. When the write attempt times out, the Producer can act as required, either retrying the publish or failing the message processing. Typically the Kafka client library retries automatically unless configured not to.
Figure 4: acks = all, resulting in message retry
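The Kafka client normally performs these retries internally. As a rough sketch of the same idea, the loop below retries an unacknowledged send until it succeeds, runs out of attempts, or passes an overall deadline, which loosely mirrors how the producer's retries and delivery.timeout.ms settings bound retrying. The send_with_retries helper is hypothetical, not a client API, and the configuration dictionary assumes the confluent-kafka client with illustrative values rather than recommendations.

```python
import time
from typing import Callable

# Illustrative confluent-kafka producer settings (values are assumptions).
PRODUCER_CONFIG = {
    "bootstrap.servers": "localhost:9092",
    "acks": "all",
    "retries": 5,
    "delivery.timeout.ms": 30000,
}


def send_with_retries(
    attempt_send: Callable[[], bool],
    max_attempts: int,
    deadline_s: float,
    backoff_s: float = 0.0,
) -> bool:
    """Retry an unacknowledged send (hypothetical helper).

    Returns True if some attempt was acknowledged, otherwise False so the
    caller can fail the message processing instead of committing offsets.
    """
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        if attempt_send():
            return True
        out_of_attempts = attempt == max_attempts
        out_of_time = time.monotonic() - start >= deadline_s
        if out_of_attempts or out_of_time:
            break
        time.sleep(backoff_s)  # wait before the next attempt
    return False
```

Returning False rather than raising keeps the decision with the caller, which in the flow described here means leaving the inbound offset uncommitted.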
With acks configured as all, once the lead replica knows that all of its in-sync replicas have received the message, it sends acknowledgement of receipt back to the Producer. Even if the lead replica now dies, the message will not be lost, as it has been safely replicated.
Figure 5: acks = all, with successful replication
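In Kafka, followers pull data from the leader, and the offset each follower fetches from tells the leader how far that follower has replicated. A simplified sketch of the leader's decision under that model follows; the function names are hypothetical, not broker APIs. A log end offset is the offset of the next message a replica will write, so a replica holds the message at offset o once its log end offset exceeds o. The high watermark is the smallest log end offset across the in-sync replicas, and an acks=all write can be acknowledged once the high watermark has moved past it.

```python
from typing import Dict


def high_watermark(leader_log_end: int, follower_log_ends: Dict[str, int]) -> int:
    """Smallest log end offset across the leader and its in-sync followers.

    follower_log_ends should contain only followers currently in the ISR.
    """
    return min([leader_log_end, *follower_log_ends.values()])


def can_ack_all(message_offset: int, leader_log_end: int, follower_log_ends: Dict[str, int]) -> bool:
    """True once every in-sync replica holds the message at message_offset."""
    return high_watermark(leader_log_end, follower_log_ends) > message_offset
```

For example, with the leader's log end at 11 and followers at 10 and 11, the message at offset 10 cannot yet be acknowledged; once the lagging follower reaches 11 it can. From that point any replica promoted from the in-sync set already holds the message, which is why the leader's failure no longer loses it.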
Configuring acks as all therefore provides the strongest available guarantee against message loss. It does come at a small performance cost, although for most use cases this is considered insignificant.
Configuration Summary