Can We Rely on #Microservice #Event #Polling, is Polling #Robust?
It is not as simple as it seems...

#Series @#Architect for #Developers@


Event-driven solutions (EDS) have been known for years from their implementation for objects. One object fires an event (creates an event object) while other objects listen for the appearance of this event object. All of this happens in one application, on the same code base. For similar communication between applications, pub/sub messaging or the Observer pattern was used. One of the fundamental differences between EDS and the Observer pattern was that the latter, when applied between applications, needed an intermediary, a messaging infrastructure, between the sender of the message and its receiver. Implementations of events and messaging solutions were based on the push model, which assured, with some additional in-flight persistence, delivery of the objects/messages to the particular receiver. Messaging could loosely couple the sender and the receiver because the messaging infrastructure identified the receiver, which the sender might not know about. However, the receiver had to know which messaging queue/topic to subscribe to, i.e. it had to know something about the sender: what messages to expect.

With Microservices, the picture is a bit different. Each Microservice is a uni-functional application (can we also call a multi-functional application a Microservice?). For inter-communication, Microservices can use events and messages, but in both cases there has to be an infrastructure: a messaging system or an event bus. This infrastructure allows asynchronous communication between Microservices, though it still cannot really decouple the sender and the receiver. Let’s review this case.

Assume that the sender and the receiver know nothing about each other. The first question: why do they communicate if neither of them needs it? Assume that the sender fires a generic event and the list of such events is known to several Microservices. When a receiving Microservice subscribed to this event gets it, it does not know who sent it, i.e. this is not about communication between Microservices. Assume that the generic event is not about the sending Microservice but about a change in the state of a commonly known asset/resource. This means that all Microservices interested in the particular resource are coupled by design via this resource. The fact that a Microservice can act upon an event makes it a #noSOA element (in SOA, only one model is possible, consumer-provider, i.e. the provider never acts until it is called by the consumer). The described case is also based on the push model, realised via an explicit subscription in which the event listener provides its API for event notification.

There are at least two pitfalls of the push model. First, the receiving Microservice can get the event notification and then crash. Since there is no acknowledgement mechanism in place, the sending Microservice will not know that it might need to re-fire the event, and the entire information path fails with no recovery. Second, the receiving Microservice may be unreachable on the network: it may be down at the moment of the push from the event bus, or the related network segment may be broken at that moment. So we need an option such as the durable subscription we had in older messaging solutions, but I have not heard that durable subscription is the norm for Microservice events. An undelivered event can time out and be erased, or end up as a ‘dead event letter’.
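The first pitfall can be made concrete with a small sketch. It is only an illustration under stated assumptions: `Receiver`, `push_fire_and_forget` and `push_with_ack` are hypothetical names, a normal return from `on_event` is treated as the acknowledgement, and the crash is simulated with an exception.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiving Microservice that crashes on its first delivery."""
    crash_on_first: bool = True
    processed: list = field(default_factory=list)
    calls: int = 0

    def on_event(self, event: str) -> None:
        self.calls += 1
        if self.crash_on_first and self.calls == 1:
            raise RuntimeError("receiver crashed before processing")
        self.processed.append(event)


def push_fire_and_forget(receiver: Receiver, event: str) -> None:
    """Push once; if the receiver crashes, nobody learns about it."""
    try:
        receiver.on_event(event)
    except RuntimeError:
        pass  # no acknowledgement mechanism, so the event is silently lost


def push_with_ack(receiver: Receiver, event: str, max_attempts: int = 3):
    """Re-push until the receiver returns normally (the assumed acknowledgement).

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            receiver.on_event(event)
            return attempt
        except RuntimeError:
            continue  # no acknowledgement received, so re-fire
    return None
```

With fire-and-forget the event never reaches `processed`, while the acknowledged variant delivers it on the second attempt. Re-delivery of this kind assumes that the receiver can tolerate seeing the same event more than once.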

To avoid the latter pitfall, the Event Polling Pattern, also known as “Polling Consumer”, is used. The receiving Microservice does not subscribe to event notifications but actively calls the event holder and retrieves the event or message. For example, the Kafka messaging system works this way. Still, there has to be a preliminary association between the sending Microservice on one side and a Kafka partition in a certain Kafka topic on the other. Similarly, an association is needed between the Kafka partition and the polling Microservice. This association is a matter of design, which, again, couples the Microservices on both sides of Kafka.
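A minimal sketch of a polling consumer may help here. It uses an in-memory stand-in rather than Kafka's real API; `EventHolder`, `PollingConsumer` and the topic name are assumptions made for illustration only.

```python
class EventHolder:
    """In-memory stand-in for an event holder with named topics (not Kafka's API)."""

    def __init__(self):
        self._topics = {}

    def publish(self, topic, event):
        # The sender must know the topic name: a design-time association.
        self._topics.setdefault(topic, []).append(event)

    def fetch(self, topic, offset, max_events=10):
        # Return up to max_events events starting at the given offset.
        return self._topics.get(topic, [])[offset:offset + max_events]


class PollingConsumer:
    """Hypothetical polling Microservice that pulls events instead of being pushed to."""

    def __init__(self, holder, topic):
        self.holder = holder
        self.topic = topic  # the poller must know the same topic name
        self.offset = 0     # position of the next event to read
        self.handled = []

    def poll_once(self):
        """Fetch and handle the next batch; return how many events were handled."""
        batch = self.holder.fetch(self.topic, self.offset)
        for event in batch:
            self.handled.append(event)
        self.offset += len(batch)
        return len(batch)
```

Both sides refer to the same topic name, which is the design-time coupling described above. An empty poll simply returns 0, so the consumer has to keep calling `poll_once` to learn whether anything new has arrived.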

Several constraints exist for polling events. In particular:

·        A polling Microservice has to poll periodically because it is unknown when an event will be fired, which increases the consumption of network bandwidth.

·        There is a dependency between the event frequency and the polling frequency. If the latter is lower than the former, there is always a risk that certain events will time out in the event bus and be lost.

·        Event polling works in relatively simple cases: a polling Microservice gets events from only one firing Microservice. If several Microservices fire events, there is no mechanism to scale the polling Microservice, which processes events sequentially. If a Microservice needs more than one event to start processing, there is a risk of desynchronisation of ‘independent’ events, resulting in a potentially wrong outcome.

·        If the firing Microservice and its behaviour are owned/managed by one team while the polling Microservice belongs to another team, and both work on the same application, we have a risk of uncertainty: the polling Microservice may temporarily accumulate events and process them in a batch, and it may not be well predictable when the batch is complete.

·        If the polling Microservice crashes, there are no recovery or failover means available and the entire information path fails. Since events in this model usually time out after a while, the events may be lost.
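The dependency between event frequency and polling frequency can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the text above: time advances in integer ticks, every event has the same time-to-live (`ttl`) in the bus, and each poll drains everything still pending. The function name and parameters are hypothetical.

```python
def simulate(event_interval, poll_interval, ttl, duration):
    """Return (delivered, lost) counts on a simulated integer clock.

    An event is fired every `event_interval` ticks, is erased from the bus
    once it is older than `ttl` ticks, and a poll every `poll_interval`
    ticks collects all events that are still pending.
    """
    pending = []  # firing times of events still held by the bus
    delivered = 0
    lost = 0
    for now in range(1, duration + 1):
        # Erase events that have timed out in the bus.
        alive = [fired for fired in pending if now - fired <= ttl]
        lost += len(pending) - len(alive)
        pending = alive
        if now % event_interval == 0:
            pending.append(now)  # the firing Microservice emits an event
        if now % poll_interval == 0:
            delivered += len(pending)  # the polling Microservice drains the bus
            pending = []
    return delivered, lost
```

For example, `simulate(1, 2, 5, 20)` returns `(20, 0)` because every event is collected well within its time-to-live, while `simulate(1, 10, 3, 20)` returns `(8, 12)`: with a poll only every 10 ticks, most events expire before anyone asks for them.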

In one of the previous articles, we discussed the so-called Sib Microservice Pattern. In essence, this pattern helps with Microservice failure recovery: if Microservice A invokes Microservice B and the latter fails, A can invoke either a sibling or a twin Microservice instead. This is a classic SOA pattern, which does not work with polling Microservices. When a Microservice becomes active under its own scheduler (which an SOA Service never does), its failure is undetectable other than via off-line log analysis or via “heartbeat” control conducted by another Microservice or a monitor.

More precisely, the activation and re-activation of a polling Microservice depends on its implementation. If the scheduler is embedded in the Microservice, it is useless for recovery. If it is an external scheduler that activates a new Microservice instance for each scheduled round, this increases complexity and brings two new risks. First, it is necessary to make the scheduler fault tolerant (for each polling Microservice, and these may have different rounds). Second, a new dependency between the frequency of event generation and the frequency of event polling appears, because multiple instances of the polling Microservice can start competing among themselves for the next event in the event bus. If the processing time differs significantly between events, this can scramble the processing order and break the event order logic. If the scheduler triggers the polling Microservice only once, even an external scheduler cannot help in the recovery of a failed Microservice.
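The ordering risk with competing instances can be shown with a small deterministic model. It is a sketch, not a real scheduler: `completion_order` is a hypothetical helper that assumes events are handed out in bus order to whichever instance becomes free first, and that each event has a known processing duration.

```python
import heapq


def completion_order(durations, instances):
    """Return event indices in the order in which their processing finishes.

    `durations[i]` is the assumed processing time of event i. Events are
    taken from the bus in order by whichever instance is free first.
    """
    free_at = [(0, inst) for inst in range(instances)]  # (time free, instance id)
    heapq.heapify(free_at)
    finished = []
    for index, duration in enumerate(durations):
        start, inst = heapq.heappop(free_at)  # earliest available instance
        end = start + duration
        finished.append((end, index))
        heapq.heappush(free_at, (end, inst))
    return [index for _, index in sorted(finished)]
```

With a single instance, `completion_order([5, 1, 1], 1)` gives `[0, 1, 2]`, so the event order is preserved. With two competing instances, `completion_order([5, 1, 1], 2)` gives `[1, 2, 0]`: the slow first event finishes last, which is the kind of broken ordering described above.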

If an application comprises just firing and polling Microservices, we are able to catch the failure relatively easily. If polling Microservices fail in the distributed environment of a bigger application, and this might happen in several places, it is a problem.

Finally, let’s see how the Event Polling Pattern can work in a distributed transaction. Despite many recommendations from Microservice gurus, business tasks are set in a way that requires a certain undisturbed, ordered execution of particular tasks, i.e. business transactions. We can run such a transaction via direct inter-Microservice invocations or via the event subscription (pub/sub) pattern. The Sib Microservice Pattern guarantees robustness in both cases (if the event bus is designed in such a way that it can invoke sibling or twin Microservices when the original ones fail). We can use the Saga Pattern as well. Can we use a polling technique here? My answer is ‘no’: it is not robust, and we cannot easily recover a transaction implemented in one of the aforementioned ways.

Well, assume the direct transaction has reached the final Microservice but then failed in one of the local transactions and cannot be committed in full. We need to run the compensation transaction. Again, if we use direct invocation or event subscription with the backup of the Sib Microservice Pattern, we are fine. (I have to point out that all such business transactions are usually known up front, and appropriate event subscriptions can be arranged a priori.) However, a polling technique is not suitable here either, because it is not easily recoverable in case of failure.
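For readers unfamiliar with compensation, here is a minimal sketch of the idea using direct invocation. It is an illustration only: `run_saga` and the step names in the usage note are hypothetical, and every local transaction is modelled as a plain callable paired with its compensating action.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, the compensations of the steps that already
    completed are run in reverse order. Returns a log of what happened.
    """
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo the completed local transactions, most recent first.
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensated:{done_name}")
            break
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return log
```

If three steps named `reserve`, `charge` and `ship` are run and the last one raises, the log reads `done:reserve`, `done:charge`, `failed:ship`, `compensated:charge`, `compensated:reserve`. The caller drives both the forward path and the compensation path, which is what makes the failure visible and recoverable in this style.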

It seems that, in order to rely on polling in solutions where a Microservice may fail by definition (such as “Microservice Architecture”), we need to create an additional entity that would quickly detect the failure of a polling Microservice and re-instantiate it. This has to be done for each polling Microservice, which significantly increases complexity, though it gives hope that the new instance will not crash; this is all that is left for us. Therefore, the Event Polling Pattern is effective on a “sunny day”, but unreliable and barely repairable at run-time on a “rainy day”.
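Such an additional entity could look roughly like the following heartbeat supervisor. This is a sketch under assumptions: `PollingService` and `Supervisor` are hypothetical names, time is a simulated integer clock passed in by the caller, and a service is considered failed once its last heartbeat is older than `timeout`.

```python
class PollingService:
    """Hypothetical polling Microservice that records a heartbeat on every round."""

    def __init__(self, name, started_at):
        self.name = name
        self.last_heartbeat = started_at
        self.alive = True

    def tick(self, now):
        # A crashed service stops polling and therefore stops reporting.
        if self.alive:
            self.last_heartbeat = now


class Supervisor:
    """Detects stale heartbeats and re-instantiates the affected service."""

    def __init__(self, factory, timeout):
        self.factory = factory  # callable(name, now) that builds a new instance
        self.timeout = timeout
        self.services = {}
        self.restarts = 0

    def register(self, name, now):
        self.services[name] = self.factory(name, now)

    def check(self, now):
        for name, service in list(self.services.items()):
            if now - service.last_heartbeat > self.timeout:
                # Replace the silent instance with a fresh one.
                self.services[name] = self.factory(name, now)
                self.restarts += 1
```

One supervisor entry is needed per polling Microservice, and the supervisor itself becomes another component that has to be made fault tolerant, which is the added complexity noted above.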

 
