Business Process Expectations and Messaging Systems like Kafka

Messaging systems like Kafka are used to distribute messages and data streams in all sorts of applications, most often in cloud-native applications built from microservices. They are an essential connector between the microservices that produce and consume data, and they do this at scale and with exceptionally good reliability. Messaging systems can be installed in your own data center or consumed as a managed service from public cloud providers such as AWS.

Understand the Underlying Business Processes

Several technical decisions need to be made when configuring these systems, and they depend on a deep technical understanding of the solution, the operating system and the infrastructure being used. However, the first step is to understand the business processes implemented by the microservices that produce and consume the messages and data streams.

In some cases, such as transaction processing applications, every message is important and must be delivered exactly once. The order in which messages are processed may also matter. For example, when updating a bank account, deposits and withdrawals must be handled in the correct order; otherwise an out-of-balance condition could occur, and the customer could be charged for an overdraft that is not her fault. In other cases, such as telemetry, only the latest data matters, even if some data in between was lost. When trading on the stock market, it is important to use the latest price rather than catch up on earlier prices that might have been missed. The business need being met has to be understood first, before the technical design can be finalized.
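The two delivery profiles above can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not a Kafka client. The configuration dictionaries use real Kafka client property names (`enable.idempotence`, `acks`, `isolation.level`, `auto.offset.reset`) with illustrative values and a placeholder broker address. `Txn`, `count_overdrafts`, `latest_readings` and the overdraft rule are hypothetical helpers. Kafka only guarantees ordering within a partition, so a ledger topic would typically use the account ID as the message key. End-to-end exactly-once processing also requires the producer to use transactions, which is not shown here.

```python
from dataclasses import dataclass

# Illustrative client settings. The property names are real Kafka client
# settings; the broker address is a placeholder and the values are a sketch.
LEDGER_PRODUCER_CONFIG = {
    "bootstrap.servers": "broker.example:9092",  # placeholder address
    "enable.idempotence": True,  # broker de-duplicates retried sends
    "acks": "all",               # wait for all in-sync replicas to acknowledge
}
LEDGER_CONSUMER_CONFIG = {
    "isolation.level": "read_committed",  # only return committed transactional writes
}
TELEMETRY_CONSUMER_CONFIG = {
    "auto.offset.reset": "latest",  # with no committed offset, start at the newest data
}


@dataclass(frozen=True)
class Txn:
    """Hypothetical ledger event for one account."""
    account: str
    amount_cents: int  # positive = deposit, negative = withdrawal


def count_overdrafts(opening_cents: int, txns: list[Txn]) -> int:
    """Apply events in the order given and count withdrawals that go below zero."""
    balance = opening_cents
    overdrafts = 0
    for txn in txns:
        balance += txn.amount_cents
        if txn.amount_cents < 0 and balance < 0:
            overdrafts += 1
    return overdrafts


def latest_readings(readings: list[tuple[str, int, float]]) -> dict[str, float]:
    """Keep only the newest (sensor, timestamp, value) reading per sensor."""
    newest: dict[str, tuple[int, float]] = {}
    for sensor, ts, value in readings:
        if sensor not in newest or ts > newest[sensor][0]:
            newest[sensor] = (ts, value)
    return {sensor: value for sensor, (_, value) in newest.items()}


deposit = Txn("acct-1", 100_00)
withdrawal = Txn("acct-1", -80_00)
print(count_overdrafts(0, [deposit, withdrawal]))  # → 0
print(count_overdrafts(0, [withdrawal, deposit]))  # → 1
print(latest_readings([("s1", 1, 20.5), ("s1", 3, 21.0), ("s1", 2, 20.8)]))  # → {'s1': 21.0}
```

The same two events produce no overdraft in the correct order and one overdraft when the withdrawal arrives first, which is the bank-account failure described above. The telemetry helper simply discards older readings, matching the "latest price wins" case.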

Because Kafka and similar systems store messages by default, data retention rules need to be followed and information security considered. For example, an e-commerce site that avoids storing personal and account information by using external payment services needs to ensure that none of this information is persisted anywhere in the data stream. The storage used by Kafka must be extremely fast to accept and distribute data quickly, but the data need not stay on fast storage after it has been consumed. Especially in a cloud-based implementation, applying retention rules and archiving data that is no longer needed will reduce running costs. Once again, the business requirements have to guide the design and help control the cost.
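As a hedged illustration of both points, the sketch below builds per-topic retention settings and screens outgoing payloads for card-number-like digit runs before they reach the stream. `retention.ms` and `cleanup.policy` are real Kafka topic properties. The topic names, retention periods and helper functions (`retention_ms`, `contains_card_number`, `guard_payload`) are hypothetical. A Luhn check on runs of 13 to 19 digits is only a simple heuristic, not a complete PII scanner.

```python
import re

MS_PER_DAY = 24 * 60 * 60 * 1000


def retention_ms(days: float) -> int:
    """Convert a retention period in days to the unit Kafka's retention.ms expects."""
    return int(days * MS_PER_DAY)


# Hypothetical topics and periods; the real values come from the business rules.
TOPIC_CONFIGS = {
    "orders": {"cleanup.policy": "delete", "retention.ms": str(retention_ms(7))},
    "telemetry": {"cleanup.policy": "delete", "retention.ms": str(retention_ms(1))},
}

# 13 to 19 digits, optionally separated by single spaces or hyphens.
_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Heuristic: does the text contain something that looks like a payment card number?"""
    for match in _CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


def guard_payload(text: str) -> str:
    """Refuse to publish payloads that appear to carry card data."""
    if contains_card_number(text):
        raise ValueError("payload appears to contain a card number; not publishing")
    return text
```

Kafka's broker-level default retention is seven days (`log.retention.hours=168`). Setting `retention.ms` per topic is one way to let the business requirement, rather than the default, decide how long data lives.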

Monitoring and Response

Hardware and software components will fail at some point, and the architecture of the system must be designed to minimize the business impact of these failures. Monitoring the operation of the entire system is therefore particularly important. This needs to include the producers, the consumers, and the various components of the messaging system. If the producers are generating data faster than the system can ingest it, or the consumers are reading data too slowly, action may need to be taken. In addition, the status of failover between brokers and partitions has to be monitored.
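A minimal Python sketch of two of these checks follows. It assumes the offsets and replica sets have already been collected, for example from the Kafka admin API or a metrics exporter; the collection step is not shown. Consumer lag per partition is the log-end offset minus the consumer group's committed offset. A partition is under-replicated when its in-sync replica set (ISR) is smaller than its assigned replica set, and offline when it has no leader. `PartitionState`, `health_report` and the lag threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition, gathered by some external collector."""
    topic: str
    partition: int
    log_end_offset: int        # offset the next produced message will receive
    committed_offset: int      # next offset the consumer group will read
    replicas: tuple[int, ...]  # broker IDs assigned to the partition
    isr: tuple[int, ...]       # broker IDs currently in sync with the leader
    leader: Optional[int]      # None when no broker is leading the partition


def consumer_lag(p: PartitionState) -> int:
    """Messages written but not yet consumed by the group."""
    return max(0, p.log_end_offset - p.committed_offset)


def health_report(partitions: Iterable[PartitionState], max_lag: int) -> dict[str, list[str]]:
    """Group partitions by the problem an operator would need to act on."""
    report: dict[str, list[str]] = {"lagging": [], "under_replicated": [], "offline": []}
    for p in partitions:
        name = f"{p.topic}-{p.partition}"
        if consumer_lag(p) > max_lag:     # consumers falling behind producers
            report["lagging"].append(name)
        if len(p.isr) < len(p.replicas):  # a replica has dropped out of sync
            report["under_replicated"].append(name)
        if p.leader is None:              # failover has not produced a new leader
            report["offline"].append(name)
    return report
```

A single growing lag value covers both symptoms in the paragraph above: producers outpacing ingestion and consumers reading too slowly. The replica checks cover broker and partition failover.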

Monitoring the business processes for data and transaction integrity is also very important. Orders, payments, and other business transactions received must be reconciled with what was recorded, and with the payments requested and received. This reconciliation needs to include the external systems and APIs used, and even third parties, to ensure that no data has been lost, corrupted or duplicated. In addition to the technical team that investigates and fixes technical errors, there must be a business operations team that investigates and fixes business process anomalies. All these teams must follow SRE principles: gaming out the possible errors, how they would be detected, and the automated scripts needed to fix them. This needs to include support for ‘undo’ in case erroneous data is received from other applications, and even for ‘wait’ cycles in case those systems have failed and need time to be fixed.
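Here is a minimal sketch of the reconciliation step. It assumes each system can export `(order_id, amount_cents)` records for the same time window. The function name, record shape and anomaly categories are hypothetical. A production job would also need to allow for payments that settle later than the order, partial refunds and currency. Each category maps to a question the business operations team would investigate, and the output could drive the ‘undo’ or ‘wait’ handling described above.

```python
from collections import Counter, defaultdict


def reconcile(orders: list[tuple[str, int]],
              payments: list[tuple[str, int]]) -> dict[str, list[str]]:
    """Compare (order_id, amount_cents) records from two systems and group the anomalies."""
    order_counts = Counter(oid for oid, _ in orders)
    payment_counts = Counter(oid for oid, _ in payments)

    expected = {}
    for oid, amount in orders:
        expected[oid] = amount  # if an order is duplicated, the last amount wins
    paid = defaultdict(int)
    for oid, amount in payments:
        paid[oid] += amount     # total received per order

    return {
        # Recorded but never paid: a payment may have been lost downstream.
        "unpaid": sorted(oid for oid in expected if oid not in paid),
        # Paid but never recorded: an order may have been lost on our side.
        "orphan_payments": sorted(oid for oid in paid if oid not in expected),
        # Seen more than once in either system: possible duplication.
        "duplicates": sorted({oid for oid, n in order_counts.items() if n > 1}
                             | {oid for oid, n in payment_counts.items() if n > 1}),
        # Present in both, but the amounts do not agree.
        "amount_mismatch": sorted(oid for oid in expected
                                  if oid in paid and paid[oid] != expected[oid]),
    }
```

A duplicated payment shows up both as a duplicate and as an amount mismatch. This overlap is deliberate, because the two signals point to different fixes: removing the duplicate, or refunding the overpayment.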

Service Levels

Service Level Objectives (SLOs) need to be defined and measured for the business performance, availability, data integrity and security of all applications that are critical to the business. The underlying measures that feed the SLOs must be part of the operations dashboard that the business and technical operations teams watch, so that any deviation can be fixed quickly. These measures need to be combined with technical indicators to help quickly correlate business process errors with technical events.
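The next sketch makes the link between the underlying measures (often called service level indicators, or SLIs) and the SLOs concrete. It shows the usual arithmetic in Python: an SLI as the fraction of good events, the error budget implied by a target, and how much of that budget has been spent. The function names, the 99.9% target and the 75% warning threshold are illustrative assumptions. The same calculation can be applied to a data-integrity SLI, such as the fraction of transactions that reconciled cleanly.

```python
def sli(good_events: int, total_events: int) -> float:
    """Service level indicator: fraction of events that met the objective."""
    if total_events == 0:
        return 1.0  # no traffic in the window, so nothing failed
    return good_events / total_events


def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed downtime in the window for a time-based availability SLO."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_consumed(good_events: int, total_events: int, slo: float) -> float:
    """Share of the error budget spent so far; 1.0 means the budget is exhausted."""
    allowed_bad = (1.0 - slo) * total_events
    bad = total_events - good_events
    if allowed_bad == 0:
        return 0.0 if bad == 0 else float("inf")
    return bad / allowed_bad


def budget_status(consumed: float) -> str:
    """Hypothetical dashboard thresholds: warn at 75%, alert at 100%."""
    if consumed >= 1.0:
        return "exhausted"
    if consumed >= 0.75:
        return "at risk"
    return "ok"
```

For example, a 99.9% availability target over a 30-day window allows about 43.2 minutes of downtime. At the same target, 500 failed events out of 1,000,000 spend half of the budget.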

This is an example of how, in addition to technology, applications have to be built with the business, its processes and its metrics in mind in order to be successful.

About Tailwinds

At Tailwinds, we help teams design, build, deploy and operate cloud-native applications securely, with lower cost and faster time to market, using our Internal Developer Platform (IDP) product, MajorDomo.


#SLO #errorbudget #cloudnative #sre #platformengineering #internaldeveloperplatform #itbm #itsm #itom



By Animesh Mukherjee
