Distributed Message System

Open source software has become a fundamental building block for some of the biggest websites. And as those websites have grown, best practices and guiding principles around their architectures have emerged. This article covers some of the key issues to consider when designing large websites, as well as some of the building blocks used to achieve these goals.

When it comes to system architecture there are a few things to consider: what are the right pieces, how these pieces fit together, and what are the right tradeoffs. Investing in scaling before it is needed is generally not a smart business proposition; however, some forethought into the design can save substantial time and resources in the future.

The purpose of the distributed message system is to provide a unified, high-throughput, low-latency platform for real-time data processing. It delivers the following three functions:

  1. Publishing and Subscription: publishes and subscribes to streams of data, similar to other message systems.
  2. Processing: runs stream processing applications that respond to events in real time.
  3. Storage: securely stores streaming data in a distributed and fault-tolerant cluster.
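To make the first function concrete, here is a minimal sketch of publishing and subscribing. The article does not name a specific product, so this example assumes a Kafka-compatible broker running at localhost:9092 and the kafka-python client; the topic name "orders" and the consumer group "billing" are hypothetical:

```python
import json

# Assumption: a Kafka-compatible broker is reachable at localhost:9092
# and the kafka-python package is installed (pip install kafka-python).
from kafka import KafkaConsumer, KafkaProducer

# Publish: serialize each record as JSON and send it to the "orders" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 42, "amount": 9.99})
producer.flush()  # block until the message has actually been delivered

# Subscribe: consumers in the same group share the topic's messages.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing",            # hypothetical consumer group name
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,      # stop iterating if no records arrive for 5s
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for record in consumer:
    print(record.value)  # {'order_id': 42, 'amount': 9.99}
```

Note that both sides agree only on the JSON structure of the payload, not on each other's interfaces, which is exactly the decoupling discussed below.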

Let us understand more about the message system and the problems it solves. Take the currently popular microservice architecture as an example. Assume there are three terminal-facing web services (HTTP) serving a WeChat official account, a mobile app, and a browser, namely Web1, Web2, and Web3, and three internal application services App1, App2, and App3 (exposed via Remote Procedure Call, for example, WCF or gRPC). Without a message system, in a directly connected mode, every web service must hold a point-to-point connection to every application service it calls, and the number of interfaces to maintain grows with every new service.


When systems are simple, with minimal processing loads and small databases, writes can be predictably fast; however, in more complex systems writes can take an almost non-deterministically long time. For example, data may have to be written to several places on different servers or indexes, or the system could simply be under high load. In cases where writes, or any task for that matter, may take a long time, achieving performance and availability requires building asynchrony into the system; a common way to do that is with queues.


Imagine a system where each client is requesting a task to be remotely serviced. Each of these clients sends their request to the server, where the server completes the tasks as quickly as possible and returns the results to their respective clients. In small systems where one server (or logical service) can service incoming clients just as fast as they come, this sort of situation should work just fine. However, when the server receives more requests than it can handle, then each client is forced to wait for the other clients' requests to complete before a response can be generated.

This kind of synchronous behavior can severely degrade client performance; the client is forced to wait, effectively performing zero work, until its request can be answered. Adding additional servers to address system load does not solve the problem either; even with effective load balancing in place it is extremely difficult to ensure the even and fair distribution of work required to maximize client performance. Further, if the server handling requests is unavailable, or fails, then the clients upstream will also fail. Solving this problem effectively requires abstraction between the client's request and the actual work performed to service it.


Enter queues. A queue is as simple as it sounds: a task comes in, is added to the queue, and workers pick up the next task as they have the capacity to process it. These tasks could represent simple writes to a database, or something as complex as generating a thumbnail preview image for a document. When a client submits task requests to a queue, it is no longer forced to wait for the results; instead, it needs only an acknowledgement that the request was properly received. This acknowledgement can later serve as a reference for the results of the work when the client requires it.

Queues enable clients to work in an asynchronous manner, providing a strategic abstraction of a client's request and its response. On the other hand, in a synchronous system, there is no differentiation between request and reply, and they therefore cannot be managed separately. In an asynchronous system the client requests a task, the service responds with a message acknowledging the task was received, and then the client can periodically check the status of the task, only requesting the result once it has completed. While the client is waiting for an asynchronous request to be completed it is free to perform other work, even making asynchronous requests of other services. The latter is an example of how queues and messages are leveraged in distributed systems.
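A minimal sketch of this request/acknowledge/poll pattern, using only the Python standard library; the names submit, worker, and the thumbnail payload are illustrative, not part of any real system described in the article:

```python
import queue
import threading
import uuid

tasks = queue.Queue()   # pending work submitted by clients
results = {}            # task_id -> result, filled in by workers

def submit(payload):
    """Client side: enqueue a task and return an acknowledgement immediately."""
    task_id = str(uuid.uuid4())
    tasks.put((task_id, payload))
    return task_id  # the acknowledgement; used later to fetch the result

def worker():
    """Server side: pull tasks off the queue as capacity allows."""
    while True:
        task_id, payload = tasks.get()
        results[task_id] = payload.upper()  # stand-in for the real work
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

ticket = submit("generate thumbnail")  # returns at once; client can do other work
tasks.join()                           # for the demo, wait until the queue drains
print(results[ticket])                 # poll for the result using the acknowledgement
```

The key point is that submit returns before the work is done: the client holds only a ticket and checks back for the result on its own schedule.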

Queues also provide some protection from service outages and failures. For instance, it is quite easy to create a highly robust queue that can retry service requests that have failed due to transient server failures. It is preferable to use a queue to enforce quality-of-service guarantees rather than to expose clients directly to intermittent service outages, which would require complicated and often inconsistent client-side error handling.
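One way such a retrying queue might look, again sketched with the standard library; the retry limit and the deliberately flaky service call are illustrative assumptions:

```python
import queue
import random

MAX_RETRIES = 3
tasks = queue.Queue()

def flaky_service(payload):
    """Stand-in for a remote call that sometimes fails transiently."""
    if random.random() < 0.5:
        raise ConnectionError("transient server failure")
    return f"done: {payload}"

def worker():
    while not tasks.empty():
        payload, attempts = tasks.get()
        try:
            print(flaky_service(payload))
        except ConnectionError:
            if attempts + 1 < MAX_RETRIES:
                tasks.put((payload, attempts + 1))  # requeue; the client never sees the failure
            else:
                print(f"giving up on {payload!r} after {MAX_RETRIES} attempts")
        finally:
            tasks.task_done()

tasks.put(("resize image", 0))
worker()
```

The client that enqueued "resize image" is never exposed to the intermediate ConnectionErrors; the queue absorbs them and retries on its behalf.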

After the introduction of the message system, all the issues mentioned previously get resolved.

  1. Components, web services, and application services no longer need to be concerned about each other's interface definitions. Instead, they only need to agree on the structure of the data itself (for example, a JSON payload).
  2. There is no need to worry about the internal structure of a mature open source messaging system; such systems are highly standardized and relatively stable.
  3. It improves performance: it is designed to transmit large volumes of data, and its throughput meets the requirements of most enterprises.
  4. It makes expansion easy through clustering. Moreover, its model naturally provides for common needs such as load balancing.

Producer/Consumer Model:

A producer is an application that produces messages at one end of a data pipeline; a consumer is an application that consumes those messages at the other end.

Outlined below are the two scenarios when the producer sends messages to a queue:

  1. If no consumer is connected to the queue or consuming messages at this time, messages are saved in the queue until the queue is full or a consumer comes online.
  2. If multiple consumers are connected to the queue at this time, each message is delivered to exactly one consumer. Load balancing is therefore achieved naturally whenever there are multiple consumers in practice (see the sketch after this list).
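A minimal sketch of that second scenario, with two consumers competing on one queue; the message contents and consumer names are illustrative:

```python
import queue
import threading

q = queue.Queue()

def consumer(name):
    while True:
        msg = q.get()
        if msg is None:          # sentinel: tell this consumer to stop
            q.task_done()
            break
        print(f"{name} received {msg}")  # each message reaches exactly one consumer
        q.task_done()

# Two competing consumers connected to the same queue.
for name in ("consumer-1", "consumer-2"):
    threading.Thread(target=consumer, args=(name,)).start()

for i in range(6):               # the producer sends six messages
    q.put(f"message-{i}")
for _ in range(2):               # one stop sentinel per consumer
    q.put(None)
q.join()
```

Running this shows the six messages split between the two consumers, with no message delivered twice; that split is the "natural" load balancing the list item describes.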

Publisher/Subscriber Model:

Publisher: an application that generates events at one end of a data pipeline.

Subscriber: an application that responds to events at the other end of a data pipeline.

In the Publisher/Subscriber model, the data sent to a queue is in the form of events instead of messages. In this case, data processing is the subscription of an event, and not message consumption.

If no subscriber is connected to the queue when the publisher publishes an event, the event is lost, i.e., no application responds to it. A subscriber that comes online later will not receive the event.

If multiple subscribers are connected to the queue when the publisher publishes an event, the event is broadcast to all of them, and each subscriber receives the same event. Therefore, load balancing does not exist in this model.
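Both behaviors, the lost early event and the broadcast to every connected subscriber, can be seen in a toy in-memory broker; the Broker class and its method names are illustrative assumptions, not a real library:

```python
import queue

class Broker:
    """Toy in-memory publisher/subscriber broker (illustrative, not production code)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, event):
        if not self.subscribers:
            return  # no subscriber connected: the event is simply lost
        for q in self.subscribers:
            q.put(event)  # broadcast: every subscriber gets its own copy

broker = Broker()
broker.publish("early event")   # lost: nobody has subscribed yet

a = broker.subscribe()
b = broker.subscribe()
broker.publish("order created")

print(a.get())  # order created
print(b.get())  # order created -- both subscribers receive the same event
```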

Stream Processing Application

There is a difference between batch processing applications and stream processing applications. The most significant difference is whether the data has a visible boundary. If a boundary exists, the processing is called batch processing. For example, a client collects data once every hour, sends it to the server for statistics, and then saves the statistical results in a statistics database.

If the boundary doesn't exist, the data is streaming data and the processing is called stream processing. Here is an example: logs and orders are generated continuously on a large website, just like a data flow. If each log entry or order is processed within a few hundred milliseconds or a few seconds of its generation, the application is a stream application. If logs and orders are instead collected once every hour and transmitted in one batch, the original stream data is converted into batch data.

Apart from data boundaries, processing time can be used to differentiate stream processing from batch processing. The processing cycle for batch processing is generally hours or days, while the processing cycle for stream processing is usually seconds. Correspondingly, batch processing is referred to as offline data processing, whereas stream processing is referred to as real-time data processing. Data processed on the order of minutes is referred to as near-line data processing, but this case is seldom discussed separately and is generally treated as offline processing.
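The contrast can be made concrete with a toy example: the same records handled as an unbounded stream (processed as each one arrives) versus a bounded batch (collected first, aggregated once). The log_stream source and the order fields are illustrative:

```python
import time

def log_stream():
    """Unbounded-style source: yields records as they are generated."""
    for i in range(5):
        yield {"order_id": i, "amount": 10 * i}
        time.sleep(0.1)  # records trickle in over time

# Stream processing: react within moments of each record's arrival.
running_total = 0
for record in log_stream():
    running_total += record["amount"]
    print(f"stream: total so far = {running_total}")

# Batch processing: wait for the bounded collection, then compute once.
hourly_batch = list(log_stream())  # the boundary: one complete collection
print(f"batch: hourly total = {sum(r['amount'] for r in hourly_batch)}")
```

The stream version produces intermediate answers continuously; the batch version produces one answer only after the boundary closes.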

Designing efficient systems with fast access to lots of data is exciting, and there are lots of great tools that enable all kinds of new applications.
