Exploring the Heart of Real-Time Messaging at Slack
Ankit Shaw
Engineering @ Nomura | Ex - Morgan Stanley | Java | Python | Distributed Systems
In this article, we will explore the architecture Slack uses to send real-time messages at scale. We'll look at the services that deliver chat messages and other events to online users in real time.
The core services in the system are written in Java, including:
Channel Server
Slack uses Channel Servers (CSs) to hold the message history of channels. Each CS is mapped to a subset of channels using consistent hashing.
During peak times, each host serves about 16 million channels. Every CS host receives and sends messages for the channels mapped to it.
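To make the channel-to-host mapping concrete, here is a minimal sketch of a consistent-hash ring. It is only an illustration: the class name `ChannelServerRing`, the host names, the use of MD5, and the 100 virtual nodes per host are all assumptions, not details of Slack's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative consistent-hash ring (hypothetical; not Slack's actual code). */
class ChannelServerRing {
    // Assumed number of points each host gets on the ring.
    private static final int VIRTUAL_NODES = 100;

    // Ring position -> host name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ChannelServerRing(List<String> hosts) {
        for (String host : hosts) {
            addHost(host);
        }
    }

    void addHost(String host) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(host + "#" + i), host);
        }
    }

    void removeHost(String host) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(host + "#" + i));
        }
    }

    /** Owner is the first ring point at or after the channel's hash, wrapping around. */
    String hostFor(String channelId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no channel servers registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(channelId));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Folds the first 8 bytes of an MD5 digest into a long. */
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChannelServerRing ring = new ChannelServerRing(List.of("cs-1", "cs-2", "cs-3"));
        System.out.println("C12345 is owned by " + ring.hostFor("C12345"));
    }
}
```

The useful property is that removing a host only remaps the channels that host owned. Every other channel keeps its owner.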
Gateway Server
Gateway Servers (GSs) are stateful and in-memory, and serve as the interface between Slack clients and CSs. They are deployed across multiple geographic regions so that clients can connect quickly. When a region fails, a draining mechanism switches its users to a healthy region.
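The article does not say how draining is implemented, so the following is only a sketch of the idea under simple assumptions: each client has a preference-ordered list of regions, and a drained region is skipped when choosing where to connect. `RegionRouter` and the region names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical region picker that skips regions marked as draining. */
class RegionRouter {
    private final List<String> regionsByPreference;
    private final Set<String> draining = new HashSet<>();

    RegionRouter(List<String> regionsByPreference) {
        this.regionsByPreference = List.copyOf(regionsByPreference);
    }

    /** Marks a region as unhealthy so new connections avoid it. */
    void drain(String region) {
        draining.add(region);
    }

    /** Puts a region back into rotation. */
    void restore(String region) {
        draining.remove(region);
    }

    /** Returns the most preferred region that is not draining, if any. */
    Optional<String> pickRegion() {
        return regionsByPreference.stream()
                .filter(region -> !draining.contains(region))
                .findFirst();
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter(List.of("us-east", "eu-west"));
        router.drain("us-east");
        System.out.println("Reconnect to: " + router.pickRegion().orElse("none"));
    }
}
```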
Admin Server
Admin Servers (ASs) are stateless and in-memory, and facilitate communication between the Webapp backend and CSs.
Presence Server
Presence Servers (PSs) are in-memory and track which users are online, powering the familiar green presence dots in Slack clients. Clients query a PS over their websocket connection, with a GS acting as a proxy, and receive presence notifications only for the users currently visible on the app screen.
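The "only visible users" rule can be sketched as a subscription table: each session tells the server which users are on screen, and a status change is reported only to sessions watching that user. The `PresenceTracker` class and its method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory presence tracker with per-session subscriptions. */
class PresenceTracker {
    // Watched user -> sessions that currently have that user on screen.
    private final Map<String, Set<String>> watchers = new HashMap<>();
    private final Set<String> online = new HashSet<>();

    /** Replaces the set of users a session wants presence updates for. */
    void setVisibleUsers(String sessionId, Set<String> visibleUsers) {
        watchers.values().forEach(sessions -> sessions.remove(sessionId));
        watchers.values().removeIf(Set::isEmpty);
        for (String userId : visibleUsers) {
            watchers.computeIfAbsent(userId, k -> new HashSet<>()).add(sessionId);
        }
    }

    /** Records a status change and returns the sessions that should be notified. */
    Set<String> setOnline(String userId, boolean isOnline) {
        boolean changed = isOnline ? online.add(userId) : online.remove(userId);
        if (!changed) {
            return Set.of(); // no change, so nothing to announce
        }
        return Set.copyOf(watchers.getOrDefault(userId, Set.of()));
    }

    boolean isOnline(String userId) {
        return online.contains(userId);
    }

    public static void main(String[] args) {
        PresenceTracker tracker = new PresenceTracker();
        tracker.setVisibleUsers("session-1", Set.of("alice", "bob"));
        System.out.println("Notify: " + tracker.setOnline("alice", true));
    }
}
```

A session that scrolls to a different part of the member list would call `setVisibleUsers` again and stop receiving updates for users who are no longer on screen.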
Slack Client
The Slack client establishes a persistent websocket connection to Slack's servers to receive real-time events and keep its state up to date.
Sending a message to clients in real-time
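Once the client is ready, each message sent in a channel is broadcast to all clients online in that channel. Given the roles described above, a message passes from the Webapp backend through an AS to the CS that owns the channel, and then through the GSs to the connected clients.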
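The broadcast step can be sketched as a two-level fan-out: the channel server forwards a message to each gateway that has registered interest in the channel, and each gateway forwards it to its own subscribed clients. This is a simplified model with hypothetical names (`ChannelServerSketch`, `GatewaySketch`), and it assumes gateways register per channel. A list of strings stands in for real websocket writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical gateway: holds client connections and their channel subscriptions. */
class GatewaySketch {
    private final Map<String, Set<String>> clientsByChannel = new HashMap<>();
    final List<String> sent = new ArrayList<>(); // stand-in for websocket writes

    void subscribe(String clientId, String channelId) {
        clientsByChannel.computeIfAbsent(channelId, k -> new LinkedHashSet<>()).add(clientId);
    }

    /** Forwards a channel message to every local client subscribed to that channel. */
    void deliver(String channelId, String message) {
        for (String clientId : clientsByChannel.getOrDefault(channelId, Set.of())) {
            sent.add(clientId + " <- #" + channelId + ": " + message);
        }
    }
}

/** Hypothetical channel server: knows which gateways care about each channel. */
class ChannelServerSketch {
    private final Map<String, Set<GatewaySketch>> gatewaysByChannel = new HashMap<>();

    void register(GatewaySketch gateway, String channelId) {
        gatewaysByChannel.computeIfAbsent(channelId, k -> new LinkedHashSet<>()).add(gateway);
    }

    /** Sends the message to each registered gateway; returns how many were reached. */
    int broadcast(String channelId, String message) {
        Set<GatewaySketch> gateways = gatewaysByChannel.getOrDefault(channelId, Set.of());
        for (GatewaySketch gateway : gateways) {
            gateway.deliver(channelId, message);
        }
        return gateways.size();
    }

    public static void main(String[] args) {
        GatewaySketch gateway = new GatewaySketch();
        gateway.subscribe("u1", "general");
        ChannelServerSketch server = new ChannelServerSketch();
        server.register(gateway, "general");
        server.broadcast("general", "hello");
        gateway.sent.forEach(System.out::println);
    }
}
```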
Events
Events are special messages: real-time updates that affect the client's state. They follow a similar path to chat messages and keep the Slack experience dynamic and interactive. Examples include a user reacting to a message, a bookmark being added, or a member joining a channel.
There are also transient events, which are not persisted in the database. An example is the notification that a user is typing in a channel.
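The difference between regular and transient events comes down to one branch: every event is broadcast, but only non-transient ones are stored. The sketch below illustrates that. The event names, the `EventRouter` class, and the in-memory list standing in for the database are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical event kinds; only USER_TYPING is marked transient here. */
enum EventKind {
    REACTION_ADDED(false),
    BOOKMARK_ADDED(false),
    MEMBER_JOINED(false),
    USER_TYPING(true);

    final boolean isTransient;

    EventKind(boolean isTransient) {
        this.isTransient = isTransient;
    }
}

/** Minimal immutable event payload. */
final class ChannelEvent {
    final EventKind kind;
    final String channelId;
    final String userId;

    ChannelEvent(EventKind kind, String channelId, String userId) {
        this.kind = kind;
        this.channelId = channelId;
        this.userId = userId;
    }
}

/** Persists non-transient events, then broadcasts every event. */
class EventRouter {
    final List<ChannelEvent> persisted = new ArrayList<>(); // stand-in for the database
    private final Consumer<ChannelEvent> broadcaster;

    EventRouter(Consumer<ChannelEvent> broadcaster) {
        this.broadcaster = broadcaster;
    }

    void handle(ChannelEvent event) {
        if (!event.kind.isTransient) {
            persisted.add(event);
        }
        broadcaster.accept(event);
    }

    public static void main(String[] args) {
        EventRouter router = new EventRouter(e -> System.out.println("broadcast " + e.kind));
        router.handle(new ChannelEvent(EventKind.USER_TYPING, "general", "u1"));
        router.handle(new ChannelEvent(EventKind.REACTION_ADDED, "general", "u2"));
        System.out.println("persisted count: " + router.persisted.size());
    }
}
```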
Conclusion
Slack's real-time messaging system serves tens of millions of channels per host and tens of millions of connected clients. Messages are delivered across the world within 500 milliseconds. The infrastructure is also designed to scale linearly, so it can accommodate more customers in the future.
#scalability #systemdesign #softwareengineering