Exploring the Heart of Real-Time Messaging at Slack

This article explores the architecture Slack uses to send real-time messages at scale. We'll look at the services that deliver chat messages and other events to online users in real time.

The core services in the system are written in Java, including:

  1. Channel Servers
  2. Gateway Servers
  3. Admin Servers
  4. Presence Servers

Channel Server

Slack uses Channel Servers (CS) to hold the message history of channels. Each CS is mapped to a subset of channels using consistent hashing.

At peak times, each host serves about 16 million channels. Every CS host is responsible for receiving and sending messages for the channels mapped to it.
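
The channel-to-host mapping can be illustrated with a small hash ring. This is only a sketch of the general consistent hashing technique, not Slack's implementation: the class name, the use of MD5, the virtual-node count, and the host names are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_point(key: str) -> int:
    """Map a string to a stable position on the ring (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring that maps channel IDs to Channel Server hosts."""

    def __init__(self, hosts, vnodes=100):
        self._vnodes = vnodes  # virtual nodes per host, to even out the load
        self._ring = []        # sorted list of (point, host) pairs
        for host in hosts:
            self.add_host(host)

    def add_host(self, host):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_point(f"{host}#{i}"), host))

    def remove_host(self, host):
        self._ring = [(p, h) for p, h in self._ring if h != host]

    def host_for(self, channel_id):
        if not self._ring:
            raise LookupError("no Channel Servers registered")
        # First ring point at or after the channel's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (ring_point(channel_id),))
        return self._ring[idx % len(self._ring)][1]
```

The property that matters is that removing a host only remaps the channels that host owned; every other channel keeps its Channel Server.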

Gateway Server

Gateway Servers (GS) are stateful and in-memory, and serve as the interface between Slack clients and CSs. They are deployed across multiple geographical regions so that clients can connect quickly, and a draining mechanism handles region failures by switching users to a healthy region.

Admin Server

Admin Servers (AS) are stateless and in-memory, and facilitate communication between the Webapp backend and CSs.

Presence Servers

Presence Servers (PS) are in-memory and track which users are online, powering the familiar green presence dots in Slack clients. Clients query PS over websockets using GS as a proxy, so presence notifications are delivered only for the users currently visible on the app screen.
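
The idea of notifying a client only about the users visible on its screen can be sketched as below. The class, its method names, and the client and user IDs are hypothetical, and the GS proxy layer is left out for brevity.

```python
from collections import defaultdict


class PresenceServer:
    """Hypothetical in-memory presence tracker, not Slack's actual service."""

    def __init__(self):
        self._online = set()                # user IDs currently online
        self._watchers = defaultdict(set)   # user ID -> client IDs watching that user

    def set_visible_users(self, client_id, user_ids):
        """Replace the set of users this client has on screen; return their status."""
        for watchers in self._watchers.values():
            watchers.discard(client_id)
        for user_id in user_ids:
            self._watchers[user_id].add(client_id)
        return {user_id: user_id in self._online for user_id in user_ids}

    def set_status(self, user_id, online):
        """Record a status change and return the clients that should be notified."""
        was_online = user_id in self._online
        if online:
            self._online.add(user_id)
        else:
            self._online.discard(user_id)
        if was_online == online:
            return set()  # nothing changed, so nobody needs a notification
        return set(self._watchers.get(user_id, ()))
```

A client that scrolls a user off screen calls `set_visible_users` again with the new list and stops receiving updates for that user.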

Slack Client

The Slack client establishes a persistent websocket connection to Slack's servers to receive real-time events and keep its state up to date.

  1. When a Slack client boots up, it fetches the user token and the websocket connection setup details from the Webapp backend.
  2. With this information, the Slack client initiates a websocket connection to the nearest edge region, which keeps latency low.
  3. The request from the Slack client is forwarded to the Gateway Server (GS) through an edge and service proxy.
  4. GS, upon receiving the request, retrieves the user's information from the Webapp backend, which includes details about all the channels the user is a part of.
  5. After obtaining the user's information, GS sends the first message to the Slack client. This establishes the initial connection and prepares the client for real-time communication.
  6. Finally, GS subscribes asynchronously to the Channel Servers (CS) for those channels. After this, the client is ready to send and receive real-time messages.
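
Steps 4 to 6 can be sketched from the Gateway Server's side as a small asyncio routine. The function name, the callables passed in, and the shape of the first message are assumptions made for illustration; the real payloads and internal APIs are not described in this article.

```python
import asyncio


async def handle_connect(token, fetch_user_info, send_to_client, subscribe):
    """Hypothetical GS handler for one newly opened websocket connection.

    fetch_user_info, send_to_client, and subscribe are async callables
    supplied by the caller; they stand in for the Webapp backend, the
    websocket, and the Channel Server subscription call.
    """
    # Step 4: look up the user, including the channels they belong to.
    user = await fetch_user_info(token)

    # Step 5: send the first message so the client knows the connection is live.
    await send_to_client({"type": "hello", "user_id": user["user_id"]})

    # Step 6: subscribe to every channel concurrently; collect failures
    # instead of letting one unreachable Channel Server abort the rest.
    channels = user["channels"]
    results = await asyncio.gather(
        *(subscribe(channel_id) for channel_id in channels),
        return_exceptions=True,
    )
    return [ch for ch, result in zip(channels, results) if isinstance(result, Exception)]
```

Returning the failed channel IDs leaves the retry policy to the caller, which is a design choice of this sketch rather than something the article specifies.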

Sending a message to clients in real-time

Once the client is ready, each message sent in a channel is broadcast to all online clients in that channel. The message flows as follows.

  1. The client hits the Webapp API to send a message.
  2. Webapp sends that message to Admin Server.
  3. Based on the channel ID in the message, the Admin Server discovers the Channel Server (CS) that hosts real-time messaging for that channel and routes the message to it.
  4. When CS receives the message for that channel, it sends the message to every GS across the world that is subscribed to that channel.
  5. Each GS that receives the message sends it to every connected client subscribed to that channel ID.
  6. This is how Slack is able to deliver messages across the world in under 500 ms.
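
The fan-out in steps 2 to 5 can be modelled in a few lines. Everything here is a simplified, in-process sketch: the class and method names are invented, client connections are represented as plain lists, and a CRC32 modulo stands in for the consistent hashing lookup.

```python
import zlib
from collections import defaultdict


class GatewayServer:
    """Holds client connections per channel (clients are plain lists here)."""

    def __init__(self, region):
        self.region = region
        self._clients = defaultdict(list)  # channel ID -> client inboxes

    def connect(self, channel_id, inbox):
        self._clients[channel_id].append(inbox)

    def deliver(self, channel_id, message):
        # Step 5: push to every connected client subscribed to the channel.
        for inbox in self._clients.get(channel_id, ()):
            inbox.append(message)


class ChannelServer:
    def __init__(self):
        self._subscribers = defaultdict(set)  # channel ID -> subscribed gateways

    def subscribe(self, channel_id, gateway):
        self._subscribers[channel_id].add(gateway)

    def publish(self, channel_id, message):
        # Step 4: send to every Gateway Server subscribed to the channel.
        for gateway in self._subscribers.get(channel_id, ()):
            gateway.deliver(channel_id, message)


class AdminServer:
    def __init__(self, channel_servers):
        self._channel_servers = channel_servers

    def channel_server_for(self, channel_id):
        # Stand-in for the consistent hashing lookup (assumption of this sketch).
        index = zlib.crc32(channel_id.encode("utf-8")) % len(self._channel_servers)
        return self._channel_servers[index]

    def send(self, channel_id, message):
        # Step 3: route the message to the Channel Server that owns the channel.
        self.channel_server_for(channel_id).publish(channel_id, message)
```

Because each Gateway Server subscribes once per channel, the Channel Server sends one copy per gateway rather than one per client, and the gateway handles the last hop to its own clients.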

Events

Events are special messages: real-time updates that affect the client's state. They follow a similar path to chat messages, keeping the Slack experience dynamic and interactive. Examples include a user reacting to a message, a bookmark being added, or a member joining a channel.

There are also transient events, which are not persisted in the database. An example is the notification that a user is typing in a channel.
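
The difference between the two kinds of events comes down to whether the event is stored before it is broadcast. A minimal sketch, assuming illustrative event type names and a list standing in for the database:

```python
# Event types treated as transient in this sketch (illustrative name only).
TRANSIENT_EVENT_TYPES = frozenset({"user_typing"})


def handle_event(event, database, broadcast):
    """Persist the event unless it is transient, then broadcast it either way.

    `database` is any object with an append method and `broadcast` is any
    callable; both are placeholders for the real storage and fan-out paths.
    """
    if event["type"] not in TRANSIENT_EVENT_TYPES:
        database.append(event)
    broadcast(event)
```

For example, a reaction would be appended to `database` and broadcast, while a typing notification would only be broadcast.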

Conclusion

Slack's real-time messaging system handles tens of millions of channels per host and serves a similar number of connected clients, delivering messages globally within 500 milliseconds. The current infrastructure is also designed to scale linearly, so it can accommodate more customers in the future.


#scalability #systemdesign #softwareengineering



