System design: Chat messenger like WhatsApp

What is a chat messenger?

Nowadays, most of us use a personal chat messenger such as WhatsApp or Signal. We use these applications to send messages to an individual or to a group. A message can be text or media (image, video, document, etc.).

Functional requirements

We will discuss and design the following features of a chat messenger.

  1. Send text message (One to one)
  2. Ack of Sent, Delivered and Read receipts
  3. Last seen of an individual
  4. Send media message
  5. Profile management

We also need to make sure that all of these services are reliable and can handle a huge amount of traffic. Traffic can spike sharply on some occasions, e.g. New Year.

High level design

[Figure: high-level design diagram]

Network protocol

Whenever A sends a message to B, there has to be an intermediary between them, because A doesn't know B's address. Of course, the server plays that role: A first sends the message to the server, and the server then forwards it to B. But can the server initiate a request? HTTP is based on request-response, which means the server can only send a response after it has received a request from a client. In our scenario the recipient (B) is a different client from the one that sent the request (A), so the server cannot push the message directly to B over plain HTTP.

To solve this problem, we can use WebSocket. It provides a full-duplex connection over a single TCP connection. Whenever a user's client comes online, it opens a persistent connection to the server. It is called full-duplex because the client and the server can both send messages to each other at any time over that connection. Running it over TLS (wss://) keeps the traffic secure.
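
To make the difference from request-response concrete, here is a minimal sketch. It is not a real WebSocket: the DuplexConnection class and its two asyncio queues are assumptions used only to model one persistent connection with two independent directions. It shows the server pushing a message to B on a connection B already has open, without B asking for it.

```python
import asyncio


class DuplexConnection:
    """In-memory stand-in for one persistent, full-duplex connection."""

    def __init__(self):
        self.to_server = asyncio.Queue()  # frames travelling client -> server
        self.to_client = asyncio.Queue()  # frames travelling server -> client


async def demo():
    conn_a, conn_b = DuplexConnection(), DuplexConnection()

    # A sends a message up its own connection.
    await conn_a.to_server.put({"to": "B", "text": "hi"})

    # The server reads it and pushes it down B's connection.
    # B never issued a request for this frame.
    frame = await conn_a.to_server.get()
    await conn_b.to_client.put({"from": "A", "text": frame["text"]})

    # B's client simply reads whatever arrives on its connection.
    return await conn_b.to_client.get()


received = asyncio.run(demo())
print(received)
```

In a real deployment the queues would be replaced by an actual WebSocket library on both ends. The point here is only that either side can write at any time.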

Send text message

When user A sends a message to user B and A is not connected to the internet, the mobile client saves the message in a local SQL database such as SQLite. When A comes back online, the client establishes a duplex connection with a gateway (say G1) and sends the pending message to it. G1 passes the request to MessageService. MessageService queries the database to check whether B is currently connected to any gateway. If B is connected, the service forwards the message to B through B's gateway (G2). If it doesn't find B in the database, it keeps the message on the server. When B comes online, the service forwards the message to B via G2 and then deletes the stored copy from the server.
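
The routing decision described above can be sketched as follows. This is a simplified, single-process illustration: the class layout, the method names and the `delivered` list (which stands in for an actual push through a gateway) are assumptions, not part of the original design.

```python
from collections import defaultdict


class MessageService:
    def __init__(self):
        self.sessions = {}                # user_id -> gateway_id for online users
        self.pending = defaultdict(list)  # user_id -> messages held for offline users
        self.delivered = []               # (gateway_id, message); stands in for a real push

    def connect(self, user, gateway):
        self.sessions[user] = gateway
        # Forward everything held while the user was offline, then drop it.
        for message in self.pending.pop(user, []):
            self.delivered.append((gateway, message))

    def disconnect(self, user):
        self.sessions.pop(user, None)

    def send(self, sender, recipient, text):
        message = {"from": sender, "to": recipient, "text": text}
        gateway = self.sessions.get(recipient)
        if gateway is None:
            # Recipient is not connected to any gateway: keep it on the server.
            self.pending[recipient].append(message)
            return "stored"
        self.delivered.append((gateway, message))
        return "forwarded"


service = MessageService()
service.connect("A", "G1")
status_offline = service.send("A", "B", "hello")          # B is offline
service.connect("B", "G2")                                # held message is flushed
status_online = service.send("A", "B", "are you there?")  # B is online now
```

After this runs, both messages have been pushed through G2 and nothing remains stored on the server for B.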

To store these connections, we can use Redis as a distributed cache. It is a key-value store that keeps all data in main memory, so we can quickly look up which user is connected to which gateway. Redis can also persist this data to disk when required.
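
As a rough illustration of the key-value layout, the sketch below uses a dictionary as a stand-in for the cache. The `session:<user>` key format and the expiry time are assumptions added for the example. Expiry is one common way to drop entries for clients that vanished without disconnecting cleanly, and it is not something the design above requires.

```python
import time


class SessionStore:
    """Dict-backed stand-in for a key-value cache such as Redis."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.data = {}  # key -> (gateway_id, expires_at)

    def set_gateway(self, user_id, gateway_id):
        self.data[f"session:{user_id}"] = (gateway_id, self.clock() + self.ttl)

    def get_gateway(self, user_id):
        key = f"session:{user_id}"
        entry = self.data.get(key)
        if entry is None:
            return None
        gateway_id, expires_at = entry
        if self.clock() >= expires_at:
            del self.data[key]  # entry is stale: treat the user as offline
            return None
        return gateway_id


# A fake clock keeps the example deterministic.
now = [0.0]
store = SessionStore(ttl_seconds=60, clock=lambda: now[0])
store.set_gateway("B", "G2")
lookup_fresh = store.get_gateway("B")
now[0] = 61.0
lookup_stale = store.get_gateway("B")
```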

Ack of Sent, Delivered and Read receipts

When MessageService accepts A's message, it acknowledges it to A, which is the Sent receipt. Once MessageService has delivered the message to B, B may not have opened it yet, so B's client sends an Ack back to MessageService saying that the message has been delivered, and MessageService forwards that Delivered receipt to user A. In the same way, when user B reads the message, the client again sends an Ack to MessageService, and the service forwards the Read receipt to A.
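
The receipts form a simple progression, which can be sketched as below. The Status enum and the ReceiptTracker class are hypothetical names for this example. The one design choice worth noting is that a status never moves backwards, so a late or duplicated Ack is ignored.

```python
from enum import IntEnum


class Status(IntEnum):
    PENDING = 0
    SENT = 1
    DELIVERED = 2
    READ = 3


class ReceiptTracker:
    def __init__(self):
        self.status = {}  # message_id -> Status

    def apply_ack(self, message_id, ack):
        """Return True if the ack advanced the status and should be forwarded to the sender."""
        current = self.status.get(message_id, Status.PENDING)
        if ack > current:
            self.status[message_id] = ack
            return True
        return False  # duplicate or out-of-order ack


tracker = ReceiptTracker()
forwarded = [
    tracker.apply_ack("m1", Status.SENT),
    tracker.apply_ack("m1", Status.DELIVERED),
    tracker.apply_ack("m1", Status.READ),
    tracker.apply_ack("m1", Status.DELIVERED),  # arrives late, ignored
]
```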

Last seen

We can track a user's last seen in a couple of ways. When the user performs any activity in the client, such as sending a text or media message, MessageService invokes LastSeenService to update that user's timestamp. Sometimes the user is connected to a gateway but the client is closed and keeps receiving messages in the background, for example through notifications, and the message-delivered Ack is sent back to the service. In such scenarios the user has not opened the application, so these are system-initiated messages rather than user-initiated ones, and LastSeen shouldn't be updated. In this way, we can keep the user's last seen up to date in the database.

But there is a catch. A user can keep the application open without performing any activity, and in this case LastSeen should still be updated. So the client can send a LastSeen timestamp at a regular interval, say every 5 or 10 seconds, and LastSeenService will update the timestamp in the database for that user. The disadvantage of this approach is that it regularly sends updates to the service and consumes network bandwidth.

To store the LastSeen of each user, we can use Redis, so we can quickly retrieve the last seen of any user when needed.
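
The rule for what counts as activity can be sketched as follows. The event names (`send_text`, `send_media`, `heartbeat`, `delivered_ack`) are hypothetical, and a dictionary stands in for the store. User-initiated events and periodic heartbeats update the timestamp, while system-initiated events such as a background delivery Ack do not.

```python
class LastSeenService:
    # Hypothetical event names; only these count as user activity.
    USER_EVENTS = {"send_text", "send_media", "heartbeat"}

    def __init__(self):
        self.last_seen = {}  # user_id -> latest activity timestamp (seconds)

    def record(self, user_id, event, timestamp):
        """Return True if the event updated the user's last seen."""
        if event not in self.USER_EVENTS:
            return False  # system-initiated, e.g. a background delivery ack
        previous = self.last_seen.get(user_id, 0)
        # Keep the newest timestamp even if events arrive out of order.
        self.last_seen[user_id] = max(previous, timestamp)
        return True


last_seen_service = LastSeenService()
last_seen_service.record("B", "send_text", 100)
ack_counted = last_seen_service.record("B", "delivered_ack", 160)
last_seen_service.record("B", "heartbeat", 110)
```

Here B's last seen ends up at 110, because the delivery Ack at 160 came from the client running in the background.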

Send media messages

Media messages can be images, videos or documents. When user C sends a media message, the client invokes a service called MediaService. This service saves the media file to external storage or a CDN. It also generates a unique hash for the message and invokes MessageService. MessageService forwards the message (containing the hash) to user B, so B's client can download the media file using that hash. Why do we need a hash here? It identifies the message (so an Ack can be sent back to the sender) and locates the stored media file until it has been downloaded to the user's device.
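
A minimal sketch of this flow is shown below. It assumes the hash is a SHA-256 digest of the file content and uses a dictionary in place of external storage or a CDN. Both are illustrative choices. Note that a pure content hash gives identical files the same key, so a system that needs a hash unique to each message would have to mix in something like a message ID.

```python
import hashlib


class MediaService:
    def __init__(self):
        self.blobs = {}  # media_hash -> bytes; stands in for external storage or a CDN

    def upload(self, content: bytes) -> str:
        media_hash = hashlib.sha256(content).hexdigest()
        self.blobs[media_hash] = content
        return media_hash

    def download(self, media_hash: str) -> bytes:
        return self.blobs[media_hash]


media = MediaService()
media_hash = media.upload(b"fake image bytes")

# What MessageService would forward to B: the hash, not the file itself.
message = {"from": "C", "to": "B", "media_hash": media_hash}

# B's client fetches the file using the hash it received.
downloaded = media.download(message["media_hash"])
```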

Load balancers for each microservice

A single instance of a service cannot handle the traffic, so we need multiple instances of each service. A load balancer can be placed in front of each service so that traffic is distributed across the instances of that service.
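
The article does not say how the load balancer chooses an instance. As one possible strategy, the sketch below uses round-robin, with hypothetical instance names. Real load balancers typically also track instance health, which is left out here.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        # cycle() repeats the instances in order, forever.
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        """Return the instance that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["message-1", "message-2", "message-3"])
picks = [balancer.pick() for _ in range(4)]
```

The fourth request wraps around to the first instance, so each instance receives roughly the same share of traffic.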
