Design Your Own Messaging Platform
Bhavin Gandecha
Program Management | OTT | Hustle Mindset | Student of Leadership | Development Speaker | Love Coaching & Mentoring | Hope Influencer
This article explores the system design of chat applications like WhatsApp, Facebook Messenger, Telegram, etc. I have tried to fit in a lot of features apart from just sending and receiving messages.
We will start by listing all the features for the end-user. Later we will note down the different system components that will be required, and then we will get into designing various aspects of the product.
Hope you enjoy this article. Do let me know if you have any comments or suggestions.
Features for the end-user:
- Send 1-1 message
- Sent & Read Receipt Notifications (Ref: Single tick & blue tick on WhatsApp)
- Get notified when a message is received
- Last Seen/online
- Share Images & files
- Notification while the app is not in use
- Initiate group chat
Let's look at what components would be required:
1. Amazon API Gateway: This acts as the connector for users who are trying to access the backend. API Gateway will handle multiple API calls, auto-scale, and also act as a load balancer. Our system will deploy multiple gateways to spread the load when users are trying to access it.
We will be making use of the WebSocket API. This supports two-way communication between the frontend and the backend, so the backend can send call-back messages to the client.
2. Amazon SQS: This is used to decouple & scale microservices. Amazon SQS increases application resilience by decoupling the direct communication between the frontend application and the worker tier that does data processing.
Without SQS, the architecture becomes synchronous, i.e. all the components have to scale at the same time, which you may not require. It is always advisable for the systems to be able to scale independently whenever required.
We will also use Dead Letter Queues (DLQ). A DLQ holds messages that Lambda has not successfully processed. When a message's maximum receive count is exceeded, Amazon SQS moves the message to the DLQ associated with the original queue.
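As a rough illustration of the DLQ wiring, the sketch below builds the attribute that links a source queue to its DLQ. `RedrivePolicy`, `deadLetterTargetArn` and `maxReceiveCount` are the SQS attribute names; the function name, the ARN and the receive count of 3 are assumptions made up for this example.

```python
import json


def build_queue_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Return source-queue attributes that send repeatedly failing messages to the DLQ."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        # Passed as a string here, following the style of the boto3 examples.
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the RedrivePolicy attribute as a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(redrive_policy)}


# Hypothetical ARN, for illustration only.
attributes = build_queue_attributes(
    "arn:aws:sqs:us-east-1:123456789012:chat-messages-dlq", max_receive_count=3
)
print(attributes["RedrivePolicy"])
```

A dictionary like this could then be supplied as the `Attributes` argument when the source queue is created or updated.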
3. Multi-AZ Amazon EC2 instances within an Auto Scaling group (ASG): Multi-AZ will allow us to build in resiliency, and EC2 will provide the compute.
The Auto Scaling group can scale based on the number of messages in the queue, as indicated by the CloudWatch alarm.
4. AWS Lambda: Lambda will poll the queue and invoke your Lambda function synchronously. In our case, the function will send the images and files to the S3 bucket and send the message for processing to EC2 instances. When the function successfully processes a batch from SQS, Lambda will delete the messages from the queue.
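To make the Lambda step more concrete, here is a minimal sketch of a handler that receives a batch of SQS records. The event shape (`Records`, `body`, `messageId`) and the `batchItemFailures` response are the ones Lambda uses for SQS, although the partial-failure response only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping. `route_message` is a hypothetical placeholder for the S3 upload and EC2 hand-off described above.

```python
import json


def route_message(message: dict) -> None:
    """Hypothetical placeholder: upload attachments to S3 and forward to the EC2 tier."""
    if "sender" not in message or "recipient" not in message:
        raise ValueError("message needs a sender and a recipient")


def handler(event, context):
    """Process one SQS batch and report only the records that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            route_message(json.loads(record["body"]))
        except (ValueError, KeyError, TypeError):
            # Failed records stay in the queue and are retried; after the
            # maximum receive count they end up in the DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Records that are not reported as failures are the ones Lambda deletes from the queue.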
5. Amazon DynamoDB: NoSQL database to store messages as key-value pairs for faster retrieval. DynamoDB will contain information about each of the users and the gateways that they are connected to (mapping). For group chats, it will also contain the group chat ID and associated users.
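One possible way to lay out these items is sketched below as plain Python dictionaries. The key names (`pk`, `sk`), the `USER#`, `CHAT#` and `GROUP#` prefixes and the sorted-pair conversation ID are all assumptions chosen for illustration, not a schema prescribed by this design.

```python
def conversation_id(user_a: str, user_b: str) -> str:
    """Same ID regardless of who sends, so both directions share one partition."""
    return "#".join(sorted([user_a, user_b]))


def connection_item(user_id: str, gateway_id: str, connection_id: str) -> dict:
    """Mapping of a user to the gateway and connection they are attached to."""
    return {"pk": f"USER#{user_id}", "gateway_id": gateway_id, "connection_id": connection_id}


def message_item(sender: str, recipient: str, sent_at_ms: int, body: str) -> dict:
    """One chat message; the timestamp sort key keeps a conversation in order."""
    return {
        "pk": f"CHAT#{conversation_id(sender, recipient)}",
        "sk": sent_at_ms,
        "sender": sender,
        "body": body,
    }


def group_item(group_id: str, members: list) -> dict:
    """Group chat ID together with its associated users."""
    return {"pk": f"GROUP#{group_id}", "members": sorted(members)}


print(message_item("bob", "alice", 1700000000000, "hello"))
```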
6. Amazon S3: Store images, docs, links, etc. that are shared during the chat.
The reason we have chosen to store files in S3 buckets is the content retrieval costs. S3 also offers multiple storage classes, which let us save on costs further. Check out my article on Strategies to Store On Storage Cost.
7. CloudWatch Alarms: The primary use of the CloudWatch alarm in our design will be to monitor the SQS queue. Once the queue reaches the set threshold, the alarm will signal the Auto Scaling group to scale up or down.
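As a sketch of what such an alarm might look like, the function below assembles the parameters for a queue-depth alarm. The parameter names match CloudWatch's `put_metric_alarm` call and `ApproximateNumberOfMessagesVisible` is the SQS metric for messages waiting in the queue; the queue name, thresholds, period and the placeholder policy ARN are illustrative assumptions.

```python
def queue_depth_alarm(queue_name: str, threshold: int, policy_arn: str, scale_out: bool = True) -> dict:
    """Build alarm parameters that trigger a scaling policy on SQS queue depth."""
    suffix = "scale-out" if scale_out else "scale-in"
    return {
        "AlarmName": f"{queue_name}-{suffix}",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint (assumed value)
        "EvaluationPeriods": 2,  # consecutive breaches before the alarm fires (assumed value)
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold" if scale_out else "LessThanThreshold",
        "AlarmActions": [policy_arn],
    }


# Placeholder values, for illustration only.
scale_out_alarm = queue_depth_alarm("chat-messages", 100, "EXAMPLE_SCALE_OUT_POLICY_ARN")
scale_in_alarm = queue_depth_alarm("chat-messages", 10, "EXAMPLE_SCALE_IN_POLICY_ARN", scale_out=False)
print(scale_out_alarm["AlarmName"], scale_in_alarm["AlarmName"])
```

Using two alarms with different thresholds, one for scaling out and one for scaling in, is a common way to avoid the group flapping around a single value.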
Let's design the system for 1-1 messages between user A & user B:
- User A is connected to API gateway which receives the message
- The message is sent to the SQS queue, and the Lambda function reads the message from the queue.
- If the message contains any file or image, Lambda sends it to the S3 bucket.
- The Lambda function also sends the message to the back-end system, which in our case is an EC2 instance behind the Auto Scaling group.
- EC2 instance processes the message and stores it into DynamoDB. DynamoDB has the mapping information about which user is connected to which gateway.
- Based on the CloudWatch metric (ApproximateNumberOfMessagesVisible) an alarm is set which scales in & scales out the ASG
- To send the message to User B, WebSocket API is used within the API Gateways. WebSocket helps in communicating from server to client.
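The final delivery step can be sketched as follows: look up user B's connection in the mapping and push the message over the WebSocket. `post_to_connection(ConnectionId=..., Data=...)` is the call offered by the API Gateway management API client in boto3; here a small stand-in class records the calls so the sketch runs without AWS. The dictionary-based mapping and the message fields are assumptions.

```python
import json


class RecordingClient:
    """Local stand-in for the API Gateway management API client."""

    def __init__(self):
        self.sent = []

    def post_to_connection(self, ConnectionId, Data):
        self.sent.append((ConnectionId, Data))


def deliver(connections: dict, client, message: dict) -> bool:
    """Push the message to the recipient's connection; return False if they are offline."""
    connection_id = connections.get(message["recipient"])
    if connection_id is None:
        return False  # the stored message can be delivered once the user reconnects
    client.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps(message).encode("utf-8"),
    )
    return True


client = RecordingClient()
connections = {"userB": "conn-42"}  # user -> connection mapping, as kept in DynamoDB
print(deliver(connections, client, {"sender": "userA", "recipient": "userB", "text": "hi"}))
```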
System Design for Sent & Read Receipts:
- When user A sends a message, it reaches the API gateway and, through the API gateway, the SQS queue. As soon as the message hits the SQS queue, a confirmation is sent to the client. This is the sent notification.
- When user B receives the message, the client sends out a notification that goes through a parallel SQS queue and follows the same path as a 1-1 message. The only difference is that it carries no message content, just the notification.
- Through the EC2 instance, the notification reaches DynamoDB, where the mapping is identified, and through the WebSocket API it is sent back to user A.
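Because the receipts travel through their own queue, they may arrive late or out of order. A simple way to handle that, sketched below, is to treat the receipt status as something that only ever moves forward. The state names and the `apply_receipt` helper are assumptions for illustration.

```python
from enum import IntEnum


class Receipt(IntEnum):
    PENDING = 0  # message left the client, not yet confirmed
    SENT = 1     # confirmation from the queue (single tick)
    READ = 2     # notification from user B's client (blue tick)


def apply_receipt(current: Receipt, incoming: Receipt) -> Receipt:
    """Keep the most advanced status so a late SENT never overwrites READ."""
    return max(current, incoming)


status = Receipt.PENDING
for update in (Receipt.SENT, Receipt.READ, Receipt.SENT):
    status = apply_receipt(status, update)
print(status.name)
```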
System Design for Last Seen/Online:
Here we will extensively use features within the WebSocket API (within API Gateway).
In your WebSocket API, incoming JSON messages are directed to backend integrations based on routes that you configure. (Non-JSON messages are directed to a $default route that you configure.)
Your routing table would specify which action to perform by matching the value of the action property against the custom route key values that you have defined in the table.
There are three predefined routes that can be used: $connect, $disconnect, and $default. In addition, you can create custom routes.
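The matching behaviour described above can be mimicked in a few lines. This is only a local sketch of the routing rule, assuming the route is selected on the `action` property of the message body; the custom route names `sendMessage` and `heartbeat` are made up for the example.

```python
import json


def select_route(raw_message: str, custom_routes: set) -> str:
    """Return the route key a message would be sent to."""
    try:
        body = json.loads(raw_message)
    except json.JSONDecodeError:
        return "$default"  # non-JSON messages fall through to $default
    action = body.get("action") if isinstance(body, dict) else None
    if isinstance(action, str) and action in custom_routes:
        return action
    return "$default"      # JSON without a matching action also uses $default


routes = {"sendMessage", "heartbeat"}  # hypothetical custom routes
print(select_route('{"action": "sendMessage", "text": "hi"}', routes))
print(select_route("plain text", routes))
```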
Client apps connect to your WebSocket API by sending a WebSocket upgrade request. If the request succeeds, the $connect route is executed while the connection is being established.
- Once the connection is established, the system can note the time of the connection and update it every few minutes.
- As soon as the connection is disconnected, the last known connection time is displayed to the other user. This process can also follow a separate SQS queue system, as discussed earlier.
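The two steps above can be sketched with a small in-memory tracker. In the real design the timestamps would live in DynamoDB and the methods would be triggered by the $connect and $disconnect routes plus a periodic client update; the class and method names here are assumptions.

```python
from datetime import datetime, timezone


class PresenceTracker:
    """Tracks who is online and when each user was last known to be connected."""

    def __init__(self):
        self._online = set()
        self._last_seen = {}

    def on_connect(self, user_id: str, now: datetime) -> None:
        self._online.add(user_id)
        self._last_seen[user_id] = now

    def heartbeat(self, user_id: str, now: datetime) -> None:
        # The periodic update made while the connection is open.
        if user_id in self._online:
            self._last_seen[user_id] = now

    def on_disconnect(self, user_id: str) -> None:
        # Keep the last known connection time; just mark the user offline.
        self._online.discard(user_id)

    def status(self, user_id: str) -> str:
        if user_id in self._online:
            return "online"
        seen = self._last_seen.get(user_id)
        return f"last seen {seen.isoformat()}" if seen else "unknown"


tracker = PresenceTracker()
tracker.on_connect("userA", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
tracker.heartbeat("userA", datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc))
tracker.on_disconnect("userA")
print(tracker.status("userA"))
```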
System Design for Notification:
Here we can use long polling on the client.
- Users should have accepted the notification feature on their devices.
- Even when the app is closed, the client keeps its connection to the API Gateway (established via $connect) alive in the background.
- At a regular interval, the client checks with the API for any messages that are waiting to be received.
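The periodic check can be sketched as a cursor-based poll: the client remembers the last message it has seen and asks only for newer ones. The `Inbox` class below is an in-memory stand-in for the backend, and the sequence-number cursor is an assumption made for this example.

```python
class Inbox:
    """In-memory stand-in for the messages waiting for one user."""

    def __init__(self):
        self._messages = []  # list of (sequence, text) in arrival order

    def add(self, text: str) -> int:
        sequence = len(self._messages) + 1
        self._messages.append((sequence, text))
        return sequence

    def fetch_since(self, cursor: int) -> list:
        return [m for m in self._messages if m[0] > cursor]


def poll_once(inbox: Inbox, cursor: int):
    """One polling round: return the new messages and the advanced cursor."""
    new_messages = inbox.fetch_since(cursor)
    if new_messages:
        cursor = new_messages[-1][0]
    return new_messages, cursor


inbox = Inbox()
inbox.add("hello")
inbox.add("are you there?")
messages, cursor = poll_once(inbox, cursor=0)
print([text for _, text in messages], cursor)
```

On a device, each non-empty result of such a poll would be turned into a notification, and the loop would sleep for the chosen interval between rounds.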
System Design for Group Chat
Consider we have user A, user B, user C and user D. User A creates a group and adds all the other users.
- Request to create a group hits the API Gateway & the request is passed on to the relevant SQS queue
- From the SQS queue, the request is passed on to EC2 instances behind the Auto Scaling group and an entry gets created in DynamoDB.
- Once DynamoDB has the record users can start messaging in the group as described above
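As a small sketch of the group flow, the code below creates the group record and works out who should receive a group message. A plain dictionary stands in for the DynamoDB table, and the function names are assumptions; each recipient returned by `fan_out` would then get the message through the same 1-1 delivery path.

```python
def create_group(groups: dict, group_id: str, creator: str, members: list) -> set:
    """Store the group ID with its users; the creator is always a member."""
    if group_id in groups:
        raise ValueError(f"group {group_id!r} already exists")
    groups[group_id] = {creator, *members}
    return groups[group_id]


def fan_out(groups: dict, group_id: str, sender: str) -> list:
    """Return every group member who should receive the sender's message."""
    members = groups[group_id]
    if sender not in members:
        raise PermissionError("sender is not a member of the group")
    return sorted(members - {sender})


groups = {}
create_group(groups, "g1", "userA", ["userB", "userC", "userD"])
print(fan_out(groups, "g1", "userA"))
```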
Hope you enjoyed the article. If you have any queries or suggestions, please feel free to reach out to me.