How to build a mini Zoom app?
Shrey Batra
CEO @ Cosmocloud | Ex-LinkedIn | Angel Investor | MongoDB Champion | Book Author | Patent Holder (Distributed Algorithms)
During the pandemic, one thing that really took off was the need for online conversations and video calls. Zoom, Microsoft Teams and Google Meet became the go-to apps across industries and offices. But how do these apps work?
Let's try to understand how to build a very simple, mini video conferencing tool similar to Zoom or Google Meet and see the system design of these apps.
Design Overview
Before we go deep into what, why and how, let's understand the basics of a video conferencing application.
For 2 or more people to connect and converse over a video call, we need one client per person and a main server / broker that connects the clients and routes traffic between them. Each client talks only to the broker, and the broker relays the data to the other clients.
Pub/Sub Based System
You may or may not know the Pub/Sub model already, but it is a very common and simple architecture to use when you want to create a connection between 2 points and send data regularly over that connection. Think of it loosely like UDP: the publisher sends its data without waiting to confirm that anyone received it.
Let's start small. Imagine we have only 1 person broadcasting their video to everyone, and all other users are mere listeners. In this scenario, 1 user is the Publisher and all other users are Subscribers to the publisher's broadcasting channel. The publisher creates a channel and broadcasts its data (the user's video in this case) over this particular channel / network connection. This is indeed how live online broadcast systems work.
Now let's extend this model a bit, so that every user creates and broadcasts to their own channel, and every user subscribes to every other user's channel. Overall, we have created a graph of N nodes (or users) broadcasting their data, where each node is connected to every other node (or user) once they subscribe to each other. Picture a fully connected graph with one node per user.
So, in a call with 5 users, we have created 5 channels or network connections: each user / node owns one and subscribes to the other 4. Keep in mind that there is a Broker / server between all the users, which orchestrates the data flow and keeps a connection with each node.
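To make the counting concrete, here is a minimal sketch that builds the "everyone subscribes to everyone else" mapping for a 5-user call. The helper name `build_mesh` and the `user_N` channel names are made up for illustration; no broker is involved yet.

```python
def build_mesh(users):
    """Map each user to the channels they subscribe to (everyone else's)."""
    return {user: [other for other in users if other != user] for user in users}


users = ["user_1", "user_2", "user_3", "user_4", "user_5"]
subscriptions = build_mesh(users)

# One channel per user, and each user subscribes to the remaining N - 1.
channel_count = len(users)
subscription_count = sum(len(channels) for channels in subscriptions.values())

print(channel_count)       # 5
print(subscription_count)  # 20
```

In general, N users means N channels and N * (N - 1) subscriptions, which is why the number of streams the broker has to fan out grows quickly as a call gets larger.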
Understanding live streaming a bit more
Now that we know the basics of a Pub/Sub system, let's see how Pub/Sub differs from a peer-to-peer network.
In a Pub/Sub network, you need a message broker: a server that orchestrates the flow of data between all clients in a meeting call. This server is responsible for knowing which clients are active and which are not, where each message is coming from, and whom to send it to.
This is done using Channels or Topics: the server creates a unique topic on which a node broadcasts its data to the other nodes, and subscribes the other nodes in that particular call to the channel created for the first user.
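The broker's role can be sketched with a tiny in-memory stand-in. This is not how Redis or any real broker is implemented; the `Broker` class and its methods are hypothetical and only illustrate topics, subscriptions and fan-out.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: keeps a list of subscriber callbacks per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every current subscriber of the topic.
        receivers = self._subscribers.get(topic, [])
        for callback in receivers:
            callback(message)
        # Report how many subscribers received it.
        return len(receivers)


broker = Broker()
inbox_user_2, inbox_user_3 = [], []

# Users 2 and 3 subscribe to user 1's channel.
broker.subscribe("user_1", inbox_user_2.append)
broker.subscribe("user_1", inbox_user_3.append)

delivered = broker.publish("user_1", b"frame-0")  # reaches both subscribers
dropped = broker.publish("user_9", b"frame-0")    # nobody is listening

print(delivered, dropped)  # 2 0
```

Note that a message published to a topic with no subscribers is simply lost in this sketch, which matches the fire-and-forget feel of a live video stream: a late joiner sees only frames sent after they subscribed.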
Difference between Pub/Sub and Queue Systems
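The major difference between Pub/Sub and queue systems is that in Pub/Sub every subscriber receives its own copy of each published message, whereas in a queue each message is consumed by only one consumer.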
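To make the contrast concrete, here is a small in-memory sketch with no Redis involved. It assumes the usual semantics: a Pub/Sub topic fans each message out to all subscribers, while a queue hands each message to a single consumer. The subscriber and worker names are made up for illustration, and round-robin is just one possible way to share a queue.

```python
from collections import deque

messages = ["m1", "m2", "m3", "m4"]

# Pub/Sub: every subscriber gets its own copy of every message.
subscribers = {"alice": [], "bob": []}
for message in messages:
    for inbox in subscribers.values():
        inbox.append(message)

# Queue: each message is removed by exactly one consumer (round-robin here).
queue = deque(messages)
workers = {"worker_a": [], "worker_b": []}
names = list(workers)
turn = 0
while queue:
    workers[names[turn % len(names)]].append(queue.popleft())
    turn += 1

print(subscribers)  # {'alice': ['m1', 'm2', 'm3', 'm4'], 'bob': ['m1', 'm2', 'm3', 'm4']}
print(workers)      # {'worker_a': ['m1', 'm3'], 'worker_b': ['m2', 'm4']}
```

For a video call, the Pub/Sub behaviour is the one we want: every participant should see every frame of the speaker, not a share of them.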
Building a Mini video call app in Python
Let's try out Redis Pub/Sub, a very popular, lightweight Pub/Sub system. We'll write simple pseudocode for the app using Python, Redis (for Pub/Sub) and OpenCV (for image and video handling).
Let's start by building out the publishing code for any user...
# Using OpenCV for video handling
import cv2
# Using Redis for Pub/Sub
import redis

# Open a video capture for your webcam (device 0).
video = cv2.VideoCapture(0)
# Resizing parameters: publish at half the camera's native resolution.
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT) / 2)
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH) / 2)
# Open a Redis connection to be used as publisher (defaults to localhost:6379).
server = redis.Redis()
# While the video camera is on...
while video.isOpened():
    # Read a frame from the camera. A frame is a height x width x 3 array of pixels.
    ret, frame = video.read()
    if ret:
        frame = cv2.resize(frame, (width, height))
        # Publish the frame (message packet) to a channel named user_1
        server.publish("user_1", frame.tobytes())

video.release()
The above code opens a video stream and reads frames from the camera in a loop. It resizes each frame, converts it to raw bytes and publishes it as a message to the channel user_1.
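It helps to estimate how much data this publishes. The arithmetic below is a rough sketch under stated assumptions: a 1280x720 camera halved to 640x360, 3 bytes per pixel (OpenCV's default 8-bit BGR frames) and 30 frames per second. Your camera and frame rate may differ, and `raw_frame_bytes` is a hypothetical helper.

```python
def raw_frame_bytes(width, height, channels=3):
    """Size in bytes of one uncompressed 8-bit frame."""
    return width * height * channels


frame_size = raw_frame_bytes(640, 360)  # assumed resized frame
bytes_per_second = frame_size * 30      # assumed 30 frames per second

print(frame_size)        # 691200
print(bytes_per_second)  # 20736000
```

That is roughly 20 MB per second for a single uncompressed stream, per subscriber, which is fine for a local experiment but is one reason real video apps compress frames before sending them.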
Let's write the code for subscribing to the channel -
# OpenCV to display incoming frames
import cv2
# Redis to subscribe
import redis
# NumPy to convert the message packet back into an array of pixels.
import numpy as np

client = redis.Redis()
# Creating a Pub/Sub client to subscribe to some channels.
client_channel = client.pubsub()
# Let's subscribe to the other user's channel.
client_channel.subscribe("user_1")
# Listening to messages in the channel
for item in client_channel.listen():
    # Skip subscribe/unsubscribe confirmations; only "message" items carry frames.
    if item["type"] != "message":
        continue
    # Convert the bytes back to a pixel array. The shape must match the publisher's
    # resized frame; (360, 640, 3) assumes a 1280x720 camera.
    frame = np.frombuffer(item["data"], dtype="uint8").reshape(360, 640, 3)
    # Displaying this frame back to the user.
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
client_channel.unsubscribe()
cv2.destroyAllWindows()
Here, we create a Pub/Sub client and subscribe to the other users' channels (user_1 here). For every message we receive, we convert the bytes back into a frame (an array of pixels) and display it on the user's screen.
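One fragile spot is that the subscriber has to know the frame's shape in advance. A possible workaround, sketched below, is to prefix each message with a small header carrying the height, width and channel count. This is not part of the code above; the header layout and the `pack_frame` / `unpack_frame` helpers are assumptions made for illustration.

```python
import struct

# Hypothetical 5-byte header: height (uint16), width (uint16), channels (uint8).
HEADER = struct.Struct("!HHB")


def pack_frame(height, width, channels, pixels):
    """Prefix raw pixel bytes with their shape so receivers need no hard-coded size."""
    if len(pixels) != height * width * channels:
        raise ValueError("pixel buffer does not match the given shape")
    return HEADER.pack(height, width, channels) + pixels


def unpack_frame(message):
    """Split a message into its (height, width, channels) shape and pixel bytes."""
    shape = HEADER.unpack_from(message)
    pixels = message[HEADER.size:]
    return shape, pixels


# Tiny 2x3 "frame" of zeros standing in for real camera data.
message = pack_frame(2, 3, 3, bytes(2 * 3 * 3))
shape, pixels = unpack_frame(message)
print(shape, len(pixels))  # (2, 3, 3) 18
```

With this in place, the publisher would send `pack_frame(*frame.shape, frame.tobytes())` and the subscriber could call `np.frombuffer(pixels, dtype="uint8").reshape(shape)` instead of hard-coding (360, 640, 3).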
Conclusion
That's it! You can write more code to support N users, send audio over the network, add more image processing, virtual backgrounds and so on. The crux of online video streaming applications remains the same.
Later, I'll also write an article on how to extend this simple application to store and replay videos, much like Netflix. Subscribe to this newsletter for all the updates.