WebRTC: Dynamic Webcam using AI
Nilesh Gawande
Co-founder and VP - Innovations at SpringCT. Creator of ProCONF, creator of ARIA. Expertise in architecting Video Conferencing (WebRTC), Digital Human, CoBrowsing, WebXR, Healthcare, and IoT systems.
"Dynamic webcam" refers to a webcam with advanced features that allow it to adapt its settings in real-time based on certain conditions. In this blog we will cover one aspect of Dynamic Webcam that will allow us to track user in the camera frame and adjust his video stream such that he will remain in center position irrespective of his position in front of camera.
See the image below. If the user is sitting a little away from the camera, other users in the conference will see them smaller in the tile (remote view).
This can be improved by utilizing a dynamic webcam. Take a look at the adjusted remote view below. Doesn't it present the user nicely centered within the frame?
Human Face Detection in Live Stream:
The initial step in addressing this challenge involves detecting individuals within the webcam stream. We will employ the MediaPipe face detection model, via TensorFlow.js, to detect the user. To incorporate MediaPipe into your project, please follow the instructions provided at this link:
https://github.com/tensorflow/tfjs-models/tree/master/face-detection/src/mediapipe
Include the following scripts in index.html:
<!-- Require the peer dependencies of face-detection. -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/face_detection"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<!-- You must explicitly require a TF.js backend if you're not using the TF.js union bundle. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-detection"></script>
and initialize faceDetector as follows:
// Initialize the face detector
const initializefaceDetector = async () => {
  const model = faceDetection.SupportedModels.MediaPipeFaceDetector;
  const detectorConfig = {
    runtime: 'mediapipe',
    modelType: 'full',
    maxFaces: 6,
    solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/face_detection',
  };
  faceDetector = await faceDetection.createDetector(model, detectorConfig);
};
Once the model is loaded, we are ready to process the local video stream for the presence of a human face. This can be done as follows:
detections = await faceDetector.estimateFaces(video, {flipHorizontal: false});
The video parameter passed here is the HTML video element which holds the local video stream obtained from navigator.mediaDevices.getUserMedia(). detections will hold the coordinates of the detected face(s), which can then be used to crop the user from the source stream and place them at the center of the destination stream.
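Putting these pieces together, a minimal sketch of wiring the webcam to the detector could look like this (the startDetection() name and the dynamically created video element are illustrative assumptions, not from the original project):
// Minimal sketch: capture the webcam and run one detection pass.
// Assumes initializefaceDetector() from above assigns faceDetector.
const video = document.createElement('video');

async function startDetection() {
  await initializefaceDetector();

  // Capture the local webcam stream and feed it to a video element.
  const localVideoStream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = localVideoStream;
  await video.play();

  // Each detection carries a bounding box in video pixel coordinates.
  const detections = await faceDetector.estimateFaces(video, { flipHorizontal: false });
  for (const det of detections) {
    const { xMin, yMin, width, height } = det.box;
    console.log('face at', xMin, yMin, 'size', width, 'x', height);
  }
}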
Conference on Janus Media Server:
To prove the concept of the dynamic webcam, we must process the local stream and transmit the corrected stream in the video conference, so that other users in the conference see the corrected view. For this reason, we are going to use the Janus media server. We need at least two users to join the conference and share their video streams. Both users start the video conference normally by joining the room, and each sees the other's video (local view and remote view). The first user can then enable the dynamic webcam by clicking a checkbox; the effect is observed by the second user.
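For context, joining the room with the Janus JavaScript API (janus.js) looks roughly like the sketch below; the actual flow in janus-service.js may differ, and the callback wiring shown here is a simplified assumption:
// Simplified sketch of joining a Janus video room as a publisher,
// using the janus.js client library. Error handling omitted.
let videoroom; // plugin handle, used later to publish and replace tracks

Janus.init({
  debug: false,
  callback: () => {
    const janus = new Janus({
      server: janusURL, // e.g. 'wss://janus.conf.meetecho.com/ws'
      success: () => {
        janus.attach({
          plugin: 'janus.plugin.videoroom',
          success: (pluginHandle) => {
            videoroom = pluginHandle;
            // Join the default room (1234) as a publisher.
            videoroom.send({
              message: { request: 'join', room: 1234, ptype: 'publisher', display: 'user1' },
            });
          },
          onmessage: (msg, jsep) => {
            // On the 'joined' event, create an offer and publish the local
            // stream; remote publishers are subscribed to separately.
          },
        });
      },
    });
  },
});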
To see the effect of the dynamic webcam, we have written a function that toggles the user's feed in the video conference between the original video and the processed (croppedStream) video. See the toggleDynamicWebcam() function below:
function toggleDynamicWebcam() {
  // listen to change of isDynamic button state
  let isDynamic = document.getElementById('isdynamic');
  console.log('IsDynamic state:', isDynamic?.checked);

  if (isDynamic.checked) {
    video.srcObject = localVideoStream;
    video.play();
    video.onloadedmetadata = () => {
      predictWebcam();
      const croppedStream = backGroundCanvas.captureStream();
      document.getElementById('local_video').srcObject = croppedStream;
      replaceVideoTrack(croppedStream.getVideoTracks()[0]);
    };
  } else {
    replaceVideoTrack(localVideoStream.getVideoTracks()[0]);
    document.getElementById('local_video').srcObject = localVideoStream;
    cancelAnimationFrame(animationTimer);
    cancelAnimationFrame(secDrawAnimTimer);
    video.srcObject = null;
    dummyVideos.srcObject = null;
  }
}
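The replaceVideoTrack() helper used above is not shown in the snippet; a minimal sketch using the standard RTCRtpSender.replaceTrack() API could look like the following (assuming peerConnection is the publisher's RTCPeerConnection created by Janus):
// Swap the outgoing video track without renegotiating the session.
function replaceVideoTrack(newTrack) {
  const sender = peerConnection
    .getSenders()
    .find((s) => s.track && s.track.kind === 'video');
  if (sender) {
    sender.replaceTrack(newTrack); // returns a promise; no renegotiation needed
  }
}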
The predictWebcam() function takes care of using the faceDetector model. Here the user is identified in the frame, the relevant portion of the source frame is cropped, and it is copied onto a background canvas by the drawCroppedFrame() function.
// Prediction loop
async function predictWebcam() {
  // Now let's start classifying the stream.
  let detections = [];
  if (!isUpdating) {
    try {
      detections = await faceDetector.estimateFaces(video, {flipHorizontal: false});
      ENABLE_LOG && console.log('faces:', detections);
    } catch (error) {
      console.error('error in estimate faces:', error);
    }
    for (let n = 0; n < detections.length; n++) {
      if (detections[n].box.xMin <= boundingBoxLeftMost.x) {
        setBox(boundingBoxLeftMost, detections[n].box);
      }
      if ((detections[n].box.xMin + detections[n].box.width)
        > (boundingBoxRightMost.x + boundingBoxRightMost.width)) {
        setBox(boundingBoxRightMost, detections[n].box);
      }
      if (detections[n].box.yMin <= boundingBoxTopMost.y) {
        setBox(boundingBoxTopMost, detections[n].box);
      }
      if ((detections[n].box.yMin + detections[n].box.height)
        > (boundingBoxBelowMost.y + boundingBoxBelowMost.height)) {
        setBox(boundingBoxBelowMost, detections[n].box);
      }
    }
    targetBbox = [
      boundingBoxLeftMost.x,
      boundingBoxTopMost.y,
      boundingBoxRightMost.x - boundingBoxLeftMost.x + boundingBoxRightMost.width,
      boundingBoxBelowMost.y - boundingBoxTopMost.y + boundingBoxBelowMost.height,
    ];
    resetBboxes();
    ENABLE_LOG && console.log('targetBbox:', targetBbox, detections.length);
    if (detections.length > 0 && !isUpdating) {
      updateCroppingBoxDimension(targetBbox);
    }
  }
  drawCroppedFrame();
  animationTimer = window.requestAnimationFrame(predictWebcam);
}
Please note, we have used window.requestAnimationFrame(predictWebcam) to call the predictWebcam function recursively in a loop.
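The setBox() and resetBboxes() helpers used in the loop are straightforward; one possible implementation, matching how predictWebcam() uses them, is sketched below (the initial values are assumptions chosen so the first detection always wins each comparison):
// Copy a detection box into one of the tracked extreme boxes.
function setBox(target, box) {
  target.x = box.xMin;
  target.y = box.yMin;
  target.width = box.width;
  target.height = box.height;
}

// Reset the extremes so the next frame recomputes them from scratch.
function resetBboxes() {
  boundingBoxLeftMost  = { x: Infinity, y: 0, width: 0, height: 0 };
  boundingBoxTopMost   = { x: 0, y: Infinity, width: 0, height: 0 };
  boundingBoxRightMost = { x: 0, y: 0, width: 0, height: 0 };
  boundingBoxBelowMost = { x: 0, y: 0, width: 0, height: 0 };
}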
Generate a new stream using drawCroppedFrame():
function drawCroppedFrame() {
  const context = backGroundCanvas.getContext('2d');
  const x = boundingBox.x - (boundingBox.width / 2);
  const y = boundingBox.y - (boundingBox.height / 1.5);
  let videoWidth = 2 * (boundingBox.x - x) + boundingBox.width;
  let videoHeight = 3 * (boundingBox.y - y) + boundingBox.height;
  videoWidth = x + videoWidth >= video.videoWidth ? video.videoWidth - x : videoWidth;
  videoHeight = y + videoHeight >= video.videoHeight ? video.videoHeight - y : videoHeight;
  const hRatio = backGroundCanvas.width / videoWidth;
  const vRatio = backGroundCanvas.height / videoHeight;
  const ratio = Math.min(hRatio, vRatio);
  const centerShiftX = (backGroundCanvas.width - videoWidth * ratio) / 2;
  const centerShiftY = (backGroundCanvas.height - videoHeight * ratio) / 2;
  context.clearRect(0, 0, backGroundCanvas.width, backGroundCanvas.height);
  context.fillStyle = 'grey';
  context.fillRect(0, 0, backGroundCanvas.width, backGroundCanvas.height);
  context.drawImage(video, parseInt(x, 10), parseInt(y, 10),
    parseInt(videoWidth, 10), parseInt(videoHeight, 10),
    parseInt(centerShiftX, 10), parseInt(centerShiftY, 10),
    parseInt(videoWidth * ratio, 10), parseInt(videoHeight * ratio, 10)
  );
}
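The boundingBox that drawCroppedFrame() reads is maintained by updateCroppingBoxDimension(), which is not shown above. A plausible sketch is to ease the current crop box toward the newly detected target over several frames (driven by secDrawAnimTimer) so the virtual camera pans smoothly instead of jumping; the easing factor and convergence test below are illustrative assumptions, not the original implementation:
// Ease boundingBox toward targetBbox = [x, y, width, height].
// isUpdating pauses detection in predictWebcam() during the transition.
function updateCroppingBoxDimension(targetBbox) {
  isUpdating = true;
  const [tx, ty, tw, th] = targetBbox;
  const k = 0.1; // easing factor per frame (assumption)

  const step = () => {
    boundingBox.x += (tx - boundingBox.x) * k;
    boundingBox.y += (ty - boundingBox.y) * k;
    boundingBox.width += (tw - boundingBox.width) * k;
    boundingBox.height += (th - boundingBox.height) * k;

    if (Math.abs(tx - boundingBox.x) < 1 && Math.abs(ty - boundingBox.y) < 1) {
      isUpdating = false; // close enough: resume detection
    } else {
      secDrawAnimTimer = window.requestAnimationFrame(step);
    }
  };
  secDrawAnimTimer = window.requestAnimationFrame(step);
}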
Source Code:
Please refer to this Git link for the entire source code:
Please note: before you run this code, ensure you set janusURL to your Janus installation. The current janusURL is that of the public Janus deployment, and you will have to use room 1234, which is the default room.
const janusURL = 'wss://janus.conf.meetecho.com/ws'
Also note: this is not production-quality code. janus-service.js has been modified to support only two users.
A Little History:
I came across a great product offered by Poly at the UCX London event. It provides a similar feature, but using a hardware-based camera, the Studio X70 (https://www.poly.com/in/en/products/video-conferencing/studio/studio-x70). This camera offers various features including dynamic zoom in/out. This inspired me to find a similar software-based solution for video conferencing systems, and hence we created the Dynamic Webcam project.
At SpringCT, we develop high-quality video conferencing solutions. Our in-depth knowledge of WebRTC and media servers has helped us build great conferencing products for our customers.
We have a team of experienced and skilled developers who specialize in WebRTC, utilizing the latest tools, frameworks, and best practices to create robust and reliable applications. Our developers are well-versed in the intricacies of WebRTC protocols, enabling them to optimize performance, minimize latency, and ensure seamless connectivity across different devices and browsers. For more details visit us at https://www.springct.com/collaboration/
Author: Nilesh Gawande
CoAuthor: Ayan Karmakar