WebRTC: Active Speaker Detection


Grid view and active speaker view are two common layouts used in video conferencing applications.

  • Grid View: In the grid view, all participants' video feeds are displayed in a grid-like format on the screen. Each participant's video is given an equal-sized tile, making it easier to see multiple participants simultaneously. Grid view is beneficial for smaller meetings or when it's important to have a visual overview of everyone attending the conference at once.

Example of Grid View:

[Image: participants displayed in equal-sized tiles]


  • Active Speaker View: In the active speaker view, the focus is on the participant who is currently speaking. The video feed of the active speaker is prominently displayed on the screen, usually in a larger tile, while other participants' videos may appear in smaller tiles or as thumbnails. The active speaker view dynamically switches to highlight the person speaking at any given time, allowing participants to focus on the current speaker and follow the conversation more closely.

Example of Active Speaker View:

[Image: the current speaker in a large tile, other participants as thumbnails]
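
Before diving into detection, it may help to see how these two layouts are typically expressed in CSS. The sketch below is purely illustrative and independent of the example that follows; the class names (grid-view, speaker-view, and so on) are hypothetical, and CSS Grid is just one common way to get equal-sized, wrapping tiles:

/* Illustrative grid view: equal-sized tiles that wrap to fill each row.
   Class names here are hypothetical, not used in the example below. */
.grid-view {
    display: grid;
    grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
    gap: 10px;
}
.grid-view .tile {
    aspect-ratio: 16 / 9;
}

/* Illustrative active speaker view: one large tile for the speaker,
   the remaining participants shown as a strip of small thumbnails. */
.speaker-view .active-speaker {
    width: 100%;
}
.speaker-view .thumbnails {
    display: flex;
}
.speaker-view .thumbnails .tile {
    width: 120px;
}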



Introducing active speaker detection in a WebRTC application involves analyzing audio streams from different participants to determine who is currently speaking. You can achieve this by using the Web Audio API to process the audio and make decisions based on the volume levels.

In this example, I'll demonstrate how to implement basic active speaker detection using JavaScript in a WebRTC application.

Please note that this example assumes you have a basic understanding of WebRTC and have already set up the necessary components for audio stream handling. Also, keep in mind that this is a simplified example, and real-world active speaker detection may involve more sophisticated algorithms and optimizations.

Let's get started:

  • Set up your HTML structure:

<!DOCTYPE html>
<html>
    <head>
        <title>Active Speaker Detection with WebRTC</title>
        <style>
            .container {
                display: flex;
            }
            .square {
                width: 200px;
                height: 200px;
                background-color: lightblue;
                display: flex;
                align-items: center;
                justify-content: center;
                font-size: 18px;
                margin: 20px;
                border: 5px solid blueviolet;
            }
            .red-border {
                border-color: red;
            }
        </style>
    </head>
    <body>
        <h1>Active Speaker Detection Example</h1>
        <p>The active speaker will be highlighted with a red border</p>
        <div class="container">
            <div class="square" id="User1">
                <span id="innerText1">User 1</span>
            </div>
            <div class="square" id="User2">
                <span id="innerText2">User 2</span>
            </div>
        </div>
        <script src="./app.js"></script>
    </body>
</html>

  • In the app.js file, implement the active speaker detection logic:

// Global variables to keep track of audio streams and their volume levels
const VOLUME_THRESHOLD = 40; // Adjust this threshold to suit your needs
const LOW_GAIN_VALUE = 0.5;
const AUDIO_WINDOW_SIZE = 256;
let audioStreams = new Map();

// Function to update the active speaker indicator on the page
function updateActiveSpeakerIndicator(speakerId, isActiveSpeaker) {
    console.log(`${speakerId} is ${isActiveSpeaker ? '' : 'not '}an active speaker`);
    const squareDiv = document.getElementById(speakerId);
    // Add or remove the red border depending on whether the user is speaking
    squareDiv.classList.toggle('red-border', isActiveSpeaker);
}

// Function to handle incoming audio streams from WebRTC peers
function handleAudioStream(stream, userId) {
    // Note: browsers may keep a new AudioContext suspended until a user
    // gesture; call audioContext.resume() if needed
    const audioContext = new AudioContext();
    const mediaStreamSource = audioContext.createMediaStreamSource(stream);

    // Create an analyser node to process audio data
    const analyserNode = audioContext.createAnalyser();
    // Window size in samples used when performing a Fast Fourier Transform (FFT)
    // to get frequency-domain data
    analyserNode.fftSize = AUDIO_WINDOW_SIZE;
    mediaStreamSource.connect(analyserNode);

    // Buffer to hold the audio data (frequencyBinCount is half of fftSize)
    const bufferLength = analyserNode.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength);

    // Function to process audio data and detect the active speaker
    function processAudio() {
        analyserNode.getByteFrequencyData(dataArray);

        // Implement your active speaker detection algorithm here.
        // For example, calculate the average volume of the audio data
        // and compare it to a threshold.

        // Example: calculate the average volume (each bin is a value from 0 to 255)
        const averageVolume = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
        updateActiveSpeakerIndicator(userId, averageVolume > VOLUME_THRESHOLD);

        // Repeat the process for the next audio frame
        requestAnimationFrame(processAudio);
    }

    // Start the audio processing loop
    processAudio();

    // Add the audio stream and its analyser node to the global map
    audioStreams.set(userId, { stream, analyserNode });
}

// Function to remove an audio stream and stop active speaker detection
function removeAudioStream(userId) {
    const streamData = audioStreams.get(userId);
    if (streamData) {
        streamData.stream.getTracks().forEach((track) => track.stop());
        streamData.analyserNode.disconnect();
        audioStreams.delete(userId);
    }
}

// Function to create a stream with 50% of the original stream's gain
function createAudioStreamWithLowGain(stream) {
    const ctx = new AudioContext();
    const gainNode = ctx.createGain();
    const audioDest = ctx.createMediaStreamDestination();
    const source = ctx.createMediaStreamSource(stream);

    // Route the source through the gain node into a new MediaStream
    gainNode.connect(audioDest);
    gainNode.gain.value = LOW_GAIN_VALUE;
    source.connect(gainNode);
    const lowGainStream = audioDest.stream;
    return lowGainStream;
}

window.onload = async () => {
    try {
        const user1 = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
        // For demonstration purposes, we create the user2 stream from the user1
        // stream with a 50% gain value. In a conference scenario, we should use
        // the WebRTC local and remote audio streams instead.
        const user2 = createAudioStreamWithLowGain(user1);

        handleAudioStream(user1, 'User1');
        handleAudioStream(user2, 'User2');
    } catch (error) {
        console.error('Error in getUserMedia:', error);
    }
}

window.onbeforeunload = () => {
    removeAudioStream('User1');
    removeAudioStream('User2');
}


In this example, we set up a basic audio processing function (processAudio) that calculates the average volume of the audio data in each incoming audio frame. If the average is greater than a specified threshold (VOLUME_THRESHOLD), we assume that the user is actively speaking. For instance, since getByteFrequencyData returns values in the range 0-255, a frame whose bins average 60 would exceed the default threshold of 40 and mark that user as speaking. The updateActiveSpeakerIndicator function simply updates the indicator on the UI.


Please keep in mind that this is a basic example; active speaker detection can be more complex depending on your specific requirements and the size of the conference. More sophisticated algorithms, signal-processing techniques, and optimizations may be needed for larger-scale applications; one such refinement is sketched below.
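
As one illustration of such a refinement, the sketch below picks a single dominant speaker instead of thresholding each participant independently: it smooths every participant's volume with an exponential moving average (to damp brief spikes like coughs or keyboard noise) and highlights only the loudest participant above the threshold. This is an illustrative sketch, not part of the example above; it reuses the audioStreams map, VOLUME_THRESHOLD, and updateActiveSpeakerIndicator from app.js, while the SMOOTHING constant and the getAverageVolume helper are hypothetical names introduced here.

// Hypothetical refinement: dominant speaker selection with smoothing.
// Reuses audioStreams, VOLUME_THRESHOLD, and updateActiveSpeakerIndicator
// from the example above; SMOOTHING and getAverageVolume are illustrative.
const SMOOTHING = 0.8; // assumed smoothing factor; tune per application
const smoothedVolumes = new Map();

function getAverageVolume(analyserNode) {
    // Same averaging as in processAudio above
    const dataArray = new Uint8Array(analyserNode.frequencyBinCount);
    analyserNode.getByteFrequencyData(dataArray);
    return dataArray.reduce((acc, val) => acc + val, 0) / dataArray.length;
}

function detectDominantSpeaker() {
    let loudestId = null;
    let loudestVolume = VOLUME_THRESHOLD; // nobody is dominant below the threshold
    for (const [userId, { analyserNode }] of audioStreams) {
        // Exponential moving average damps momentary spikes
        const previous = smoothedVolumes.get(userId) ?? 0;
        const smoothed = SMOOTHING * previous + (1 - SMOOTHING) * getAverageVolume(analyserNode);
        smoothedVolumes.set(userId, smoothed);
        if (smoothed > loudestVolume) {
            loudestVolume = smoothed;
            loudestId = userId;
        }
    }
    // Highlight at most one participant: the loudest
    for (const userId of audioStreams.keys()) {
        updateActiveSpeakerIndicator(userId, userId === loudestId);
    }
    requestAnimationFrame(detectDominantSpeaker);
}

If you adopt something like this, you would start it once (for example, requestAnimationFrame(detectDominantSpeaker)) and drop the per-user threshold check from processAudio, so the two loops do not fight over the indicators.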

In this example, we have simulated two audio streams and purposely reduced the gain of the second stream by 50%. In a real application, this would be a remote stream coming from the other peers in the conference:

const user2 = createAudioStreamWithLowGain(user1);        
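
In a real conference, one common way to feed those remote streams into the same detection logic is through the RTCPeerConnection track event. The snippet below is a hypothetical wiring, assuming a peerConnection and a remoteUserId that your signaling code has already established:

// Hypothetical wiring for a real conference: peerConnection and
// remoteUserId are assumed to come from your signaling setup.
peerConnection.ontrack = (event) => {
    if (event.track.kind === 'audio') {
        // event.streams[0] is the remote MediaStream this track belongs to
        // (present when the sender associated the track with a stream)
        handleAudioStream(event.streams[0], remoteUserId);
    }
};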



At SpringCT, we develop high-quality video conferencing solutions. Our in-depth knowledge of WebRTC and media servers has helped us build great conferencing products for our customers.


We have a team of experienced and skilled developers who specialize in WebRTC, utilizing the latest tools, frameworks, and best practices to create robust and reliable applications. Our developers are well-versed in the intricacies of WebRTC protocols, enabling them to optimize performance, minimize latency, and ensure seamless connectivity across different devices and browsers. For more details, visit us at https://www.springct.com/collaboration/



Author: Nilesh Gawande, Co-author: Ayan Karmakar


