WebRTC: Active Speaker Detection
Nilesh Gawande
Co-founder and VP - Innovations at SpringCT. Creator of ProCONF and ARIA. Expertise in architecting Video Conferencing (WebRTC), Digital Human, CoBrowsing, WebXR, Healthcare, and IoT systems.
Grid view and active speaker view are two common layouts used in video conferencing applications.
Example of Grid View:
Example of Active Speaker View:
Introducing active speaker detection in a WebRTC application involves analyzing audio streams from different participants to determine who is currently speaking. You can achieve this by using the Web Audio API to process the audio and make decisions based on the volume levels.
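At its core, the approach feeds each participant's audio into an AnalyserNode and compares the average level against a threshold. Here is a minimal sketch of that idea, assuming stream is an audio MediaStream you already have (the threshold value is just an assumed starting point):

const audioContext = new AudioContext();
const analyser = audioContext.createAnalyser();
audioContext.createMediaStreamSource(stream).connect(analyser);

const data = new Uint8Array(analyser.frequencyBinCount);
function isSpeaking() {
    analyser.getByteFrequencyData(data); // fills `data` with values from 0 to 255
    const average = data.reduce((sum, v) => sum + v, 0) / data.length;
    return average > 40; // tune this threshold for your audio setup
}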
In this example, I'll demonstrate how to implement basic active speaker detection using JavaScript in a WebRTC application.
Please note that this example assumes you have a basic understanding of WebRTC and have already set up the necessary components for audio stream handling. Also, keep in mind that this is a simplified example, and real-world active speaker detection may involve more sophisticated algorithms and optimizations.
Let's get started:
<!DOCTYPE html>
<html>
    <head>
        <title>Active Speaker Detection with WebRTC</title>
        <style>
            .container {
                display: flex;
            }
            .square {
                width: 200px;
                height: 200px;
                background-color: lightblue;
                display: flex;
                align-items: center;
                justify-content: center;
                font-size: 18px;
                margin: 20px;
                border: 5px solid blueviolet;
            }
            .red-border {
                border-color: red;
            }
        </style>
    </head>
    <body>
        <h1>Active Speaker Detection Example</h1>
        <p>The active speaker will be highlighted with a red border</p>
        <div class="container">
            <div class="square" id="User1">
                <span id="innerText1">User 1</span>
            </div>
            <div class="square" id="User2">
                <span id="innerText2">User 2</span>
            </div>
        </div>
        <script src="./app.js"></script>
    </body>
</html>
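Next, app.js implements the detection logic: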
// Global variables to keep track of audio streams and their volume levels
const VOLUME_THRESHOLD = 40; // Adjust this threshold to suit your needs
const LOW_GAIN_VALUE = 0.5;
const AUDIO_WINDOW_SIZE = 256;
let audioStreams = new Map();

// Function to update the active speaker indicator on the page
function updateActiveSpeakerIndicator(speakerId, isActiveSpeaker) {
    console.log(`${speakerId} is ${isActiveSpeaker ? '' : 'not '}an active speaker`);
    const squareDiv = document.getElementById(speakerId);
    // Toggling the CSS class is enough; the .red-border rule changes the border color
    squareDiv.classList.toggle('red-border', isActiveSpeaker);
}
// Function to handle incoming audio streams from WebRTC peers
function handleAudioStream(stream, userId) {
    const audioContext = new AudioContext();
    const mediaStreamSource = audioContext.createMediaStreamSource(stream);

    // Create an analyser node to process audio data
    const analyserNode = audioContext.createAnalyser();
    // Window size in samples that is used when performing a Fast Fourier Transform (FFT)
    // to get frequency domain data
    analyserNode.fftSize = AUDIO_WINDOW_SIZE;
    mediaStreamSource.connect(analyserNode);

    // Buffer to hold the audio data
    const bufferLength = analyserNode.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength);

    // Function to process audio data and detect the active speaker
    function processAudio() {
        analyserNode.getByteFrequencyData(dataArray);

        // Implement your active speaker detection algorithm here
        // For example, you can calculate the average volume of the audio data and use a threshold

        // Example: Calculate the average volume
        const averageVolume = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
        updateActiveSpeakerIndicator(userId, averageVolume > VOLUME_THRESHOLD);

        // Repeat the process for the next audio frame
        requestAnimationFrame(processAudio);
    }

    // Start the audio processing loop
    processAudio();

    // Add the audio stream and its analyser node to the global map
    audioStreams.set(userId, { stream, analyserNode });
}
// Function to remove an audio stream and stop active speaker detection
function removeAudioStream(userId) {
    const streamData = audioStreams.get(userId);
    if (streamData) {
        streamData.stream.getTracks().forEach((track) => track.stop());
        streamData.analyserNode.disconnect();
        audioStreams.delete(userId);
    }
}
// Function to create a stream with 50% of the original stream's gain
function createAudioStreamWithLowGain(stream) {
    const ctx = new AudioContext();
    const source = ctx.createMediaStreamSource(stream);
    const gainNode = ctx.createGain();
    const audioDest = ctx.createMediaStreamDestination();
    gainNode.gain.value = LOW_GAIN_VALUE;
    // Route the original audio through the gain node into a new MediaStream
    source.connect(gainNode);
    gainNode.connect(audioDest);
    return audioDest.stream;
}
window.onload = async () => {
    try {
        const user1 = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
        // For demonstration purposes, we create a user2 stream from the user1 stream with a 50% gain value.
        // In a conference scenario, we would use the WebRTC local and remote audio streams instead.
        const user2 = createAudioStreamWithLowGain(user1);
        handleAudioStream(user1, 'User1');
        handleAudioStream(user2, 'User2');
    } catch (error) {
        console.error('Error in getUserMedia:', error);
    }
};

window.onbeforeunload = () => {
    removeAudioStream('User1');
    removeAudioStream('User2');
};
In this example, we set up a basic audio processing function (processAudio) that calculates the average volume of each incoming audio buffer. Since getByteFrequencyData fills the buffer with values in the 0-255 range, an average greater than the specified threshold (VOLUME_THRESHOLD) is treated as the user actively speaking. The updateActiveSpeakerIndicator function simply updates the indicator on the UI.
Please keep in mind that this is a basic example, and active speaker detection can be more complex depending on your specific requirements and the size of the conference. More sophisticated algorithms, signal processing techniques, and optimizations may be needed for larger-scale applications.
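For example, the raw per-frame average tends to flicker around the threshold. One common refinement (a sketch only, not part of the example above; the smoothing factor and hold time are assumed values) is to smooth the volume and hold the active state briefly before releasing it:

const SMOOTHING = 0.8;      // assumed exponential moving average factor
const HOLD_TIME_MS = 500;   // assumed hold time before releasing the active state

let smoothedVolume = 0;
let lastActiveTime = 0;

function detectWithHysteresis(averageVolume) {
    // The exponential moving average suppresses single-frame spikes
    smoothedVolume = SMOOTHING * smoothedVolume + (1 - SMOOTHING) * averageVolume;
    if (smoothedVolume > VOLUME_THRESHOLD) {
        lastActiveTime = performance.now();
    }
    // Remain "active" until the hold time expires, which avoids border flicker
    return performance.now() - lastActiveTime < HOLD_TIME_MS;
}

Inside processAudio, you could then call updateActiveSpeakerIndicator(userId, detectWithHysteresis(averageVolume)) instead of comparing against the threshold directly; in a multi-party call, each participant would need its own smoothedVolume and lastActiveTime state.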
In this example, we have simulated two audio streams and purposely reduced the gain of the second stream by 50%. In real applications, this would be a remote stream coming from the other peers in the conference:
const user2 = createAudioStreamWithLowGain(user1);
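In a real conference, the wiring might look like the following sketch. It assumes you already have an RTCPeerConnection (pc), a local stream from getUserMedia (localStream), and some application-level way to map incoming streams to participant IDs (getParticipantId is a hypothetical helper):

// Local participant: reuse the stream obtained from getUserMedia
handleAudioStream(localStream, 'LocalUser');

// Remote participants: attach detection as audio tracks arrive from the peer connection
pc.ontrack = (event) => {
    if (event.track.kind === 'audio') {
        const [remoteStream] = event.streams;
        // getParticipantId is a hypothetical helper that maps an incoming
        // stream to your application's participant identifier
        handleAudioStream(remoteStream, getParticipantId(remoteStream));
    }
};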
At SpringCT, we develop high-quality video conferencing solutions. Our in-depth knowledge of WebRTC and media servers has helped us build great conferencing products for our customers.
We have a team of experienced and skilled developers who specialize in WebRTC, utilizing the latest tools, frameworks, and best practices to create robust and reliable applications. Our developers are well-versed in the intricacies of WebRTC protocols, enabling them to optimize performance, minimize latency, and ensure seamless connectivity across different devices and browsers. For more details visit us at https://www.springct.com/collaboration/
Author: Nilesh Gawande, Co-author: Ayan Karmakar