VideoChat in under 30 Minutes: A Rails/React Tutorial
If you’re reading this, you’re probably in a situation similar to the one I was in not too long ago. You’re a software developer who, for one reason or another, needs to incorporate video or voice calling into a Rails app.
Look no further! Today, you are going to learn to implement just that using React, Rails, and WebRTC!
Github Link: https://github.com/NicolasESchneider/VideoCall-React-Component---Tutorial
Part 0: Setting Up Webpack and Rails
To start off we need to make a new Rails application:
rails new video_call_app
cd video_call_app
rails g controller Calls
rails g channel Call
Great! Now let’s take care of our routes and our Root HTML page. Make a file at app/views/calls/root.html.erb and inside of it put this:
<main id="root">
  <h1>Future site of a wonderful video chat react component</h1>
</main>
Then go to config/routes.rb and configure your routes like so:
Rails.application.routes.draw do
  root to: "calls#root"
  resources :calls, only: :create
  mount ActionCable.server, at: '/cable'
end
If you want to know more about Action Cable, I would give the docs a read: https://edgeguides.rubyonrails.org/action_cable_overview.html. Action Cable itself will not be the focus of this tutorial, though.
Next we need to configure our webpack and our frontend file structure. Build out a file structure like so:
frontend/
├── App.jsx
└── components/
    ├── VideoCall.jsx
    └── video_util.js
Now we need to use NPM to install React and Webpack. Webpack will package our JS files giving us helpful errors, and React is the cornerstone of our front end. Run this in terminal:
npm init -y
npm install webpack webpack-cli react react-dom @babel/core @babel/preset-react @babel/preset-env babel-loader
Create a webpack.config.js file in the root directory and structure it like this:
var path = require('path');

module.exports = {
  entry: "./frontend/App.jsx",
  output: {
    path: path.resolve(__dirname, 'app', 'assets', 'javascripts'),
    filename: "bundle.js"
  },
  module: {
    rules: [{
      test: /\.jsx?$/,
      exclude: /(node_modules)/,
      use: {
        loader: 'babel-loader',
        options: { presets: ['@babel/env', '@babel/react'] }
      }
    }]
  },
  devtool: 'eval-source-map',
  resolve: { extensions: ['.js', '.jsx', '*'] }
};
Finally, put this script in your package.json file under the scripts key:
"webpack": "webpack --mode=development --watch"
Then in terminal:
bundle exec rails s
#in another terminal window
npm run webpack
Make sure that your root.html.erb is rendering at localhost:3000 before moving on.
Part 1: Action Cable and Rails Controllers
Let’s get into the fun stuff! We need to build our controller and a channel for our calls.
First go to app/assets/javascripts/channels/call.coffee and comment out everything. We’re not going to use any of the default CoffeeScript that Rails provides; we’ll write our own.
Next, we are going to set up our app/channels/call_channel.rb. The actual streaming will not be done through this channel, but we will need it for signaling (more on that later). For now, all we need is a way for different clients to communicate data to one another. Action Cable is great because it allows us to transmit data without the traditional request-response cycle of HTTP.
All we need to do is stream from a channel, as you can see below:
class CallChannel < ApplicationCable::Channel
  def subscribed
    stream_from 'call_channel'
  end

  def unsubscribed; end
end
Congratulations! Your CallChannel is finished! Users that are subscribed to this channel will now receive all data that is broadcast to it. Now on to app/controllers/calls_controller.rb. This is where the actual broadcasting to the channel will take place, allowing our clients to exchange data objects back and forth so they can establish a working connection and pass their media along, using either a TURN server or a peer-to-peer connection. To do that, we first need an action that broadcasts our data objects, plus a method that whitelists their params:
class CallsController < ApplicationController
  def create
    head :no_content
    ActionCable.server.broadcast("call_channel", call_params)
  end

  private

  def call_params
    params.permit(:call, :type, :from, :to, :sdp)
  end
end
WOAH! That’s a lot of params we’re permitting! Let’s break those down.
- type: The type of signal being broadcast; this will be one of three strings: “JOIN_CALL”, “EXCHANGE”, or “LEAVE_CALL”.
- from & to: These are numbers that correspond to users. Each data object that we broadcast should always have a from so we can keep track of who is sending what.
- sdp: This one is a little more complex. The SDP (Session Description Protocol) payload is serialized JSON that contains the user’s IP, media stream information, and a ton of other details. It will generally have a type of offer or answer, or it will be an ICE candidate. I have placed some links at the bottom that I highly recommend reading if you want to learn more about this.
- call: This is the key our data gets nested under, so we must permit it.
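To make this concrete, here is a rough sketch of the kinds of payloads that end up flowing through the channel (the user ids are made up and the sdp string is heavily truncated; real ones are far longer):
// A new user announcing themselves to everyone on the channel
{ "type": "JOIN_CALL", "from": 4821 }

// One user sending a session description directly to another
{ "type": "EXCHANGE", "from": 4821, "to": 917, "sdp": "{\"type\":\"offer\",\"sdp\":\"v=0 ...\"}" }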
Finally, let’s change our app/controllers/application_controller.rb:
class ApplicationController < ActionController::Base
protect_from_forgery unless: -> { request.format.json? }
end
Short and sweet, but if you miss this step you won’t be able to broadcast through our CallsController. After this we’re going to say goodbye to Ruby and jump into building our frontend.
Part 2: React and WebRTC
First, let’s build out our App.jsx file. Nothing fancy here, just mount your React component on the DOM once the content has loaded.
import React from 'react';
import VideoCall from './components/VideoCall';
import ReactDOM from 'react-dom';
document.addEventListener("DOMContentLoaded", () => {
  const root = document.getElementById("root");
  ReactDOM.render(<VideoCall />, root);
});
By now you’re probably wondering “What is WebRTC?” and that’s a great question that can be answered in hundreds of pages, or a couple of sentences. The concise answer is that WebRTC is a set of native JavaScript APIs that allow two or more users to connect and stream media directly to one another. Personally I love it: there’s no fiddling around with an external library, and the documentation is pretty extensive (and rapidly evolving). I’ll go into more detail on different aspects of WebRTC where needed.
But enough about that for now, let’s get to coding. Start off by making a video_util.js file. In this file we’re going to define and export some methods and constants that will be used inside our react component.
export const JOIN_CALL = "JOIN_CALL";
export const EXCHANGE = "EXCHANGE";
export const LEAVE_CALL = "LEAVE_CALL";
export const ice = {
  iceServers: [
    { urls: "stun:stun2.l.google.com:19302" }
  ]
};
export const broadcastData = data => {
  fetch("calls", {
    method: "POST",
    body: JSON.stringify(data),
    headers: { "content-type": "application/json" }
  });
};
Let’s unpack what we just wrote and how it’s going to be used.
The first three lines are the signal types that we are going to switch over. We export them as constants like this so that a typo fails loudly instead of silently.
Next we are exporting an object, containing an array, containing an object, containing a string. What’s that all about? When using WebRTC you need to incorporate STUN and/or TURN servers depending on your needs. Each client is going to ping the STUN server we provide here in order to get back their public IP address as well as any ports they may be using. From there they can negotiate with other clients to establish a peer-to-peer connection and stream their media directly to one another. A TURN server is similar, except that instead of streaming directly peer-to-peer, each client streams to the TURN server. This is useful if you want to add an extra layer of authentication (TURN servers can require a username and password, unlike STUN), or if your users may be behind a firewall that blocks P2P connections (looking at you, App Academy). The STUN server I have listed here is one of many free STUN servers provided by Google, but if you find it not working for you there are many other free ones out there.
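For comparison, here is a rough sketch of what this config could look like if you later add a TURN server alongside the STUN server (the URL and credentials below are placeholders, not a real server):
export const ice = {
  iceServers: [
    { urls: "stun:stun2.l.google.com:19302" },
    {
      // Placeholder TURN server; swap in your own host and credentials
      urls: "turn:turn.example.com:3478",
      username: "your-username",
      credential: "your-password"
    }
  ]
};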
Finally we have our broadcastData function. This will take in a JSON object and send it to our calls controller, which in turn, broadcasts it to the channel where all users will receive it. This function is the cornerstone for the signaling system that we are about to implement.
Now on to the actual React component. Open the VideoCall.jsx file you created previously and, along with React, import everything from the util file we just made.
import React from 'react';
import { broadcastData, JOIN_CALL, LEAVE_CALL, EXCHANGE, ice } from './video_util.js';
It’s time now to actually structure our React component. We’re going to start with the skeleton, and then go method by method.
class VideoCall extends React.Component {
  constructor(props) {
    super(props);
    this.pcPeers = {};
    this.userId = Math.floor(Math.random() * 10000);
    this.joinCall = this.joinCall.bind(this);
    this.leaveCall = this.leaveCall.bind(this);
  }

  componentDidMount() {
  }

  join(data) {
  }

  joinCall() {
  }

  createPC(userId, offerBool) {
  }

  exchange(data) {
  }

  leaveCall() {
  }

  removeUser(data) {
  }

  render() {
    return (
      <div className="VideoCall">
        <div id="remote-video-container"></div>
        <video id="local-video" autoPlay></video>
        <button onClick={this.joinCall}>Join Call</button>
        <button onClick={this.leaveCall}>Leave Call</button>
      </div>
    );
  }
}
export default VideoCall;
What we’ve done here is set up the skeleton for our react component. Now let’s go method by method and set up the actual call.
Part 3: Signaling
One thing that both WebRTC and rails have in common is that they’ll take care of a lot of the messy stuff for you under the hood. However, neither of them will take care of signaling for us, so we’re going to have to build that ourselves.
Signaling is the process of sending data to other users in order to facilitate a connection. This involves sending data to other users whenever we join the call, whenever we leave the call, and whenever data is exchanged back and forth. These “Signals” are how our clients are going to know that it is time to create a new connection and begin streaming media to another client.
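Before diving into the code, here is a rough sketch of the flow we are about to build, written out as comments:
// Rough signaling flow (A has already joined, B clicks "Join Call"):
//   1. B subscribes to CallChannel and broadcasts { type: JOIN_CALL, from: B }
//   2. A receives JOIN_CALL, creates an RTCPeerConnection for B, and broadcasts
//      an EXCHANGE signal containing A's SDP offer, addressed to B
//   3. B receives the offer, creates its own RTCPeerConnection for A, sets the
//      remote description, and broadcasts an EXCHANGE signal back with an SDP answer
//   4. Both sides keep broadcasting EXCHANGE signals carrying ICE candidates until
//      WebRTC finds a route and media starts flowing peer-to-peer
//   5. When either side clicks "Leave Call", it broadcasts LEAVE_CALL so the other
//      side can tear down the connection and remove the video element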
Let’s start with getting the user’s video and rendering that on the screen.
componentDidMount(){
  this.remoteVideoContainer = document.getElementById("remote-video-container");
  this.localVideo = document.getElementById("local-video");
  navigator.mediaDevices.getUserMedia({ audio: false, video: true })
    .then(stream => {
      this.localStream = stream;
      this.localVideo.srcObject = stream;
    }).catch(error => console.log(error));
}
Quick rundown of what’s going on in our componentDidMount: first we grab references to our video elements, then we ask the user for access to their computer’s camera, which returns a promise that resolves with a media stream. We then save that media stream to an instance variable and set it as our local-video element’s srcObject. Slight tweaks can be made here to get audio as well as video, but for now let’s just get video working. Load up your Rails server and get your local video to display before continuing on.
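For example, if you do want to capture audio as well, the only change in this sketch is the constraints object passed to getUserMedia (you will probably also want to mute the local video element so you don’t hear yourself):
// Ask for both microphone and camera instead of video only
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then(stream => {
    this.localStream = stream;
    this.localVideo.srcObject = stream;
  }).catch(error => console.log(error));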
Next up: joining the call and connecting to our channel
joinCall(e){
  App.cable.subscriptions.create(
    { channel: "CallChannel" },
    {
      connected: () => {
        broadcastData({ type: JOIN_CALL, from: this.userId });
      },
      received: data => {
        console.log("RECEIVED: ", data);
        if (data.from === this.userId) return;
        switch (data.type) {
          case JOIN_CALL:
            return this.join(data);
          case EXCHANGE:
            if (data.to !== this.userId) return;
            return this.exchange(data);
          case LEAVE_CALL:
            return this.removeUser(data);
          default:
            return;
        }
      }
    }
  );
}
So here we’re defining those callbacks that we commented out in the call.coffee file earlier. Upon joining the video call channel, we want to let everyone else in the channel know that we have just joined, so they can send us their video stream and IPs. Then we define what our client should do when it receives data from the channel. We need to be sure that the data we received was sent by someone else; we don’t want a user to be able to call themselves, now do we?
Next, we’re going to define the this.join(data) function. This function should take care of everything that needs to happen whenever a new user joins the call, mainly creating a new RTCPeerConnection and sending them our initial offer. The code should look like this:
join(data){ this.createPC(data.from, true) }
Not that crazy, right? Next we need to define this.createPC, and we’re going to use those ICE servers that we imported earlier to create a new RTCPeerConnection.
createPC(userId, offerBool){
  const pc = new RTCPeerConnection(ice);
  this.pcPeers[userId] = pc;
  this.localStream.getTracks()
    .forEach(track => pc.addTrack(track, this.localStream));
  if (offerBool) {
    pc.createOffer().then(offer => {
      pc.setLocalDescription(offer).then(() => {
        setTimeout(() => {
          broadcastData({
            type: EXCHANGE,
            from: this.userId,
            to: userId,
            sdp: JSON.stringify(pc.localDescription),
          });
        }, 0);
      });
    });
  }
  pc.onicecandidate = (e) => {};
  pc.ontrack = (e) => {};
  pc.oniceconnectionstatechange = (e) => {};
  return pc;
}
WOAH MAN! There’s a lot of jargon here that you may not understand, so here’s a brief description of various terms:
RTCPeerConnection: Represents a WebRTC connection between two users: the local user and a remote peer. It has a bunch of methods that connect, monitor, and close connections between the two peers. This is where the magic happens.
Offer: An SDP offer is the initial package sent out to start a new WebRTC connection with a remote peer. It includes information about any streams already attached as well as any candidates already gathered by the ICE agent, and it is sent over the signaling channel to a potential peer to request a connection. This is what we send when we see another user has joined the call. That user will respond in a similar fashion with an answer.
Local/Remote descriptions: These descriptions specify the properties of the local/remote end of the connection, which for our purposes will contain the user’s media.
So why are we wrapping that broadcastData call in a setTimeout? It’s important for our data to arrive in a very specific order, and because of this we need to ensure that the broadcast goes out only after our synchronous code has resolved.
With all that being said, let’s define these “event listeners” for our RTC peer connection.
/* These handlers live inside the createPC function */
pc.onicecandidate = (e) => {
  // e.candidate is null once ICE gathering finishes, so skip those events
  if (!e.candidate) return;
  broadcastData({
    type: EXCHANGE,
    from: this.userId,
    to: userId,
    sdp: JSON.stringify(e.candidate)
  });
};
pc.ontrack = (e) => {
  const remoteVid = document.createElement("video");
  remoteVid.id = `remoteVideoContainer+${userId}`;
  remoteVid.autoplay = "autoplay";
  remoteVid.srcObject = e.streams[0];
  this.remoteVideoContainer.appendChild(remoteVid);
};
pc.oniceconnectionstatechange = (e) => {
  if (pc.iceConnectionState === 'disconnected'){
    broadcastData({ type: LEAVE_CALL, from: userId });
  }
};
Here we are defining what we want to happen whenever changes occur on our RTCPeerConnections. Again, this should all live inside our createPC method, before we return the new peer connection. So what exactly are these changes? The first handler is part of negotiating a connection between peers: it takes each ICE candidate as it is gathered and sends it over the Action Cable channel. The second handler fires once we have the other user’s media; it creates a video element and uses the remote user’s media stream as its source. The third handler lets everyone else in the call know, through the Action Cable connection, that a peer has disconnected.
Now we need to write out that negotiation process in our exchange(data) method:
exchange(data){
  let pc;
  if (this.pcPeers[data.from]) {
    pc = this.pcPeers[data.from];
  } else {
    pc = this.createPC(data.from, false);
  }
  if (data.sdp) {
    const sdp = JSON.parse(data.sdp);
    if (sdp && sdp.candidate) {
      // The payload is an ICE candidate, so add it to the connection
      pc.addIceCandidate(new RTCIceCandidate(sdp));
    } else if (sdp) {
      // The payload is a session description (an offer or an answer)
      pc.setRemoteDescription(sdp).then(() => {
        if (sdp.type === 'offer') {
          pc.createAnswer().then(answer => {
            pc.setLocalDescription(answer).then(() => {
              broadcastData({
                type: EXCHANGE,
                from: this.userId,
                to: data.from,
                sdp: JSON.stringify(pc.localDescription)
              });
            });
          });
        }
      });
    }
  }
}
The main thing that happens here is the exchanging of ICE candidates. So what is an ICE candidate? Candidates are the potential avenues through which our clients can stream their data. Clients will propose their most optimal connections first, then work through less optimal ones.
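If you’re curious, a parsed candidate (the object we JSON.stringify in onicecandidate) looks roughly like this; the exact fields and values vary by browser:
{
  "candidate": "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx ...",
  "sdpMid": "0",
  "sdpMLineIndex": 0
}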
The other important thing to note is the creation and broadcasting of an answer. Answers are the one-off responses to the initial offers that were sent out when the user joined the call.
At this point you should be able to spin up your Rails server, open localhost:3000 in both a regular Chrome window and an incognito window, and have a video chat with yourself to test. Feel free to add console logs so you can see that the stream data is in fact different between the two browser windows.
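For instance, one quick and purely illustrative sanity check is to log the stream ids inside pc.ontrack and compare them across the two windows:
// Inside pc.ontrack, just for debugging: the remote stream id should differ from the local one
console.log("remote stream id:", e.streams[0].id, "local stream id:", this.localStream.id);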
Now we have to actually handle leaving a call.
leaveCall(e){
  const pcKeys = Object.keys(this.pcPeers);
  for (let i = 0; i < pcKeys.length; i++) {
    this.pcPeers[pcKeys[i]].close();
  }
  this.pcPeers = {};
  this.localVideo.srcObject.getTracks()
    .forEach(function (track) { track.stop(); });
  this.localVideo.srcObject = null;
  App.cable.subscriptions.subscriptions = [];
  this.remoteVideoContainer.innerHTML = "";
  broadcastData({ type: LEAVE_CALL, from: this.userId });
}
By now this code should be pretty readable. All we’re doing is closing our peer-to-peer connections, stopping our video feed, emptying our Action Cable subscriptions, and, finally, broadcasting to other users that we have left the call. Now let’s take care of what happens when someone else leaves the call.
removeUser(data){
  const video = document.getElementById(`remoteVideoContainer+${data.from}`);
  if (video) video.remove();
  const pc = this.pcPeers[data.from];
  // Close the connection to this peer before dropping it from our collection
  if (pc) pc.close();
  delete this.pcPeers[data.from];
}
When someone else hits the leave call button, you will remove their video container from the DOM, and close the peer-to-peer connection on your end, stopping that flow of data.
And there you have it! A VideoChat React component that works in Chrome! Feel free to incorporate this into your apps or personal projects. I think it’s important to note that this tutorial is not meant to be the be-all-end-all tutorial on WebRTC. The goal was to provide a working boilerplate React component that you can then tinker with.
If you have any questions, feel free to leave a comment or email me at [email protected].
Important links that I think are worth a read if you want to deepen your understanding:
WebRTC Signaling Concepts | Muaz Khan
Anatomy of a WebRTC SDP - webrtcHacks