Alexander Nnakwue Software engineer. React, Node.js, Python, and other developer tools and libraries.

WebRTC signaling with WebSocket and Node.js



Editor’s note: This article was updated on 12 May 2022 to include information relevant to the most recent features of WebRTC and WebSocket.

WebSocket is a protocol that enables real-time communication between client applications (for example, browsers and native apps) and a WebSocket server. It uses a single, persistent TCP connection (either encrypted or unencrypted) to transfer data in real time between the two endpoints.

The specification defines two URI schemes: ws (WebSocket) for unencrypted connections and wss (WebSocket Secure) for encrypted ones. Once a connection is open, the WebSocket protocol provides a two-way channel in which both the client and the server can send text or binary messages at any time. Those messages can carry any kind of data, including audio and video.
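In browser code, one way to pick the right scheme is to derive the WebSocket URL from the page’s own URL, so that a page served over HTTPS connects with wss. Here is a minimal sketch. The helper name toWebSocketUrl and the /signaling path are hypothetical:

```javascript
// Hypothetical helper: map an http(s) page URL to the matching ws(s) URL.
// Pages served over https: should use wss:, because browsers block
// unencrypted ws: connections from secure pages (mixed content).
function toWebSocketUrl(pageUrl, path = '/') {
  const url = new URL(path, pageUrl);
  if (url.protocol === 'https:') {
    url.protocol = 'wss:';
  } else if (url.protocol === 'http:') {
    url.protocol = 'ws:';
  } else {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.toString();
}

// In a browser you might call: toWebSocketUrl(window.location.href, '/signaling')
console.log(toWebSocketUrl('https://example.com/app', '/signaling'));
// prints "wss://example.com/signaling"
```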

In this article, we’ll explore the WebSocket protocol and review how to set up a basic WebSocket signaling server in Node.js with the websocket library.

To begin, let’s explore WebRTC, a protocol available in all modern browsers and on native Android and iOS platforms for real-time, bidirectional communication.


Introducing WebRTC

WebRTC, which stands for Web Real-Time Communication, is a protocol that provides a set of rules for bidirectional and secure real-time, peer-to-peer communication for the web. With WebRTC, web applications or other WebRTC agents can send video, audio, and other kinds of media types among peers leveraging simple web APIs.

WebRTC relies on several other protocols to create a communication channel and then transfer or exchange data and/or media. To coordinate this communication, WebRTC clients need some sort of “signaling server” in between for exchanging metadata.

A common way to provide signaling is to run a Node.js server based on Socket.IO or ws that relays session descriptions, media information, and other data between WebRTC clients in real time. This makes WebSocket and WebRTC complementary technologies.

Note: Because different applications may prefer different signaling protocols or services, signaling is not defined by the WebRTC open standard. This leaves developers free to use whichever signaling mechanism fits the rest of their tooling.

How the WebRTC protocol works

A WebRTC agent knows how to create a connection with peers. Signaling triggers this initial attempt, which eventually makes the call between both agents possible. Agents use an offer/answer model: one agent makes an offer to start the call, and the other agent checks whether it is compatible with the offered media description and responds with an answer.



At a high level, the WebRTC protocol works in four stages. The stages happen in order, and each one must complete before the next can begin. The four stages are:

Stage 1: Signaling

This stage identifies the two WebRTC agents that intend to communicate and exchange data. Before the peers can connect, signaling relies on another protocol under the hood: SDP.
The Session Description Protocol (SDP) is a plaintext protocol that describes media sections as key-value pairs. With it, peers that intend to connect can share state with each other.

Note: A shared state can provide all the needed parameters to establish a connection among peers.
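To make the key-value format concrete, here is a minimal JavaScript sketch that splits an SDP blob into its session-level lines and its media sections. Each media section starts with an m= line. The function name parseSdp and the hand-written sample SDP are illustrative assumptions. Real applications normally pass the SDP produced by the browser through the signaling channel unchanged.

```javascript
// Minimal SDP parser sketch: splits a session description into the
// session-level lines and one entry per media section ("m=" line).
// It does not validate the SDP or interpret attribute values.
function parseSdp(sdp) {
  const session = [];
  const media = [];
  let current = session;

  for (const line of sdp.split(/\r?\n/)) {
    if (line.length < 2 || line[1] !== '=') continue; // skip blank or malformed lines
    const entry = { type: line[0], value: line.slice(2) };
    if (entry.type === 'm') {
      // every "m=" line opens a new media section
      const section = { media: entry.value, lines: [] };
      media.push(section);
      current = section.lines;
    } else {
      current.push(entry);
    }
  }
  return { session, media };
}

// Hand-written, abbreviated sample; real SDP contains many more lines.
const sample = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=rtpmap:96 VP8/90000',
].join('\r\n');

const parsed = parseSdp(sample);
console.log(parsed.session.length); // 4
console.log(parsed.media.map((m) => m.media.split(' ')[0])); // [ 'audio', 'video' ]
```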

Stage 2: Connecting

After signaling, WebRTC agents need to establish bidirectional, peer-to-peer communication. Establishing a connection can be difficult for a number of reasons, such as different IP versions, network locations, or protocols. Even so, WebRTC connections have advantages over traditional client-server communication with HTTP polling: they use less bandwidth, have lower latency, and are more secure.

Note: WebRTC also uses the ICE (Interactive Connectivity Establishment) protocol to connect two agents. ICE tries to find the best way for two ICE agents to communicate. More details can be found here.

Stage 3: Securing

Every WebRTC connection is encrypted and authenticated. Under the hood, WebRTC uses the DTLS and SRTP protocols to secure communication. DTLS, which is similar to TLS, performs the session negotiation (handshake) and secures data exchanged between peers. SRTP, on the other hand, protects the media streams, using keys negotiated during the DTLS handshake.

Stage 4: Communicating

With the WebRTC protocol, we can send and receive any number of audio and video streams. It relies on two pre-existing protocols: RTP and RTCP. RTP carries the media itself, allowing real-time delivery of audio and video streams. RTCP communicates metadata about the call, such as packet-loss and timing reports used to synchronize streams.
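Every RTP packet starts with a fixed 12-byte header, defined in RFC 3550, that identifies the payload type, orders packets, and timestamps the media. Browsers handle RTP internally, so application code rarely touches it. The Node.js sketch below decodes that header only to show what RTP carries. The function name parseRtpHeader and the hand-built sample packet are assumptions for illustration.

```javascript
// Sketch: decode the 12-byte fixed RTP header defined in RFC 3550.
// CSRC entries, header extensions, and padding bytes are not decoded.
function parseRtpHeader(buf) {
  if (buf.length < 12) throw new Error('RTP packet too short');
  return {
    version: buf[0] >> 6,              // 2 for current RTP
    padding: (buf[0] & 0x20) !== 0,
    extension: (buf[0] & 0x10) !== 0,
    csrcCount: buf[0] & 0x0f,
    marker: (buf[1] & 0x80) !== 0,
    payloadType: buf[1] & 0x7f,        // dynamic types (96-127) are mapped to codecs in SDP
    sequenceNumber: buf.readUInt16BE(2),
    timestamp: buf.readUInt32BE(4),
    ssrc: buf.readUInt32BE(8),         // identifies the media source
  };
}

// Hand-built example header: version 2, marker set, payload type 111,
// sequence number 1, timestamp 960, SSRC 0x12345678.
const header = Buffer.from([
  0x80, 0xef, 0x00, 0x01,
  0x00, 0x00, 0x03, 0xc0,
  0x12, 0x34, 0x56, 0x78,
]);

const info = parseRtpHeader(header);
console.log(info.payloadType, info.sequenceNumber, info.timestamp); // 111 1 960
```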

For communication to succeed, the two peers must agree on a codec that both support before any media can be shared. The protocol itself is not tied to a particular codec, as there are many options.
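The following sketch gives a simplified illustration of codec agreement. It keeps the offered codecs that the answering side also supports and preserves the offer’s order. The function negotiateCodecs is hypothetical. Real SDP negotiation also compares clock rates, channel counts, and format parameters, and this sketch ignores those.

```javascript
// Simplified sketch of codec agreement: keep the offered codecs that the
// local side also supports, in the offer's order of preference.
// Codec names are compared case-insensitively.
function negotiateCodecs(offered, supported) {
  const local = new Set(supported.map((c) => c.toLowerCase()));
  return offered.filter((codec) => local.has(codec.toLowerCase()));
}

const common = negotiateCodecs(['VP9', 'VP8', 'H264'], ['h264', 'vp8']);
console.log(common); // [ 'VP8', 'H264' ]
if (common.length === 0) {
  console.log('No shared codec: media cannot flow');
}
```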

Note: In many WebRTC applications, the clients cannot reach each other directly unless they reside on the same local network, usually because each one sits behind NAT. A common way to resolve this sort of issue is by using a TURN server. The term stands for Traversal Using Relays around NAT, and it is a protocol for relaying network traffic. With the help of the STUN and TURN protocols, two peers in completely different subnets can still communicate.
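In the browser, STUN and TURN servers are supplied through the iceServers field of the configuration object passed to the RTCPeerConnection constructor. This sketch assumes placeholder hosts (stun.example.org, turn.example.org) and placeholder TURN credentials. Substitute your own servers.

```javascript
// Sketch of an RTCPeerConnection configuration. The STUN/TURN hosts and
// the TURN credentials below are placeholders, not real servers.
const configuration = {
  iceServers: [
    // STUN: lets a peer discover its public (server-reflexive) address
    { urls: 'stun:stun.example.org:3478' },
    // TURN: relays traffic when no direct path can be established
    {
      urls: 'turn:turn.example.org:3478',
      username: 'demo-user',
      credential: 'demo-password',
    },
  ],
};

// In a browser, pass it to the constructor:
// const peerConnection = new RTCPeerConnection(configuration);
console.log(configuration.iceServers.length); // 2
```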

Use cases for WebRTC

WebRTC is useful for building real-time applications on the web and on mobile platforms. Some of its most common use cases are listed below:


  • Video and text chatting
  • Analytics
  • Social networking
  • Screen-sharing technologies
  • Conferencing (audio/video)
  • Live broadcasting
  • File transfer
  • e-learning
  • Multiplayer online games

WebRTC JavaScript APIs

WebRTC mainly comprises three operations: fetching or acquiring user media from a camera/microphone (both audio and video); communicating this media content over a channel; and finally, sending messages over the channel.

Now, let’s take a look at the summarised description of each flow.

Media Streams API (getUserMedia)

This API lets us obtain access to any hardware source of media data. The getUserMedia() method, with the user’s permission via a prompt to allow access, activates a camera and/or a microphone on the system and provides a MediaStream containing a video track and/or an audio track with the desired input.

Note: The Navigator.mediaDevices read-only property returns a MediaDevices object, which provides access to connected media input devices like cameras and microphones, as well as screen sharing.

The format is shown below:

const promise = navigator.mediaDevices.getUserMedia(constraints);

The constraints parameter is a MediaStreamConstraints object with two members: video and audio, describing the media types requested. It also controls the contents of the MediaStream.
For example, we can set a constraint to open the camera with minWidth and minHeight capabilities:

const constraints = {
  // minWidth and minHeight are pixel values you choose
  'video': {
    'width':  {'min': minWidth},
    'height': {'min': minHeight}
  }
};

Or we can set echo cancellation on the microphone:

const constraints = { 'audio': {'echoCancellation': true} };

So, in essence, we can generally declare a constraints variable, like so:

const constraints = {
    'video': true,
    'audio': true
}

Finally, let’s see a sample of how we can apply getUserMedia() to trigger a permissions request to a user’s browser:

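// getUserMedia() returns a Promise, so its result must be awaited
const openMediaDevices = async (constraints) => {
    return await navigator.mediaDevices.getUserMedia(constraints);
};

(async () => {
    try {
        const stream = await openMediaDevices({'video': true, 'audio': true});
        console.log('Got MediaStream:', stream);
    } catch (error) {
        // rejected if the user denies permission or no matching device exists
        console.error('Error accessing media devices.', error);
    }
})();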

Other methods available in the Media Streams API include:

  • enumerateDevices()
  • getSupportedConstraints()
  • getDisplayMedia()
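For example, enumerateDevices() resolves to a list of MediaDeviceInfo objects, each with a kind such as audioinput, audiooutput, or videoinput. The sketch below groups them by kind. The helper name listDevicesByKind and its injectable enumerate parameter are assumptions, added so the logic can run outside a browser.

```javascript
// Sketch: group the devices returned by enumerateDevices() by kind.
// `enumerate` is a hypothetical injectable parameter; in a browser it
// defaults to the real navigator.mediaDevices.enumerateDevices().
async function listDevicesByKind(
  enumerate = () => navigator.mediaDevices.enumerateDevices()
) {
  const devices = await enumerate();
  const byKind = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    // each MediaDeviceInfo has deviceId, groupId, kind, and label;
    // labels stay empty until the user grants media permission
    if (byKind[device.kind]) {
      byKind[device.kind].push(device.label || device.deviceId);
    }
  }
  return byKind;
}

// In a browser: listDevicesByKind().then((devices) => console.log(devices));
```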

RTCPeerConnection interface

The RTCPeerConnection interface represents a WebRTC connection between a local computer and a remote peer. It provides methods to connect to a remote peer, maintain and monitor the connection, and close the connection once it’s no longer needed.

Once an RTCPeerConnection is made to a remote peer, it is possible to stream audio and video content between them. At this layer, we can connect the stream we receive from the getUserMedia() method to the RTCPeerConnection.
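RTCPeerConnection methods and properties include:

  • addIceCandidate(): adds an ICE candidate received from the remote peer through signaling
  • peerIdentity (property): a promise for the remote peer’s verified identity
  • signalingState (property): the current state of the offer/answer exchange, such as stable or have-local-offer
  • setLocalDescription(): sets the local session description (an offer or an answer)
  • setRemoteDescription(): sets the remote peer’s session description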

Note: A media stream should include at least one media track, and each track must be added to the RTCPeerConnection (for example, with addTrack()) when we intend to transmit media to the remote peer.
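The offer/answer model maps onto RTCPeerConnection calls roughly as sketched below. In this sketch, pc is an RTCPeerConnection and signaling is any object with a send() method, such as a WebSocket connected to the signaling server. The JSON message shapes ({ type, sdp } and { type: 'candidate', candidate }) are a hypothetical convention, because WebRTC does not define signaling messages.

```javascript
// Caller: create an offer, apply it locally, and send it to the other peer.
async function makeOffer(pc, signaling) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ type: 'offer', sdp: offer.sdp }));
}

// Callee: apply the received offer, then create and send back an answer.
async function handleOffer(pc, signaling, message) {
  await pc.setRemoteDescription({ type: 'offer', sdp: message.sdp });
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signaling.send(JSON.stringify({ type: 'answer', sdp: answer.sdp }));
}

// Caller: apply the answer once it arrives through signaling.
async function handleAnswer(pc, message) {
  await pc.setRemoteDescription({ type: 'answer', sdp: message.sdp });
}

// Both sides: forward local ICE candidates as they are gathered.
// A null candidate signals the end of gathering and is not sent.
function relayIceCandidates(pc, signaling) {
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send(JSON.stringify({ type: 'candidate', candidate: event.candidate }));
    }
  };
}
```

On the receiving side, a 'candidate' message would be passed to pc.addIceCandidate().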

RTCDataChannel interface

The RTCDataChannel interface represents a network channel that can be used for bidirectional peer-to-peer transfer of arbitrary data. To create a data channel and ask a remote peer to join, we can call the createDataChannel() method of RTCPeerConnection. An example of how to do so is shown below:

const peerConnection = new RTCPeerConnection(configuration);
// createDataChannel() requires a label; 'chat' is an arbitrary example
const dataChannel = peerConnection.createDataChannel('chat');

Methods include:

  • close(): the RTCDataChannel.close() method closes the RTCDataChannel; either peer can call this method to initiate closure of the channel of communication
  • send(): the RTCDataChannel.send() method sends data to the remote peer across the data channel

Several open-source implementations built on these APIs can be found here, in a repository of various WebRTC experiments. For example, a live demo of getDisplayMedia() usage can be found here.

Sample Node.js WebSocket-based server

To create a WebRTC connection, clients need to be able to exchange messages through WebSocket signaling, a bidirectional socket connection between two endpoints. A full demo implementation of WebSocket over Node.js can be found on GitHub, courtesy of Muaz Khan. For better context, let’s explore some of the important pieces from the server.js file.

First, we set up an HTTPS server. Its createServer() method takes an options object containing the TLS key and certificate needed to establish a secure connection. It also takes a callback that runs for each incoming request and sends a response back to the caller:

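// HTTPS server; the key and certificate paths below are placeholders
var fs = require('fs');
var options = { key: fs.readFileSync('privatekey.pem'), cert: fs.readFileSync('certificate.pem') };
var app = require('https').createServer(options, function(request, response) {
  // accept server requests and handle subsequent responses here
});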

Next, we can proceed to set up the WebSocket server and listen for when a connection request comes in, like so:

// require websocket and setup server.
var WebSocketServer = require('websocket').server;

// wait for when a connection request comes in 
new WebSocketServer({
    httpServer: app, 
    autoAcceptConnections: false 
}).on('request', onRequest);

// listen on app port 
app.listen(process.env.PORT || 9449);

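// log unhandled promise rejections, then exit rather than keep running in an unknown state
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled rejection at:', promise, 'reason:', reason);
  process.exit(1);
});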

As the snippet above shows, the server listens on the app port for WebSocket connection requests. When one arrives (firing the request event), we handle it with the onRequest callback.

Here’s the content of the onRequest method:

// callback function to run for each incoming WebSocket connection request
function onRequest(socket) {

    // get origin of request 
    var origin = socket.origin + socket.resource;

    // accept socket origin 
    var websocket = socket.accept(null, origin);

    // websocket message event for when message is received
    websocket.on('message', function(message) {
        if(!message || !websocket) return;

        if (message.type === 'utf8') {
            try {
                // handle JSON serialization of messages 
                onMessage(JSON.parse(message.utf8Data), websocket);
            }
            // catch any errors 
            catch(e) {}
        }
    });

    // websocket event when the connection is closed 
    websocket.on('close', function() {
        try {
            // close websocket channels when the connection is closed for whatever reason
            truncateChannels(websocket);
        }
        catch(e) {}
    });
}

In the code above, each UTF-8 text message is parsed as JSON and passed to the onMessage callback, which is invoked whenever the message event fires.

Here are details of the callback method:

// callback to run when the message event is fired 
function onMessage(message, websocket) {
    if(!message || !websocket) return;

    try {
        if (message.checkPresence) {
            checkPresence(message, websocket);
        }
        else if (message.open) {
            onOpen(message, websocket);
        }
        else {
            sendMessage(message, websocket);
        }
    }
    catch(e) {}
}

Note: Details on the other methods used above, like sendMessage and checkPresence, as well as the full implementation of the demo WebSocket server can be found on GitHub.
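The demo’s sendMessage and truncateChannels are not shown here. The following is a hypothetical, in-memory sketch of the bookkeeping a signaling relay needs: tracking which sockets belong to which channel, forwarding a message to the other members, and cleaning up when a socket closes. It is not the demo’s implementation. The SignalingHub class and its method names are made up for illustration. Any object with a send(string) method, such as the connection returned by accept(), can act as a socket.

```javascript
// Hypothetical in-memory relay; NOT the demo server's actual code.
// A "socket" is anything with a send(string) method.
class SignalingHub {
  constructor() {
    this.channels = new Map(); // channel name -> Set of sockets
  }

  join(channel, socket) {
    if (!this.channels.has(channel)) this.channels.set(channel, new Set());
    this.channels.get(channel).add(socket);
  }

  // forward a message to every other socket in the same channel;
  // returns how many sockets received it
  relay(channel, message, sender) {
    const members = this.channels.get(channel);
    if (!members) return 0;
    let delivered = 0;
    for (const socket of members) {
      if (socket !== sender) {
        socket.send(JSON.stringify(message));
        delivered++;
      }
    }
    return delivered;
  }

  // remove a closed socket from every channel and drop empty channels
  leave(socket) {
    for (const [name, members] of this.channels) {
      members.delete(socket);
      if (members.size === 0) this.channels.delete(name);
    }
  }
}
```

In a server like the one above, the close handler could call hub.leave(websocket), and onMessage could call hub.relay() for messages that target a channel.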

Next, to use the signaling server, we need to specify the address of the Node.js server in the WebRTC client. Once that’s done, we can make browser-to-browser WebRTC audio/video calls, with signaling handled by the Node.js WebSocket server.

Last, and for the sake of emphasis, here are the three basic members to take note of on a client-side WebSocket connection:

  • ws.onopen: event handler called when the connection is established
  • ws.send(): method that sends data to the WebSocket server
  • ws.onmessage: event handler called when a message is received
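On the client side, these three members are enough to connect a browser WebSocket to a signaling handler. The sketch below assumes JSON messages. The names attachSignaling, onReady, and onSignal are hypothetical, and the server URL in the comment is a placeholder.

```javascript
// Sketch: wire a browser WebSocket (or any object with the same
// onopen/onmessage/send members) to JSON signaling handlers.
function attachSignaling(ws, { onReady, onSignal }) {
  ws.onopen = () => onReady(); // connection established
  ws.onmessage = (event) => {  // event.data holds the received payload
    let message;
    try {
      message = JSON.parse(event.data);
    } catch (e) {
      console.warn('Ignoring non-JSON message:', event.data);
      return;
    }
    onSignal(message);
  };
  // return a sender for outgoing signaling messages
  return (payload) => ws.send(JSON.stringify(payload));
}

// In a browser:
// const ws = new WebSocket('wss://localhost:9449/');
// const send = attachSignaling(ws, {
//   onReady: () => console.log('connected to signaling server'),
//   onSignal: (msg) => console.log('signal received', msg),
// });
```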

Conclusion

As we’ve seen, the WebRTC API covers media capture, encoding and decoding of audio and video streams, transport, and session management. Browser implementations of WebRTC are still evolving, and support for individual features varies, but we can reduce compatibility issues by using the Adapter.js library.

This library uses shims and polyfills to smooth over differences among WebRTC implementations across the environments that support it. We can add it to the index.html file with a script tag, like so:

<script src="https://webrtc.github.io/adapter/adapter-latest.js"></script>

A collection of small samples demonstrating various parts of the WebRTC APIs is available at this link. It contains implementation details and demos for the major WebRTC APIs, including getUserMedia(), RTCPeerConnection(), and RTCDataChannel().

Finally, you can find more details about web experiments on websocket-over-nodejs and socketio-over-nodejs on GitHub.
