Alexander Nnakwue, Software Engineer. React, Node.js, Python, and other developer tools and libraries.

WebRTC over WebSocket in Node.js



Introduction

WebSocket is a protocol that enables real-time communication between client applications (e.g., browsers or native apps) and a WebSocket server. For full-duplex, real-time communication, the WebSocket protocol is the recommended standard over HTTP due to its lower latency and overhead.

Built on top of TCP, WebSocket supports a two-way communication model, whereby client-server connections are always kept open, enabling seamless transfer of media types like video/audio. The connection can either be encrypted or unencrypted. The specification defines URI schemes: ws (WebSocket) for unencrypted and wss (WebSocket Secure) for encrypted connections.

As we proceed, we will explore the WebSocket protocol and also how to set up a basic WebSocket server with the ws WebSocket library for Node. For now, let’s quickly explore WebRTC, a protocol available on all modern browsers and on native Android and iOS platforms via simple APIs.

Introducing WebRTC

WebRTC, which stands for Web Real-Time Communication, is a protocol that provides a set of rules for bidirectional and secure real-time, peer-to-peer communication for the web. With WebRTC, web applications or other WebRTC agents can send video, audio, and other kinds of media amongst peers.

WebRTC relies on several other protocols to create a connection or communication channel and then transfer or exchange data and media. To coordinate communication, WebRTC clients need some sort of “signaling server” in between, which is necessary for exchanging metadata info.

Socket.IO- and ws-based Node.js servers are a natural fit for providing this signaling: they give WebRTC clients a persistent, real-time channel over which to share session descriptions and media information before actually exchanging data. In this sense, WebSocket and WebRTC are complementary technologies.

Note: In order to maximize compatibility with established technologies, signaling is not implemented by the WebRTC open standard. Different applications may prefer different signaling protocols or services, which is why it was left out of the core specification.

How the WebRTC protocol works

A WebRTC agent knows how to create a connection with peers. Signaling triggers this initial attempt, which eventually makes the call between both agents possible. Agents make use of an offer/answer model: an offer is made by an agent to start the call, and another answers the call for compatibility checks with the media description offered.

At a high level, the WebRTC protocol works in four stages. Communication happens in a dependent order: one stage must be complete before the next can commence. These four stages are:

1.) Signaling

This begins the process of identifying two WebRTC agents that intend to communicate and exchange data. Under the hood, signaling makes use of another protocol, SDP, which is what eventually allows the peers to connect and communicate.

The Session Description Protocol (a plaintext protocol) is useful for exchanging media sections in key-value pairs. With it, we can share state between two or more peers that intend to connect.


Note: A shared state can provide all the needed parameters to establish a connection amongst peers.
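
To make the offer/answer exchange concrete, below is a minimal sketch of the offer side. The signaling object here is hypothetical; it stands in for whatever channel the application uses, such as the WebSocket server we set up later in this post:

// Minimal sketch of the offer side of signaling
// `signaling` is a hypothetical wrapper around the app's channel (e.g., a WebSocket)
const peerConnection = new RTCPeerConnection();

async function makeCall() {
    // create an SDP offer describing the media we want to exchange
    const offer = await peerConnection.createOffer();
    await peerConnection.setLocalDescription(offer);

    // relay the offer to the remote peer via the signaling channel
    signaling.send(JSON.stringify({ 'type': 'offer', 'sdp': offer.sdp }));
}

// apply the remote peer's SDP answer once it arrives
signaling.onmessage = async (event) => {
    const message = JSON.parse(event.data);
    if (message.type === 'answer') {
        await peerConnection.setRemoteDescription(message);
    }
};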

2.) Connecting

After signaling, WebRTC agents need to achieve bidirectional, peer-to-peer communication. Although establishing a connection can be difficult for a number of reasons, such as differing IP versions, network locations, or protocols, WebRTC connections offer advantages over traditional client/server connections, including reduced bandwidth, lower latency, and better security.

Note: WebRTC also makes use of ICE (Interactive Connectivity Establishment) to connect two agents. ICE is a protocol that tries to find the best way to communicate between two ICE agents. More details can be found here.
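
Continuing the sketch above, each ICE candidate discovered locally is typically relayed through the same hypothetical signaling channel and added on the remote side:

// trickle ICE sketch: forward each local candidate to the remote peer as it is gathered
peerConnection.onicecandidate = (event) => {
    if (event.candidate) {
        signaling.send(JSON.stringify({ 'type': 'candidate', 'candidate': event.candidate }));
    }
};

// ...and add each candidate received from the remote peer to our own connection
async function onRemoteCandidate(message) {
    try {
        await peerConnection.addIceCandidate(message.candidate);
    } catch (e) {
        console.error('Error adding received ICE candidate', e);
    }
}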

3.) Securing

Every WebRTC connection is encrypted and authenticated. It makes use of the DTLS and SRTP protocols to enable seamless and secure communication across the data layer. DTLS, similar to TLS, allows us to negotiate a session and then exchange data securely between two peers. SRTP, on the other hand, is designed for exchanging media.

4.) Communication

WebRTC allows us to send and receive an unlimited number of audio and video streams. The protocol is independent of any particular codec; implementations are free to negotiate among the available options.

It relies on two pre-existing protocols: RTP and RTCP. RTP is the protocol that carries the media. It was designed to allow real-time delivery of video. RTCP is the protocol that communicates metadata about the call.

Note: For most WebRTC applications, there is no direct socket connection between the clients (unless they reside on the same local network). A common way to resolve this sort of issue is by using a TURN server.

The term stands for Traversal Using Relays around NAT, and it is a protocol for relaying network traffic. NAT mapping, with the help of the STUN and TURN protocols, allows two peers in completely different subnets to communicate.
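
In code, STUN and TURN servers are passed to the RTCPeerConnection constructor through the iceServers field. The sketch below uses a well-known public Google STUN server; the TURN URL and credentials are placeholders, not a real deployment:

// placeholder ICE configuration; swap in your own TURN server and credentials
const configuration = {
    iceServers: [
        { urls: 'stun:stun.l.google.com:19302' }, // public STUN server
        {
            urls: 'turn:turn.example.com:3478',   // hypothetical TURN relay
            username: 'user',
            credential: 'pass'
        }
    ]
};

const peerConnection = new RTCPeerConnection(configuration);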

Use cases for WebRTC

Since WebRTC is generic for real-time applications on the web and on mobile platforms, some of its most common use cases include:

  • Video and text chatting
  • Analytics
  • Social networking
  • Screen-sharing technologies
  • Conferencing (audio/video)
  • Live broadcasting
  • File transfer
  • E-learning
  • Multiplayer online games
  • And so on…

WebRTC JavaScript APIs

WebRTC mainly comprises three operations: fetching or acquiring user media from a camera/microphone (both audio and video); communicating this media content over a channel; and then, finally, sending messages over the channel. Now, let’s take a look at the summarized description of each operation type.

Media Streams API (getUserMedia)

This API lets us obtain access to any hardware source of media data. The getUserMedia() method, with the user’s permission through a prompt, activates a camera and/or a microphone on the system and provides a MediaStream containing a video track and/or an audio track with the desired input.

Note: The Navigator.mediaDevices read-only property returns a MediaDevices object/interface, which provides access to connected media input devices like cameras and microphones, as well as screen sharing.

The format is shown below:

const promise = navigator.mediaDevices.getUserMedia(constraints);

The constraints parameter is a MediaStreamConstraints object with two members: video and audio, describing the media types requested. It also controls the contents of the MediaStream.

For example, we can set a constraint to open the camera with minWidth and minHeight capabilities:

  'video': {
    'width':  {'min': minWidth},
    'height': {'min': minHeight} 
  }

Or we can set echo cancellation on the microphone:

'audio': {'echoCancellation': true},

So, in essence, we can generally declare a constraints variable as:

const constraints = {
    'video': true,
    'audio': true
}

Finally, let’s see a sample of how we can apply getUserMedia() to trigger a permission request in the user’s browser:

const openMediaDevices = async (constraints) => {
    return await navigator.mediaDevices.getUserMedia(constraints);
}

try {
    // await the async helper so any getUserMedia error is caught here
    const stream = await openMediaDevices({'video': true, 'audio': true});
    console.log('Got MediaStream:', stream);
} catch(error) {
    console.error('Error accessing media devices.', error);
}

Other methods available in the Media Streams API include:

  1. enumerateDevices()
  2. getSupportedConstraints()
  3. getDisplayMedia()
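
For instance, enumerateDevices() resolves with a list of the connected media devices. A minimal sketch (note that device labels are only populated after the user grants media permission):

// list connected media input/output devices
async function listDevices() {
    const devices = await navigator.mediaDevices.enumerateDevices();
    devices.forEach((device) => {
        console.log(`${device.kind}: ${device.label} (id: ${device.deviceId})`);
    });
}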

RTCPeerConnection

The RTCPeerConnection interface represents a WebRTC connection between a local computer and a remote peer. It provides methods to connect to a remote peer, maintain and monitor the connection, and close the connection once it’s no longer needed.

Once an RTCPeerConnection is made to a remote peer, it is possible to stream audio and video between them. At this layer, we can connect the stream we receive from getUserMedia() to the RTCPeerConnection.

RTCPeerConnection methods include:

  • createOffer() / createAnswer() – create the SDP descriptions exchanged during signaling
  • setLocalDescription() / setRemoteDescription() – apply those descriptions on each end of the connection
  • addIceCandidate() – adds a candidate received from the remote peer
  • addTrack() – attaches a local media track for transmission to the remote peer
  • close() – closes the connection once it is no longer needed

Note: A media stream should consist of at least one media track, which we must have added to the RTCPeerConnection when we intend to transmit the media to the remote peer.
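
Here is a hedged sketch of both directions, reusing the peerConnection from the earlier sketches; the video#remote selector refers to a hypothetical <video> element in the page:

// attach local tracks so they are transmitted once negotiation completes
async function attachLocalMedia(peerConnection) {
    const stream = await navigator.mediaDevices.getUserMedia({ 'video': true, 'audio': true });
    stream.getTracks().forEach((track) => peerConnection.addTrack(track, stream));
}

// on the receiving side, remote tracks surface through the track event
peerConnection.ontrack = (event) => {
    const [remoteStream] = event.streams;
    document.querySelector('video#remote').srcObject = remoteStream; // hypothetical element
};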

RTCDataChannel

The RTCDataChannel interface represents a network channel that can be used for bidirectional peer-to-peer transfer of arbitrary data. To create a data channel and ask a remote peer to join, we can call the createDataChannel() method of RTCPeerConnection. An example is shown below:

const peerConnection = new RTCPeerConnection(configuration);
const dataChannel = peerConnection.createDataChannel('my-channel'); // a channel label is required

Methods include:

  • close() – the RTCDataChannel.close() method closes the RTCDataChannel. Either peer can call this method to initiate closure of the channel
  • send() – the send() method of the RTCDataChannel interface sends data to the remote peer, across the data channel
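
Putting these together, a minimal sketch of exchanging text over the channel created above might look like this:

// caller side: send once the channel reports open
dataChannel.onopen = () => {
    dataChannel.send('Hello over WebRTC!');
};

// callee side: the remote peer learns about the channel via the datachannel event
peerConnection.ondatachannel = (event) => {
    const channel = event.channel;
    channel.onmessage = (messageEvent) => {
        console.log('Received:', messageEvent.data);
    };
};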

An implementation of the WebRTC protocol based on the sets of APIs exposed can be found here, in a repository of various WebRTC experiments. For example, a live demo of getDisplayMedia() usage can be found here.

Sample Node.js WebSocket-based server

To create a WebRTC connection, clients need to be able to transfer messages via WebSocket signaling — a bidirectional socket connection between two endpoints. A full demo implementation of WebSocket over Node.js can be found on GitHub, courtesy of Muaz Khan. For better context, let’s explore some of the important pieces from the server.js file.

Firstly, we can set up an HTTPS server that accepts an object as an argument. This object should contain the secure keys needed for establishing a seamless connection. We also need to specify a callback function to run when we get a connection request, as well as the response to return to the caller:

// HTTPS server; `options` holds the TLS key/certificate pair
var app = require('https').createServer(options, function(request, response) {
  // accept server requests and handle subsequent responses here 
});

After this, we can proceed to set up the WebSocket server as shown below and listen for when a connection request comes in:

// require websocket and setup server.
var WebSocketServer = require('websocket').server;

// wait for when a connection request comes in 
new WebSocketServer({
    httpServer: app, 
    autoAcceptConnections: false 
}).on('request', onRequest);

// listen on app port 
app.listen(process.env.PORT || 9449);

//handle exceptions and exit gracefully 
process.on('unhandledRejection', (reason, promise) => {
  process.exit(1);
});

As we can see from the above snippet, we are listening on the app port for when we receive a WebSocket connection. When we do so (on the trigger of the request event), we are handling the connection request with the onRequest callback. The content of the onRequest method is shown below:

// callback function to run when we have a successful websocket connection request
function onRequest(socket) {

    // get origin of request 
    var origin = socket.origin + socket.resource;

    // accept socket origin 
    var websocket = socket.accept(null, origin);

    // websocket message event for when message is received
    websocket.on('message', function(message) {
        if(!message || !websocket) return;

        if (message.type === 'utf8') {
            try {
                // handle JSON serialization of messages 
                onMessage(JSON.parse(message.utf8Data), websocket);
            }
            // catch any errors 
            catch(e) {}
        }
    });
    // websocket event when the connection is closed 
    websocket.on('close', function() {
        try {
            // close websocket channels when the connection is closed for whatever reason
            truncateChannels(websocket);
        }
        catch(e) {}
    });
}

From the above method, when a message comes in the specified format, we handle it via the onMessage callback function, which runs when the message event is triggered. Details of the callback method are shown below:

// callback to run when the message event is fired 
function onMessage(message, websocket) {
    if(!message || !websocket) return;

    try {
        if (message.checkPresence) {
            checkPresence(message, websocket);
        }
        else if (message.open) {
            onOpen(message, websocket);
        }
        else {
            sendMessage(message, websocket);
        }
    }
    catch(e) {}
}

Note: Details on the other methods used above, like sendMessage and checkPresence, as well as the full implementation of the demo WebSocket server can be found on GitHub.

Further, to start using the WebSocket library, we need to specify the address of the Node.js server in the WebRTC client. After we are done, we can then make inter-browser WebRTC audio/video calls, where the signaling is handled by the Node.js WebSocket server.

Lastly, and for the sake of emphasis, the three basic methods to take note of in a WebSocket connection include:

  • ws.onopen – fired when the connection is established
  • ws.send – sends data to the WebSocket server
  • ws.onmessage – fired when a message is received from the server
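
As a quick sketch, a browser client exercising these three methods against the server above might look like the following; the URL and the message shape are placeholders to adapt to your own deployment:

// connect to the signaling server (placeholder URL; port matches the server above)
const ws = new WebSocket('wss://localhost:9449');

// fired once the socket connection is established
ws.onopen = () => {
    // hypothetical message shape; adapt to what the server's onMessage expects
    ws.send(JSON.stringify({ 'open': true, 'channel': 'demo' }));
};

// fired whenever the server relays a message to this client
ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    console.log('Signaling message:', message);
};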

Conclusion

As we have seen, the WebRTC API includes media capture, encoding and decoding audio and video streams, transport, and, finally, session management. While implementations of WebRTC in browsers are still evolving due to different levels of support for WebRTC features, we can avoid issues with compatibility by making use of the Adapter.js library.

This library uses shims and polyfills to resolve the differences among WebRTC implementations across the various environments that support it. We can add it to the index.html file with a regular script tag, like so:

<script src="https://webrtc.github.io/adapter/adapter-latest.js"></script>

A collection of small samples demonstrating various parts of the WebRTC APIs can be found here. It contains implementation details and demos for the major WebRTC APIs, including getUserMedia(), RTCPeerConnection(), and RTCDataChannel().

Finally, you can find more details about web experiments on websocket-over-nodejs and socketio-over-nodejs on GitHub.
