Editor’s note: This article was updated on 12 May 2022 to include information relevant to the most recent features of WebRTC and WebSocket.
WebSocket is a protocol that enables real-time communication between client applications (for example, browsers and native platforms) and a WebSocket server. It uses a single, open TCP connection (either encrypted or unencrypted) to handle real-time data transfer between the two endpoints.
The specification defines two distinct URI schemes: ws (WebSocket) for unencrypted connections and wss (WebSocket Secure) for encrypted connections. The WebSocket protocol therefore enables a two-way client-server model in which data, including media such as audio and video, can be transferred in both directions.
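As a small illustration of the two schemes, the sketch below picks ws or wss to match how a page was loaded. The helper name toWebSocketUrl, the example.com host, and the /signal path are assumptions made up for this example.

```javascript
// Hypothetical helper: derive a WebSocket URL from a page URL.
// Pages served over https map to wss; plain http maps to ws.
function toWebSocketUrl(pageUrl, path) {
  const page = new URL(pageUrl);
  const scheme = page.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${page.host}${path}`;
}

console.log(toWebSocketUrl('https://example.com/room/42', '/signal'));
console.log(toWebSocketUrl('http://localhost:9449/', '/signal'));
```

Matching the scheme to the page matters in practice because browsers generally block unencrypted ws connections from pages loaded over https.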
In this article, we’ll explore the WebSocket protocol and review how to set up a basic WebSocket server with the ws WebSocket library in Node.js.
To begin, let’s explore WebRTC, a protocol available on all modern browser clients and on native Android/iOS platforms, for real-time bi-directional communication.
WebRTC, which stands for Web Real-Time Communication, is a protocol that provides a set of rules for bidirectional and secure real-time, peer-to-peer communication for the web. With WebRTC, web applications or other WebRTC agents can send video, audio, and other kinds of media types among peers leveraging simple web APIs.
WebRTC relies on several other protocols to create a communication channel and then transfer or exchange data and media. To coordinate communication, WebRTC clients need some sort of “signaling server” in between for exchanging metadata.
One of the most common ways of providing signaling is to use Socket.IO- or ws-based Node.js servers to share session descriptions, media information, and data in real time between WebRTC clients, which makes WebSocket and WebRTC complementary technologies.
Note: Since different applications may prefer different signaling protocols or services, signaling is not defined by the WebRTC open standard. This helps ensure compatibility with other tools in the ecosystem.
A WebRTC agent knows how to create a connection with peers. Signaling triggers this initial attempt, which eventually makes the call between both agents possible. Agents make use of an offer/answer model: an offer is made by an agent to start the call, and another agent responds to the call and checks for compatibility regarding the media description offered.
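The offer/answer exchange described above can be sketched with the standard RTCPeerConnection methods. This is a minimal sketch, not a complete call setup: sendSignal is a hypothetical callback standing in for whatever signaling transport is used, and the message shape ({ type, sdp }) is an assumption.

```javascript
// Caller side: create an offer, store it locally, and hand it to the
// signaling layer. `sendSignal` is a hypothetical callback that would
// forward the message, for example over a WebSocket connection.
async function startCall(peerConnection, sendSignal) {
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  sendSignal({ type: 'offer', sdp: offer.sdp });
}

// Callee side: apply the remote offer, then reply with an answer.
async function answerCall(peerConnection, offer, sendSignal) {
  await peerConnection.setRemoteDescription(offer);
  const answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  sendSignal({ type: 'answer', sdp: answer.sdp });
}
```

The caller would finish the handshake by passing the received answer to setRemoteDescription() on its own connection.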
At a high level, WebRTC works in four stages, and each stage must be complete before the next can commence: signaling, connecting, securing, and communicating.
Signaling begins the process of identifying two WebRTC agents that intend to communicate and exchange data. So that peers can eventually connect and communicate, signaling makes use of another protocol under the hood, SDP.
The Session Description Protocol (a plaintext protocol) is used to exchange media sections as key-value pairs. With it, we can share state between two or more peers that intend to connect.
Note: A shared state can provide all the needed parameters to establish a connection among peers.
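To make the key-value structure of SDP concrete, here is a minimal sketch that splits a description into its pairs. The parseSdp helper and the shortened sample description are illustrative assumptions; real offers contain many more lines.

```javascript
// Parse SDP text into ordered { key, value } pairs.
// Each SDP line has the form <single-character key>=<value>.
function parseSdp(sdpText) {
  return sdpText
    .split(/\r?\n/)
    .filter((line) => line.includes('='))
    .map((line) => {
      const index = line.indexOf('=');
      return { key: line.slice(0, index), value: line.slice(index + 1) };
    });
}

// Illustrative, shortened description (not taken from a real session).
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
].join('\r\n');

const fields = parseSdp(sampleSdp);
// Lines with the key "m" open a media section.
const mediaSections = fields.filter((f) => f.key === 'm').map((f) => f.value);
console.log(mediaSections);
```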
After signaling, WebRTC agents need to achieve bidirectional, peer-to-peer communication. Although establishing a connection can be difficult for a number of reasons, such as differing IP versions, network locations, or protocols, WebRTC connections compare well with traditional client/server communication over HTTP polling: they use less bandwidth, have lower latency, and are more secure.
Note: WebRTC also makes use of the ICE (Interactive Connectivity Establishment) protocol to connect two agents. ICE tries to find the best way for two ICE agents to communicate. More details can be found here.
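In the browser API, ICE candidates surface through the onicecandidate handler and are applied on the other side with addIceCandidate(). The sketch below shows one way to wire this up; sendSignal and the { type: 'candidate' } message shape are assumptions for illustration.

```javascript
// Forward locally gathered ICE candidates to the remote peer through a
// hypothetical `sendSignal` callback (for example, a WebSocket send).
function forwardIceCandidates(peerConnection, sendSignal) {
  peerConnection.onicecandidate = (event) => {
    // A null candidate signals that gathering has finished.
    if (event.candidate) {
      sendSignal({ type: 'candidate', candidate: event.candidate });
    }
  };
}

// Apply a candidate that arrived from the remote peer via signaling.
function applyRemoteCandidate(peerConnection, message) {
  return peerConnection.addIceCandidate(message.candidate);
}
```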
Every WebRTC connection is encrypted and authenticated. Under the hood, it uses the DTLS and SRTP protocols to secure communication. DTLS, like TLS, performs a handshake to negotiate the session and then allows secure data exchange among peers. SRTP, in turn, is used to exchange media securely.
With WebRTC, we can send and receive any number of audio and video streams. It relies on two pre-existing protocols: RTP and RTCP. RTP carries the media, allowing real-time delivery of video streams, while RTCP communicates and synchronizes metadata about the call.
For a seamless and successful communication experience, the two communicating peers must share a pre-defined codec agreed upon by both parties, before the media information can be shared. Again, as standard practice, the protocol is independent of a particular codec, as there are many options.
Note: For most WebRTC applications, there is no direct socket connection between the clients (unless they reside on the same local network). A common way to resolve this is to use a TURN server. TURN stands for Traversal Using Relays around NAT, and it is a protocol for relaying network traffic. NAT traversal, with the help of the STUN and TURN protocols, allows two peers in completely different subnets to communicate.
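STUN and TURN servers are passed to a peer connection through the iceServers field of its configuration object. The following sketch builds such an object; the buildRtcConfiguration helper, the example.org URLs, and the credentials are all placeholders, not real servers.

```javascript
// Build an RTCPeerConnection configuration object.
// All server URLs and credentials passed in are placeholders.
function buildRtcConfiguration({ stunUrl, turn }) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    // TURN servers normally require credentials.
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const configuration = buildRtcConfiguration({
  stunUrl: 'stun:stun.example.org:3478',
  turn: {
    url: 'turn:turn.example.org:3478',
    username: 'demo-user',
    credential: 'demo-secret',
  },
});

console.log(JSON.stringify(configuration, null, 2));
// In a browser: const peerConnection = new RTCPeerConnection(configuration);
```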
WebRTC is useful for building real-time applications on the web and on mobile platforms. It has some of its most common use cases listed below:
WebRTC mainly comprises three operations: fetching or acquiring user media from a camera/microphone (both audio and video); communicating this media content over a channel; and finally, sending messages over the channel.
Now, let’s take a look at the summarised description of each flow.
getUserMedia
This API lets us obtain access to any hardware source of media data. The getUserMedia() method, with the user’s permission via a prompt to allow access, activates a camera and/or a microphone on the system and provides a MediaStream containing a video track and/or an audio track with the desired input.
Note: The Navigator.mediaDevices read-only property returns a MediaDevices object, which provides access to connected media input devices like cameras and microphones, as well as screen sharing.
The format is shown below:
const promise = navigator.mediaDevices.getUserMedia(constraints);
The constraints parameter is a MediaStreamConstraints object with two members, video and audio, describing the media types requested. It also controls the contents of the MediaStream.
For example, we can set a constraint to open the camera with minWidth and minHeight capabilities:
'video': { 'width': {'min': minWidth}, 'height': {'min': minHeight} }
Or we can set echo cancellation on the microphone:
'audio': {'echoCancellation': true},
So, in essence, we can generally declare a constraints variable, like so:
const constraints = { 'video': true, 'audio': true }
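Putting the fragments above together, a full constraints object might be assembled as in this sketch. The buildConstraints helper and the 1280x720 values are assumptions chosen for illustration.

```javascript
// Hypothetical helper that assembles a MediaStreamConstraints object.
function buildConstraints({ minWidth, minHeight, echoCancellation = true }) {
  return {
    video: {
      width: { min: minWidth },
      height: { min: minHeight },
    },
    audio: { echoCancellation },
  };
}

const constraints = buildConstraints({ minWidth: 1280, minHeight: 720 });
console.log(JSON.stringify(constraints));
```

The resulting object can be passed directly to navigator.mediaDevices.getUserMedia() in a browser.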
Finally, let’s see a sample of how we can apply getUserMedia() to trigger a permissions request to a user’s browser:
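const openMediaDevices = async (constraints) => {
  return await navigator.mediaDevices.getUserMedia(constraints);
};

// await is only valid inside an async function (or at a module's top level)
(async () => {
  try {
    // without await, stream would be a pending Promise and errors would not be caught
    const stream = await openMediaDevices({'video': true, 'audio': true});
    console.log('Got MediaStream:', stream);
  } catch(error) {
    console.error('Error accessing media devices.', error);
  }
})();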
Other methods available in the Media Streams API include:
enumerateDevices()
getSupportedConstraints()
getDisplayMedia()
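As an example of one of these methods, enumerateDevices() resolves to a list of device descriptions, each with a kind of videoinput, audioinput, or audiooutput. The sketch below filters such a list; the filterDevicesByKind helper and the device entries are made-up stand-ins so that it runs outside a browser.

```javascript
// Select MediaDeviceInfo-like objects whose `kind` matches
// ('videoinput', 'audioinput', or 'audiooutput').
function filterDevicesByKind(devices, kind) {
  return devices.filter((device) => device.kind === kind);
}

// In a browser the list would come from:
//   const devices = await navigator.mediaDevices.enumerateDevices();
// The entries below are made-up stand-ins so the sketch runs anywhere.
const devices = [
  { deviceId: 'a1', kind: 'videoinput', label: 'Built-in camera' },
  { deviceId: 'b2', kind: 'audioinput', label: 'Built-in microphone' },
  { deviceId: 'c3', kind: 'audiooutput', label: 'Speakers' },
  { deviceId: 'd4', kind: 'videoinput', label: 'USB camera' },
];

const cameras = filterDevicesByKind(devices, 'videoinput');
console.log(cameras.map((camera) => camera.label));
```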
RTCPeerConnection interface
The RTCPeerConnection interface represents a WebRTC connection between a local computer and a remote peer. It provides methods to connect to a remote peer, maintain and monitor the connection, and close the connection once it’s no longer needed.
Once an RTCPeerConnection is made to a remote peer, it is possible to stream audio and video content between them. At this layer, we can connect the stream we receive from the getUserMedia() method to the RTCPeerConnection.
RTCPeerConnection methods and properties include:
addIceCandidate()
peerIdentity
signalingState
setLocalDescription()
setRemoteDescription()
Note: A media stream should include at least one media track, which must be added to the RTCPeerConnection when we intend to transmit media to the remote peer.
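Adding tracks to a connection is typically done with getTracks() on the stream and addTrack() on the peer connection. A minimal sketch, assuming a stream obtained earlier from getUserMedia() (the attachStream helper name is hypothetical):

```javascript
// Add every track of a MediaStream to a peer connection.
// Returns the number of tracks that were added.
function attachStream(peerConnection, stream) {
  const tracks = stream.getTracks();
  // Passing the stream as the second argument associates each track with it.
  tracks.forEach((track) => peerConnection.addTrack(track, stream));
  return tracks.length;
}
```

In a browser this would be called as attachStream(peerConnection, stream) before creating the offer, so that the media sections end up in the session description.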
RTCDataChannel interface
The RTCDataChannel interface represents a network channel that can be used for bidirectional peer-to-peer transfer of arbitrary data. To create a data channel and ask a remote peer to join, we can call the createDataChannel() method of RTCPeerConnection. An example of how to do so is shown below:
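// configuration is an RTCConfiguration object assumed to be defined elsewhere
const peerConnection = new RTCPeerConnection(configuration);
// createDataChannel() requires a label; 'chat' is an arbitrary name
const dataChannel = peerConnection.createDataChannel('chat');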
Methods include:
close(): the RTCDataChannel.close() method closes the RTCDataChannel; either peer can call this method to initiate closure of the channel of communication
send(): the RTCDataChannel.send() method sends data to the remote peer across the data channel
Several open-source implementations of the WebRTC protocol based on the sets of APIs exposed can be found here. It contains a repository for various WebRTC experiments. For example, a live demo of the getDisplayMedia() usage can be found here.
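To make the data channel's send() method more concrete, the sketch below wraps it so that JSON payloads are only sent once the channel is open, and registers a handler for incoming messages. The helper names sendJson and onJson are hypothetical; readyState, send(), and onmessage are part of the RTCDataChannel interface.

```javascript
// Send a JSON payload only when the channel is open.
// Returns true if the payload was handed to the channel.
function sendJson(dataChannel, payload) {
  if (dataChannel.readyState !== 'open') {
    return false;
  }
  dataChannel.send(JSON.stringify(payload));
  return true;
}

// Register a handler for incoming JSON messages.
function onJson(dataChannel, handler) {
  dataChannel.onmessage = (event) => handler(JSON.parse(event.data));
}
```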
To create a WebRTC connection, clients need to be able to exchange messages via WebSocket signaling, a bidirectional socket connection between two endpoints. A full demo implementation of WebSocket over Node.js can be found on GitHub, courtesy of Muaz Khan. For better context, let’s explore some of the important pieces from the server.js file.
First, we can set up an HTTPS server that accepts an options object as an argument. This object should contain the keys and certificates needed for establishing a secure connection. We also need to specify a callback function to run when we get a request, which is where we return a response to the caller:
// HTTPS server
var app = require('https').createServer(options, function(request, response) {
  // accept server requests and handle subsequent responses here
});
Next, we can proceed to set up the WebSocket server and listen for when a connection request comes in, like so:
// require websocket and setup server.
var WebSocketServer = require('websocket').server;

// wait for when a connection request comes in
new WebSocketServer({
  httpServer: app,
  autoAcceptConnections: false
}).on('request', onRequest);

// listen on app port
app.listen(process.env.PORT || 9449);

// handle exceptions and exit gracefully
process.on('unhandledRejection', (reason, promise) => {
  process.exit(1);
});
As we can see from the above snippet, we listen on the app port for incoming WebSocket connections. When a connection request arrives (on the trigger of the request event), we handle it with the onRequest callback.
Here’s the content of the onRequest method:
// callback function to run when we have a successful websocket connection request
function onRequest(socket) {
  // get origin of request
  var origin = socket.origin + socket.resource;

  // accept socket origin
  var websocket = socket.accept(null, origin);

  // websocket message event for when message is received
  websocket.on('message', function(message) {
    if (!message || !websocket) return;

    if (message.type === 'utf8') {
      try {
        // handle JSON serialization of messages
        onMessage(JSON.parse(message.utf8Data), websocket);
      }
      // catch any errors
      catch(e) {}
    }
  });

  // websocket event when the connection is closed
  websocket.on('close', function() {
    try {
      // close websocket channels when the connection is closed for whatever reason
      truncateChannels(websocket);
    } catch(e) {}
  });
}
In the above code, when a message comes in the specified format, we handle it via the onMessage callback function, which is run when the message event is triggered.
Here are details of the callback method:
// callback to run when the message event is fired
function onMessage(message, websocket) {
  if (!message || !websocket) return;

  try {
    if (message.checkPresence) {
      checkPresence(message, websocket);
    } else if (message.open) {
      onOpen(message, websocket);
    } else {
      sendMessage(message, websocket);
    }
  } catch(e) {}
}
Note: Details on the other methods used above, like sendMessage and checkPresence, as well as the full implementation of the demo WebSocket server can be found on GitHub.
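For readers who want a rough idea of what a relay step in a signaling server can look like, here is a hypothetical sketch. It is not the implementation from the demo repository: the channels registry, the joinChannel and relayMessage helpers, and the { channel, data } message shape are all assumptions. The only library call used is sendUTF(), which connections in the websocket package provide.

```javascript
// Hypothetical in-memory registry: channel name -> array of sockets.
const channels = new Map();

// Register a socket under a channel name.
function joinChannel(channelName, websocket) {
  const members = channels.get(channelName) || [];
  members.push(websocket);
  channels.set(channelName, members);
}

// Relay a message to every other socket in the same channel.
// Returns the number of sockets the message was sent to.
function relayMessage(message, sender) {
  const members = channels.get(message.channel) || [];
  const payload = JSON.stringify(message.data);
  let delivered = 0;
  for (const member of members) {
    if (member !== sender) {
      member.sendUTF(payload); // sendUTF() comes from the websocket package
      delivered += 1;
    }
  }
  return delivered;
}
```

Skipping the sender keeps a peer from receiving its own offer or candidate back, which is usually what a signaling relay wants.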
Further, to start using the WebSocket library, we need to specify the address of the Node.js server in the WebRTC client. After we’re done, we can make inter-browser WebRTC audio/video calls, where the signaling is handled by the Node.js WebSocket server.
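On the client side, connecting to the signaling server could look like the following sketch, which uses the standard browser WebSocket members onopen, onmessage, and send. The createSignalingClient helper is hypothetical, and the WebSocket constructor is passed in as a parameter purely so the sketch can also run outside a browser.

```javascript
// Connect to a signaling server and exchange JSON messages.
// `WebSocketImpl` is injected so the sketch also runs outside a browser;
// in a browser, pass the global WebSocket constructor.
function createSignalingClient(WebSocketImpl, url, onSignal) {
  const ws = new WebSocketImpl(url);
  const pending = [];
  let isOpen = false;

  ws.onopen = () => {
    isOpen = true;
    // Flush anything queued before the connection opened.
    pending.splice(0).forEach((text) => ws.send(text));
  };

  ws.onmessage = (event) => onSignal(JSON.parse(event.data));

  return {
    send(message) {
      const text = JSON.stringify(message);
      if (isOpen) {
        ws.send(text);
      } else {
        pending.push(text);
      }
    },
  };
}
```

Queuing messages until onopen fires avoids calling send() on a socket that is still connecting, which would throw in browsers.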
Last, and for the sake of emphasis, here are the three basic members to take note of in a WebSocket connection:
ws.onopen: handler called when the connection is established
ws.send: sends data to the WebSocket server
ws.onmessage: handler called when a message is received
As we’ve seen, the WebRTC API includes media capture, encoding and decoding of audio and video streams, transport, and session management. While implementations of WebRTC in browsers are still evolving and support for WebRTC features varies, we can avoid compatibility issues by making use of the Adapter.js library.
This library uses shims and polyfills to resolve the differences among the WebRTC implementations across its various supporting environments. We can add it to the index.html file with a script tag, like so:
<script src="https://webrtc.github.io/adapter/adapter-latest.js"></script>
We can find a collection of small samples demonstrating various parts of the WebRTC APIs here. It contains implementation details and demos for the major WebRTC APIs, including getUserMedia(), RTCPeerConnection, and RTCDataChannel.
Finally, you can find more details about web experiments on websocket-over-nodejs and socketio-over-nodejs on GitHub.