WebRTC: Video telephony without a browser plugin

Direct Line

© Lead Image © Yanik Chauvin, Fotolia.com


Author(s): Andreas Möller

The WebRTC protocol converts your web browser into a communications center, supporting video chat over a peer-to-peer connection without the need for helper apps or browser plugins.

People who use video chat and other forms of real-time Internet communication often rely on Skype or similar tools, and web browsers too often depend on Flash or Java plugins for the same purpose. The latest generation of browsers, however, offers a powerful new tool for building real-time communication into scripts and homegrown web applications. WebRTC (Web Real-Time Communication) [1] supplements the new HTML5 standard by bringing native real-time communication to the browser.

WebRTC can handle video chat and similar formats. Communication occurs directly from browser to browser, without the need for an intervening web application. In this article, I show how easy it is to build a homegrown Internet video chat application by integrating WebRTC with the usual collection of web developer tools: HTML, JavaScript, CSS, and Node.js.

WebRTC is jointly promoted by browser vendors such as Google, Mozilla, and Opera. (Microsoft considers WebRTC to be too complicated and has presented CU-RTC-Web [2] as its own design for real-time communication in browsers.) Although the WebRTC specification is not yet complete, the Google Chrome and Mozilla Firefox 22 web browsers already largely support it. WebRTC is a free standard described in a set of IETF documents [3], and W3C has already accepted a draft for a programming interface [4] for WebRTC in the browser.

You'll find a demo video that describes some of WebRTC's capabilities on YouTube [5]. The demo, which comes from the Mozilla project, shows a video call from a Firefox browser on a cellphone via the public telephone network.

How It Works

Figure 1 shows the data flow for a WebRTC session. Configuration data and application data go their separate ways: The server acts as a router that accepts configuration information (e.g., IP addresses or information about video and audio formats) from one browser and forwards it to the other. WebRTC calls this the signaling process. The HTTP or WebSocket protocol is a useful choice for transferring the configuration data, with JavaScript Object Notation (JSON) providing the structure.

Figure 1: Configuration and application data go separate ways in WebRTC: The configuration data are routed by an intermediate server, whereas the protocol transmits the application data from peer to peer.

After exchanging IP addresses via the signaling server, the browsers establish a peer-to-peer connection, as shown in Figure 1. The connection is used to transmit the application data directly between browsers via TCP or UDP, saving time and data traffic.

To open the connection, WebRTC works around Network Address Translation (NAT) routers or firewalls through the use of Interactive Connectivity Establishment (ICE) [6]. ICE retrieves usable IP addresses and ports for sending and receiving data over peer-to-peer connections.

ICE first tries to determine the IP address and port using Session Traversal Utilities for NAT (STUN) [7]. To do so, it sends a message to a public STUN server and receives the sender's public address in return. However, if the NAT router blocks STUN, the IP address and port are provided by a relay server on the web using Traversal Using Relays around NAT (TURN) [8]. ICE reports the appropriate IP addresses and ports to WebRTC by triggering an onicecandidate event. The address and port are wrapped, along with the protocol to be used (TCP or UDP), in an object of the ICE candidate type.
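To make the candidate data more tangible, the following sketch builds a configuration with one STUN and one TURN server and splits a candidate line into its fields. The helper names buildIceConfig() and parseCandidate(), the hostnames, and the credentials are placeholders invented for illustration. The iceServers/urls layout follows the current W3C draft, and early prefixed implementations may expect a slightly different shape.

```javascript
// Build the configuration object handed to the peer connection constructor.
// Hostnames and credentials are placeholders (assumptions), not real servers.
function buildIceConfig(stunHost, turnHost, user, pass) {
  return {
    iceServers: [
      { urls: 'stun:' + stunHost },
      { urls: 'turn:' + turnHost, username: user, credential: pass }
    ]
  };
}

// Split a candidate line into its fields. The field order follows the
// candidate attribute of the ICE specification:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>
function parseCandidate(line) {
  var parts = line.trim()
    .replace(/^a=/, '')
    .replace(/^candidate:/, '')
    .split(/\s+/);
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(),  // "udp" or "tcp"
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7]                     // parts[6] is the literal keyword "typ"
  };
}
```

The type field hints at where the address came from: host for a local interface, srflx for an address learned through STUN, and relay for one provided by a TURN server.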

Sample Application

The following sample application uses WebRTC in the browser to implement a video chat. The browser takes the image from the webcam and the sound from the microphone and passes it to a second browser via WebRTC. The client application in this article uses HTML, CSS, and JavaScript for the implementation. Node.js is used for the signaling server.

The example is initially restricted to the Firefox browser, but you can port it to Chrome. Figure 2 shows the sample application running in Firefox on Ubuntu Linux. The other end is using Firefox on Windows 7.

Figure 2: At a glance: On the left is the image from the local webcam; on the right, the image from the remote webcam.

The getUserMedia() programming interface [9] in the latest versions of Firefox, Chrome, and Opera provides access to the webcam and microphone on the local machine, but it first asks the user's permission in a pop-up. WebRTC combines the video and audio streams into a media stream, as shown in Figure 3, which it can then process. A media stream can contain any number of video and audio tracks; an audio track, in turn, can carry several channels, such as the two channels of a stereo signal.

Figure 3: The media stream combines video and audio streams for use in HTML video and audio elements, as well as in peer-to-peer connections.

Video and Audio Signals

Listings 1-3 [10] demonstrate how to integrate the local webcam image into an HTML document (Figure 4). The HTML document in Listing 1 references two JavaScript files in the header area (lines 3-4): core.js, which contains functions shared by several examples in this article, and mediastream.js. In the body of the HTML document in Listing 1, the video element with the ID local (line 8) waits for a connection to a media stream.

Listing 1

HTML with a Video Element

 

Figure 4: Initial success: Firefox playing the image from the local webcam in the browser.

The call to the mozGetUserMedia() JavaScript method in Listing 2 (lines 2-8) grabs the images from the webcam and the sound from the microphone and bundles them into a media stream. The success callback in lines 4-6 receives the stream as its parameter. The connectStream() function in line 5 (and in Listing 3) plays the media stream in the video element. If a problem occurs, the callback function in line 7 is called to handle the error. Firefox makes the specification of a callback function for this case mandatory.
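The call pattern described here might look roughly like the following sketch. It is not the original listing: startCapture() and its parameters are names chosen for illustration, and the navigator object is passed in as nav so that the function can also be tried out with a stand-in object. Current browsers offer the promise-based navigator.mediaDevices.getUserMedia() instead of the prefixed method.

```javascript
// Request webcam and microphone through Firefox's prefixed, callback-based API.
// nav is normally window.navigator; passing it in keeps the function testable.
function startCapture(nav, onStream, onError) {
  var constraints = { video: true, audio: true };
  nav.mozGetUserMedia(
    constraints,
    function (stream) { onStream(stream); },  // runs once the user grants access
    function (err) { onError(err); }          // Firefox insists on an error callback
  );
}

// Browser usage (assumes a Firefox version that still has the prefixed method):
// startCapture(navigator,
//   function (stream) { /* hand the stream to a video element */ },
//   function (err) { console.log('getUserMedia failed: ' + err); });
```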

Listing 2

mediastream.js

 

Listing 3

core.js (Excerpt)

 

Listing 3 shows the JavaScript connectStream() function for the Firefox browser. Line 2 uses the querySelector() method with a CSS selector to select an element from the HTML document – preferably a video element. Line 4 binds the media stream to this element using the Firefox-specific mozSrcObject attribute.
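A function along these lines could be sketched as follows. The parameter order and the fallback to the standardized srcObject attribute are my own assumptions and not taken from the original core.js; the document object is passed in as doc purely so the function can be exercised outside a browser.

```javascript
// Attach a media stream to the first element matching a CSS selector.
// doc is normally window.document.
function connectStream(doc, selector, stream) {
  var video = doc.querySelector(selector);
  if (!video) {
    throw new Error('No element matches ' + selector);
  }
  if ('mozSrcObject' in video) {
    video.mozSrcObject = stream;  // Firefox-specific attribute named in the text
  } else {
    video.srcObject = stream;     // standardized successor (assumption)
  }
  video.play();
  return video;
}

// Browser usage:
// connectStream(document, '#local', stream);
```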

Opening a Peer-to-Peer Connection

The next example goes one step further and transmits the media stream via WebRTC. Before the transfer, the browsers involved exchange session descriptions in Session Description Protocol (SDP) format [11]. The descriptions include information about media streams to be transmitted and the IP addresses, ports, and protocols to use.

Figure 5 shows the steps that an application has to take to open a peer-to-peer connection. Initially, create(), top left in the figure, creates a peer object for the browser on the left, and stream() stores the local media stream in the local peer object. Next, offer() generates a session description, and keepLoc() stores it as a local session description, also in the local peer object. Then, send() transmits the information to the right-hand browser.

Figure 5: Before opening a peer-to-peer connection, the two browsers create session descriptions and exchange them.

When the session description reaches the right-hand browser, keepRem() stores it as information about the other party. Then, answer() generates a local session description for this browser, and keepLoc() stores it as a local session description. Finally, send() transmits it back to the left-hand browser, where the local keepRem() function stores it as a remote session description. This completes the handshake.

WebRTC in the Local Browser

Listings 4 to 6 transfer the media stream via a peer-to-peer connection, as shown in Figure 5. To remove the need for a signaling server in this preliminary study, two video elements just communicate in the Firefox browser on the local machine (Figure 6). Even locally, WebRTC uses a network connection.

Figure 6: Preliminary study for video chat: Two video elements communicating in the same browser window.

Listing 4 shows the HTML document with the example. As in Listing 1, it binds two JavaScript files in the header. Line 9 of the document body contains a video element for outputting the local media stream, whereas line 10 has one for outputting the media stream transmitted via WebRTC.

Listing 4

Local and Transmitted Media Stream

 

Listing 5 shows the JavaScript code of the sample application from the localwebrtc.js file. Querying and configuring the media stream is handled as in Listing 2, but with a different callback function in lines 5 to 19. Line 6 of Listing 5 first connects the media stream with the local video element. Line 7 generates the local peer object and stores it in the pcLocal variable, and line 8 creates the remote peer object in pcRemote. Line 9 stores the local media stream in the pcLocal peer object.

Listing 5

WebRTC in the Local Browser

 

The asynchronous createOffer() method in line 10 creates the session description for the local context and passes it to the callback function in lines 10 to 17 as the desc parameter. The setLocalDescription() method stores the content of desc as a local session description in the pcLocal peer object. Because the application runs locally, the setRemoteDescription() method in line 12 can save the content of desc directly as a remote session description for the remote peer object, pcRemote.

The call to the asynchronous createAnswer() method in line 13 creates the local session description for the remote peer object, pcRemote, and passes it to the callback function in lines 13 to 16. Line 14 then stores the session description for the remote peer object, pcRemote, as a local session description, and line 15 stores it as a remote session description for the local peer object, pcLocal.
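The exchange between the two peer objects can be condensed into a short sketch. It assumes the callback-style signatures of the early WebRTC drafts described in this article; connectLocally() is a name chosen for illustration, and the code is not the original Listing 5.

```javascript
// Run the offer/answer exchange between two peer objects in the same page.
// No signaling server is involved: each description is handed over directly.
function connectLocally(pcLocal, pcRemote, onError) {
  pcLocal.createOffer(function (offer) {
    pcLocal.setLocalDescription(offer);     // the offer describes the local side
    pcRemote.setRemoteDescription(offer);   // and is "remote" for the other peer
    pcRemote.createAnswer(function (answer) {
      pcRemote.setLocalDescription(answer);
      pcLocal.setRemoteDescription(answer); // handshake complete
    }, onError);
  }, onError);
}

// Browser usage (assumes Firefox's prefixed constructor):
// connectLocally(new mozRTCPeerConnection(), new mozRTCPeerConnection(),
//   function (err) { console.log(err); });
```

Each description is stored twice: once as the local description of the peer that created it and once as the remote description of the other peer.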

Line 18 assigns a handler to the onaddstream event of the remote peer object, pcRemote, by calling the onadd() function from Listing 6, which returns a callback function. This callback connects each incoming media stream to the video element that has the ID remote (Listing 4, line 10). Local communication has now been established.
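The idea of a function that returns an event handler can be sketched as follows. The name makeAddStreamHandler() and its parameters are hypothetical; the function only plays the role the text assigns to onadd() and is not the original Listing 6.

```javascript
// Return an event handler that plays every incoming stream in one element.
// doc is normally window.document; selector picks the target video element.
function makeAddStreamHandler(doc, selector) {
  return function (event) {
    var video = doc.querySelector(selector);
    video.mozSrcObject = event.stream;  // legacy addstream events carry .stream
    video.play();
  };
}

// Browser usage:
// pcRemote.onaddstream = makeAddStreamHandler(document, '#remote');
```

Because the selector is captured in a closure, the same factory can serve several peer objects that feed different video elements.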

Listing 6

onadd()

 

WebRTC via the Internet

The typical application scenario for WebRTC on the Internet requires a signaling server. The signaling server forwards the configuration data, as shown in Figure 1. The following example uses the JSON format for this data. Each message corresponds to a JavaScript object with two components: command describes the type of message; data stores the payload data.

The application uses three different types of messages: A message of the offer type (Listing 7) opens a peer-to-peer connection. To acknowledge opening the connection, the remote site sends a message of the answer type in return.

Listing 7

Offer in JSON Format

 

The data fields of these two messages contain the respective local session description. When ICE reports a new candidate, the application communicates this by sending a message of the ICE type. In this case, data contains the candidate.
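A minimal sketch of this message envelope could look like the following. The command strings 'offer', 'answer', and 'ice' and the function names are assumptions for illustration; the original listings may spell them differently.

```javascript
// The three message types described in the text (exact strings are assumed).
var KNOWN_COMMANDS = ['offer', 'answer', 'ice'];

// Wrap a payload in the { command, data } envelope and serialize it as JSON.
function encodeMessage(command, data) {
  return JSON.stringify({ command: command, data: data });
}

// Parse an incoming message and reject anything outside the known types.
function decodeMessage(text) {
  var msg = JSON.parse(text);
  if (KNOWN_COMMANDS.indexOf(msg.command) === -1) {
    throw new Error('Unknown command: ' + msg.command);
  }
  return msg;
}

// encodeMessage('offer', { type: 'offer', sdp: 'v=0' })
// returns '{"command":"offer","data":{"type":"offer","sdp":"v=0"}}'
```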

Signaling Server

JavaScript is also used for the signaling server, but server-side on Node.js. On Ubuntu Linux, you can install Node.js (currently version 0.8) and its package manager npm with the

sudo apt-get install nodejs npm

command. The signaling server also requires the connect module (version 2.7.2 or later) and version 1.0.8 of the websocket module. Because websocket depends on node-gyp, you first need to install this:

npm install -g node-gyp

Then, you can install websocket and connect:

sudo npm install connect websocket

Listing 8 shows the code for the signaling server in Node.js. Lines 2 to 4 integrate the required modules. The channels variable (line 5) stores the connected clients in an array.

Listing 8

Signaling Server for Node.js

 

In the next line, the app variable is assigned a Connect application that uses the static middleware. This middleware responds to HTTP requests, such as http://localhost:6655/webrtc.html, by returning the webrtc.html file from the ../ directory.

The WebSocket server binds an httpServer object in line 9. As an argument in the call to the createServer() method, the app variable passes the application logic from the connect application to the httpServer object. The listen() method tells the HTTP server to listen on port 6655.

Connection Events

The callback function in lines 12 to 28 of Listing 8 describes the response of the signaling server when a client sets up a connection via the WebSocket protocol. To do this, the on() method in line 12 registers the callback function passed as its second argument as a handler for the request event. In line 13, the callback function stores the accepted connection in the thisChannel variable; line 14 adds it to the list of existing connections.

Lines 15 to 22 define a callback function for processing an incoming message. In the loop across all connections (lines 16-21), the sendUTF() method forwards the message to all other connections. Lines 23 to 27 remove the client from the list on terminating the connection.
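The relay logic described above can be sketched without the HTTP part as follows. The function attachRelay() is a hypothetical helper and not the original Listing 8; it expects a server object that behaves like the WebSocket server from the websocket module (request event, accept(), sendUTF(), and utf8Data).

```javascript
// Relay every text message from one client to all other connected clients.
// wsServer must emit 'request' events like the websocket module's server.
function attachRelay(wsServer, channels) {
  wsServer.on('request', function (request) {
    var thisChannel = request.accept(null, request.origin);
    channels.push(thisChannel);

    thisChannel.on('message', function (msg) {
      if (msg.type !== 'utf8') { return; }  // ignore binary frames
      channels.forEach(function (other) {
        if (other !== thisChannel) {
          other.sendUTF(msg.utf8Data);      // forward unchanged
        }
      });
    });

    thisChannel.on('close', function () {
      var index = channels.indexOf(thisChannel);
      if (index !== -1) { channels.splice(index, 1); }
    });
  });
  return channels;
}
```

Note that this sketch accepts connections from any origin and forwards messages to every other client; a server for more than two parties would need some notion of rooms or addressing.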

As Figure 7 shows, you can watch the signaling server at work: Thanks to console.log(msg.utf8Data) in line 18, it writes its output to the console.

Figure 7: The signaling server forwards messages from one client to the other as a router.

If you want WebRTC to work over the Internet, you need to send the session descriptions in Listing 5 to the connected browser via the WebSocket protocol and the signaling server, as shown in Figures 1 and 5. Listing 9 shows the HTML document for this example. In contrast to Listing 4, it binds webrtc.js instead of localwebrtc.js.

Listing 9

HTML for WebRTC via the Internet

 

Listing 10 shows the JavaScript code for webrtc.js: Line 8 opens a connection to the WebSocket server and stores it in the channel variable. After opening the connection, the code executes the callback function from lines 9 through 15. Within this function, line 10 creates a new peer object and stores it in the global variable pc. Line 11 uses the addStream() method to add the media stream, stream, from the request in line 4 to the peer object.

Listing 10

webrtc.js: WebRTC via the Internet

 

The next two lines create the callback functions for the onicecandidate and onaddstream events in the connection setup. The event handling for onicecandidate is in line 12, which calls the onice() function in Listing 11. Like the onadd() function in Listing 6, it returns a callback function. When called, this callback uses the send() function (Listing 13) to send an ICE candidate. Finally, line 14 in Listing 10 uses createOffer() to generate a session description and passes it to the callback function returned by the call to desc() (Listing 12). Line 3 of Listing 12 then stores the session description in the peer object, pc, and the next line forwards it via the signaling server.
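The two handler factories described here might be sketched as follows. The names makeIceHandler() and makeDescriptionHandler(), their parameters, and the command strings are assumptions for illustration and do not reproduce Listings 11 and 12; send stands for any function that takes a message type and a payload.

```javascript
// Return a handler for onicecandidate that forwards each candidate.
// The final event of a gathering run has candidate === null and is skipped.
function makeIceHandler(send) {
  return function (event) {
    if (event.candidate) {
      send('ice', event.candidate);
    }
  };
}

// Return a callback for createOffer() or createAnswer() that stores the
// session description locally and then forwards it to the other side.
function makeDescriptionHandler(pc, send, command) {
  return function (description) {
    pc.setLocalDescription(description);
    send(command, description);
  };
}

// Browser usage:
// pc.onicecandidate = makeIceHandler(send);
// pc.createOffer(makeDescriptionHandler(pc, send, 'offer'), onError);
```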

Listing 11

Event Handler

 

Listing 12

Session Description

 

Listing 13

Serializing Objects

 

Receiving and Sending

When the browser receives a message from the other side via the WebSocket connection, the callback function in lines 16 to 34 of Listing 10 comes into play. It transforms the incoming JSON message into a JavaScript object, which the subsequent switch statement then processes according to the message type.
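A dispatcher of this kind could be sketched as follows. The function handleMessage(), its parameters, and the command strings are assumptions, not the original Listing 10. The sketch passes the received plain objects straight to the peer object; early prefixed implementations may require wrapping them first (for example, in mozRTCSessionDescription or mozRTCIceCandidate).

```javascript
// Dispatch one signaling message to the peer connection.
// send(command, data) forwards a reply through the signaling server.
function handleMessage(pc, text, send, onError) {
  var msg = JSON.parse(text);
  switch (msg.command) {
    case 'offer':
      // The other side wants to connect: store its description and reply.
      pc.setRemoteDescription(msg.data);
      pc.createAnswer(function (answer) {
        pc.setLocalDescription(answer);
        send('answer', answer);
      }, onError);
      break;
    case 'answer':
      // Reply to our own offer: the handshake is complete.
      pc.setRemoteDescription(msg.data);
      break;
    case 'ice':
      // Another address/port combination the other side can be reached at.
      pc.addIceCandidate(msg.data);
      break;
    default:
      onError(new Error('Unknown command: ' + msg.command));
  }
}

// Browser usage:
// channel.onmessage = function (event) {
//   handleMessage(pc, event.data, send, function (err) { console.log(err); });
// };
```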

Listing 13 shows the send() function, which uses a method with the same name in line 2 from the WebSocket object to send a message. Before that happens, stringify() converts it to a string in JSON format.

Data Channel

WebRTC can do more than just transfer media streams. The data() function in Listing 14 creates a data channel for transmitting text messages that works like a WebSocket. Line 3 uses the createDataChannel() method to open a channel for the current peer-to-peer connection in the peer object, pc. The ondatachannel event in line 5 occurs when a data channel is opened. The config() function in line 11 defines the callback function that is triggered when a text message arrives, and the send() method in line 14 sends messages over the existing connection.
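As a rough sketch of these steps, the following function opens a data channel and wires up the handlers for incoming text. The name setupDataChannel(), its parameters, and the label 'chat' are placeholders for illustration; the code does not reproduce Listing 14.

```javascript
// Open a data channel on an existing peer connection and report every
// incoming text message to onText.
function setupDataChannel(pc, label, onText) {
  var channel = pc.createDataChannel(label);
  channel.onmessage = function (event) { onText(event.data); };

  // The other side receives its end of the channel through ondatachannel.
  pc.ondatachannel = function (event) {
    event.channel.onmessage = function (e) { onText(e.data); };
  };
  return channel;
}

// Browser usage:
// var chat = setupDataChannel(pc, 'chat', function (text) { console.log(text); });
// chat.onopen = function () { chat.send('hello'); };
```

Waiting for the open event before calling send() avoids an error if the channel is not yet connected.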

Listing 14

Data Channel

 

Chrome Can Do

The sample application for Firefox can be ported to the Google Chrome browser with little effort: Just change the vendor prefix moz to webkit in the examples. The expressions you need to change could just as easily be moved into wrapper functions that call the appropriate function with the vendor prefix for each browser. Note that the examples only work in Chrome when called via the HTTP protocol, such as http://localhost:6655/mediastream.html.
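One way to hide the vendor prefixes behind a single function is sketched below. The helper prefixed() is hypothetical and simply tries the unprefixed name first, then the moz and webkit variants.

```javascript
// Look up a property under its standard name or a vendor-prefixed variant.
// Returns undefined if the host object offers none of them.
function prefixed(host, name) {
  var capitalized = name.charAt(0).toUpperCase() + name.slice(1);
  var candidates = [name, 'moz' + capitalized, 'webkit' + capitalized];
  for (var i = 0; i < candidates.length; i++) {
    if (candidates[i] in host) {
      return host[candidates[i]];
    }
  }
  return undefined;
}

// Browser usage:
// var PeerConnection = prefixed(window, 'RTCPeerConnection');
// var getMedia = prefixed(navigator, 'getUserMedia');
// if (getMedia) { getMedia.call(navigator, { video: true }, onStream, onError); }
```

Methods such as getUserMedia() must still be called on their original object, which is why the usage example goes through call().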

WebRTC can also connect more than two browsers. The possible topologies for peer-to-peer connections are practically unlimited, but a computer quickly reaches its limits when asked to play multiple media streams.

WebRTC brings native real-time communication to the browser. Two browsers and a signaling server are all it takes to build a peer-to-peer connection across the Internet, and WebRTC works around NAT routers and firewalls along the way. WebRTC not only transfers media streams but, with its data channel, also supports the bidirectional exchange of application data.

The direct connection between browsers saves time and data traffic. This gives web developers opportunities for many new HTML5 applications – for example, video chats or distributed applications, such as a networked memory game.

The Author

Andreas Möller (http://pamoller.com) has focused on developing Internet-based software since 2001. His work includes database and web applications, as well as single-source publishing. He is currently a consultant and freelance writer.