WebRTC: Video telephony without a browser plugin
Direct Line
The WebRTC protocol converts your web browser into a communications center, supporting video chat over a peer-to-peer connection without the need for helper apps or browser plugins.
People who use video chat and other forms of real-time Internet communication often rely on Skype or similar tools, and web browsers have typically depended on Flash or Java plugins for real-time communication. The latest generation of browsers, however, offers a powerful new tool for building real-time communication into scripts and homegrown web applications. WebRTC (Web Real-Time Communication) [1] supplements the new HTML5 standard by bringing native real-time communication to the browser.
WebRTC can handle video chat and similar formats. Communication occurs directly from browser to browser, without the need for an intervening web application. In this article, I show how easy it is to build a homegrown Internet video chat application by integrating WebRTC with the usual collection of web developer tools: HTML, JavaScript, CSS, and Node.js.
WebRTC is jointly promoted by browser vendors such as Google, Mozilla, and Opera. (Microsoft considers WebRTC to be too complicated and has presented CU-RTC-Web [2] as its own design for real-time communication in browsers.) Although the WebRTC specification is not yet complete, the Google Chrome and Mozilla Firefox 22 web browsers already largely support it. WebRTC is a free standard described in a set of IETF documents [3], and W3C has already accepted a draft for a programming interface [4] for WebRTC in the browser.
You'll find a demo video that describes some of WebRTC's capabilities on YouTube [5]. The demo, which comes from the Mozilla project, shows a video call from a Firefox browser on a cellphone via the public telephone network.
How It Works
Figure 1 shows the data flow for a WebRTC session. Application and configuration data go their separate ways: The server acts as a router that accepts configuration information (i.e., IP addresses or information about video and audio formats) from one browser and forwards it to the other. WebRTC calls this the signaling process. The HTTP or WebSocket protocol is a useful choice for transferring the configuration data, with JavaScript Object Notation (JSON) providing the structure.
After exchanging the IP addresses via the signaling server, the browsers establish a peer-to-peer connection, as shown in Figure 1. The connection is used to transmit the application data directly between browsers via TCP or UDP, saving time and data traffic.
To open the connection, WebRTC works around Network Address Translation (NAT) routers or firewalls through the use of Interactive Connectivity Establishment (ICE) [6]. ICE retrieves usable IP addresses and ports for sending and receiving data over peer-to-peer connections.
ICE first tries to determine the IP address and port using Session Traversal Utilities for NAT (STUN) [7]. To do so, it sends a message to a public STUN server and receives the sender's address in return. However, if the NAT router blocks STUN, the IP address and port are provided by a relay server on the web using Traversal Using Relays around NAT (TURN) [8]. ICE reports the appropriate IP addresses and ports to WebRTC by triggering an onicecandidate event. The data is wrapped, along with the protocol to be used – TCP or UDP – in an object of the ICE candidate type.
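To make the candidate data more concrete, the following sketch shows how a script might read the transport protocol and candidate type out of a candidate string and pass candidates on to a callback. The names parseCandidate() and watchCandidates() are illustrative assumptions and do not come from the article's listings; the field order assumed here is that of the ICE candidate attribute in SDP.

```javascript
// Split an ICE candidate string into its leading fields.
// Assumed field order (ICE candidate attribute in SDP):
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(candidateLine) {
  var parts = candidateLine
    .replace(/^a=/, '')
    .replace(/^candidate:/, '')
    .trim()
    .split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') {
    return null; // not a candidate line this sketch understands
  }
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(), // "udp" or "tcp"
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7] // e.g. "host", "srflx" (learned via STUN), "relay" (TURN)
  };
}

// Register a handler for the onicecandidate event of a peer object.
// The final event of a gathering run carries a null candidate, which
// this sketch simply skips.
function watchCandidates(pc, onCandidate) {
  pc.onicecandidate = function (event) {
    if (event.candidate) {
      onCandidate(event.candidate);
    }
  };
}
```

A candidate of type srflx indicates an address learned through STUN, whereas relay indicates that traffic would pass through a TURN server.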
Sample Application
The following sample application uses WebRTC in the browser to implement a video chat. The browser takes the image from the webcam and the sound from the microphone and passes them to a second browser via WebRTC. The client application in this article uses HTML, CSS, and JavaScript for the implementation. Node.js is used for the signaling server.
The example is initially restricted to the Firefox browser, but you can port it to Chrome. Figure 2 shows the sample application running in Firefox on Ubuntu Linux. The other end is using Firefox on Windows 7.
The getUserMedia() programming interface [9] in the latest versions of Firefox, Chrome, and Opera provides access to the webcam and microphone on the local machine, but it first asks the user's permission in a pop-up. WebRTC combines the video and audio streams into a media stream, as shown in Figure 3, which it can then process. A media stream can contain any number of video and audio tracks; a stereo audio track, for example, consists of two channels.
Video and Audio Signals
Listings 1-3 [10] demonstrate how to integrate the local webcam image into an HTML document (Figure 4). The HTML document in Listing 1 references two JavaScript files in the header area (lines 3-4). The core.js file contains functions shared by several of the examples in this article, including mediastream.js. In the body of the HTML document in Listing 1, the video element with the ID local waits in line 8 for a connection to a media stream.
Listing 1
HTML with a Video Element
The call to the mozGetUserMedia() JavaScript method in Listing 2 (lines 2-8) grabs the images from the webcam and the sound from the microphone and bundles them into a media stream. The code in lines 4-6 passes in the stream as a parameter to the callback function. The connectStream() function in line 5 (and in Listing 3) plays the media stream in the video element. If a problem occurs, a callback function is called in line 7 to handle errors. Firefox makes the specification of a callback function for this case mandatory.
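Listing 2 is not reproduced here, so the following is only a sketch of the kind of call the text describes, not the original code. It assumes a hypothetical helper named requestMedia() that receives the navigator object as a parameter, uses the promise-based navigator.mediaDevices.getUserMedia() where a browser offers it, and otherwise falls back to the prefixed, callback-style method.

```javascript
// Ask for webcam and microphone access and hand the resulting media
// stream to onStream; problems (for example, the user declining the
// permission prompt) go to onError.
function requestMedia(nav, onStream, onError) {
  var constraints = { video: true, audio: true };

  // Current browsers: unprefixed, promise-based interface.
  if (nav.mediaDevices && nav.mediaDevices.getUserMedia) {
    nav.mediaDevices.getUserMedia(constraints).then(onStream, onError);
    return;
  }

  // Older browsers: prefixed, callback-based interface.
  var legacy = nav.mozGetUserMedia || nav.webkitGetUserMedia || nav.getUserMedia;
  if (!legacy) {
    onError(new Error('getUserMedia is not available in this browser'));
    return;
  }
  legacy.call(nav, constraints, onStream, onError);
}
```

In a page, a call might look like requestMedia(navigator, showStream, reportError), where showStream and reportError stand for the two callback functions described above.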
Listing 2
mediastream.js
Listing 3
core.js (Excerpt)
Listing 3 shows the JavaScript connectStream() function for the Firefox browser. Line 2 uses the querySelector() method with a CSS selector to select an element from the HTML document – preferably a video element. Line 4 binds the media stream to this element using the Firefox-specific mozSrcObject attribute.
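As a rough illustration of the two steps just described, a function of this kind might look as follows. This is a sketch rather than the code from Listing 3: the parameter order, the extra doc parameter, and the fallback from the standard srcObject attribute to mozSrcObject are assumptions.

```javascript
// Attach a media stream to the element matched by a CSS selector.
// doc is the page's document object; it is passed in as a parameter
// (an assumption of this sketch) so the function can be tried outside
// a browser as well.
function connectStream(doc, selector, stream) {
  var element = doc.querySelector(selector);
  if (!element) {
    throw new Error('No element matches ' + selector);
  }
  if ('srcObject' in element) {
    element.srcObject = stream;    // standard attribute
  } else {
    element.mozSrcObject = stream; // Firefox-specific attribute named in the text
  }
  // Playback starts on its own only if the video element carries the
  // autoplay attribute; otherwise the page has to call element.play().
  return element;
}
```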
Opening a Peer-to-Peer Connection
The next example goes one step further and transmits the media stream via WebRTC. Before the transfer, the browsers involved exchange session descriptions in Session Description Protocol (SDP) format [11]. The descriptions include information about media streams to be transmitted and the IP addresses, ports, and protocols to use.
Figure 5 shows the steps that an application has to take to open a peer-to-peer connection. Initially, create(), top left in the figure, creates a peer object for the browser on the left, and stream() stores the local media stream in the local peer object. Next, offer() generates a session description, and keepLoc() stores it as a local session description, also in the local peer object. Then, send() transmits the information to the right-hand browser.
When the session description reaches the right-hand browser, keepRem() stores it as information about the other party. Then, answer() generates a local session description for this browser, and keepLoc() stores it as a local session description. Finally, send() transmits it back to the left-hand browser, where the local keepRem() function stores it as a remote session description. This completes the handshake.
WebRTC in the Local Browser
Listings 4 to 6 transfer the media stream via a peer-to-peer connection, as shown in Figure 5. To remove the need for a signaling server in this preliminary study, two video elements just communicate in the Firefox browser on the local machine (Figure 6). Even locally, WebRTC uses a network connection.
Listing 4 shows the HTML document with the example. As in Listing 1, it includes two JavaScript files in the header. Line 9 of the document body contains a video element for outputting the local media stream, whereas line 10 has one for outputting the media stream transmitted via WebRTC.
Listing 4
Local and Transmitted Media Stream
Listing 5 shows the JavaScript code of the sample application from the localwebrtc.js file. Querying and configuring the media stream is handled as in Listing 2, but with a different callback function in lines 5 to 19. Line 6 of Listing 5 first connects the media stream with the video element. Line 7 generates the local peer object and stores it in the pcLocal variable, and line 8 creates the remote peer object in pcRemote. Line 9 stores the local media stream in the pcLocal peer object.
Listing 5
WebRTC in the Local Browser
The asynchronous createOffer() method in line 10 creates the session description for the local context and passes it to the callback function in lines 10 to 17 as the desc parameter. The setLocalDescription() method stores the content of desc as a local session description in the pcLocal peer object. Because the application runs locally, the setRemoteDescription() method in line 12 can save the content of desc directly as a remote session description for the remote peer object, pcRemote.
The call to the asynchronous createAnswer() method in line 13 creates the local session description for the remote peer object, pcRemote, and passes it to the callback function in lines 13 to 16. Line 14 then stores the session description for the remote peer object, pcRemote, as a local session description, and line 15 stores it as a remote session description for the local peer object, pcLocal.
Line 18 sets the handler for the onaddstream event of the remote peer object, pcRemote, by calling the onadd() function from Listing 6, which returns a callback function. This callback connects every incoming media stream to the video element that has the ID remote (Listing 4, line 10). Local communication has now been established.
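The handshake between the two local peer objects can be sketched roughly as shown below. This is not the code from Listings 5 and 6: the function connectLocally(), the show callback passed to onadd(), and the strict nesting of callbacks are assumptions made for illustration. The sketch uses addStream(), onaddstream, and callback arguments because they belong to the API generation the article describes; the current specification relies on addTrack(), ontrack, and promises instead.

```javascript
// Build a handler for onaddstream that hands each incoming stream to
// show(), for example a function that attaches it to a video element.
function onadd(show) {
  return function (event) {
    show(event.stream);
  };
}

// Run the offer/answer handshake between two peer objects in one page.
function connectLocally(pcLocal, pcRemote, stream, show, onError) {
  pcRemote.onaddstream = onadd(show);

  // Browsers that deliver ICE candidates separately from the session
  // description need them passed across as well.
  pcLocal.onicecandidate = function (event) {
    if (event.candidate) { pcRemote.addIceCandidate(event.candidate); }
  };
  pcRemote.onicecandidate = function (event) {
    if (event.candidate) { pcLocal.addIceCandidate(event.candidate); }
  };

  pcLocal.addStream(stream);

  pcLocal.createOffer(function (offer) {
    pcLocal.setLocalDescription(offer, function () {
      pcRemote.setRemoteDescription(offer, function () {
        pcRemote.createAnswer(function (answer) {
          pcRemote.setLocalDescription(answer, function () {
            pcLocal.setRemoteDescription(answer, function () {}, onError);
          }, onError);
        }, onError);
      }, onError);
    }, onError);
  }, onError);
}
```

Each step waits for the previous one to report success, which keeps the order of Figure 5 visible in the code: offer, local description, remote description, answer, and back again.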
Listing 6
onadd()
WebRTC via the Internet
The typical application scenario for WebRTC on the Internet requires a signaling server. The signaling server forwards the configuration data, as shown in Figure 1. The following example uses the JSON format for this data. Each message corresponds to a JavaScript object with two components: command describes the type of message; data stores the payload data.
The application uses three different types of messages: A message of the offer type (Listing 7) opens a peer-to-peer connection. To acknowledge opening the connection, the remote site sends a message of the answer type in return.
Listing 7
Offer in JSON Format
The data fields of these two messages contain the respective local session description. When ICE delivers a new candidate, the application communicates this by sending an ICE-type message. In this example, data contains the candidate.
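The message envelope can be captured in two small helper functions, sketched below. The helper names are hypothetical, and the command string 'ice' for candidate messages is an assumption; the article only names the offer and answer types explicitly.

```javascript
// Message types used by the sample application ('ice' is an assumed name).
var COMMANDS = ['offer', 'answer', 'ice'];

// Wrap a payload in the { command, data } envelope and serialize it.
function encodeMessage(command, data) {
  if (COMMANDS.indexOf(command) === -1) {
    throw new Error('Unknown command: ' + command);
  }
  return JSON.stringify({ command: command, data: data });
}

// Parse a received string and make sure it carries a known command.
function decodeMessage(text) {
  var message = JSON.parse(text);
  if (!message || COMMANDS.indexOf(message.command) === -1) {
    throw new Error('Unknown or missing command');
  }
  return message;
}
```

An offer would then travel as a string such as the result of encodeMessage('offer', { type: 'offer', sdp: 'v=0' }), where the sdp value is only a placeholder for a real session description.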
Signaling Server
JavaScript is also used for the signaling server, but server-side on Node.js. On Ubuntu Linux, you can install Node.js (currently version 0.8) and its package manager npm with the command:
sudo apt-get install nodejs npm
The signaling server also requires the Connect module (version 2.7.2 or later) as well as version 1.0.8 of WebSocket. Because WebSocket depends on node-gyp, you first need to install this:
npm install -g node-gyp
Then, you can install websocket and connect:
sudo npm install connect websocket
Listing 8 shows the code for the signaling server in Node.js. Lines 2 to 4 integrate the required modules. The channels variable (line 5) stores the connected clients in an array.
Listing 8
Signaling Server for Node.js
In the next line, the app variable is assigned a connect application with the static module added on top. Static tells Node.js to respond to HTTP requests, such as http://localhost:6655/webrtc.html, by returning the webrtc.html file from the ../ directory.
The WebSocket server binds an httpServer object in line 9. As an argument in the call to the createServer() method, the app variable passes the application logic from the connect application to the httpServer object. The listen() method tells the HTTP server to listen on port 6655.
Connection Events
The callback function in lines 12 to 28 of Listing 8 describes the response of the signaling server when setting up a connection via the WebSocket protocol. To do this, the on() method in line 12 binds the callback function passed as its second argument to the request event. In line 13, the callback function stores a valid connection in the thisChannel variable; line 14 adds it to the list of existing connections.
Lines 15 to 22 define a callback function for processing an incoming message. In the loop across all connections (lines 16-21), the sendUTF() method forwards the message to all other connections. Lines 23 to 27 remove the client from the list on terminating the connection.
As Figure 7 shows, you can watch the signaling server at work: Thanks to console.log(msg.utf8Data) in line 18, it writes its output to the console.
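The relay logic described for Listing 8 might be sketched as follows. This is not the original listing: the function attachRelay() and its log parameter are assumptions, and the HTTP and connect setup is left out. The sketch expects an object that behaves like the WebSocket server from the websocket module, that is, one that emits request events whose accept() method returns a connection with sendUTF().

```javascript
// Wire a relay onto a WebSocket server object. Every text message from
// one client is forwarded to all other connected clients.
function attachRelay(wsServer, log) {
  var channels = []; // currently connected clients

  wsServer.on('request', function (request) {
    // Accept every request; a production server would check request.origin.
    var thisChannel = request.accept(null, request.origin);
    channels.push(thisChannel);

    thisChannel.on('message', function (msg) {
      if (msg.type !== 'utf8') {
        return; // ignore binary frames in this sketch
      }
      if (log) {
        log(msg.utf8Data);
      }
      channels.forEach(function (other) {
        if (other !== thisChannel) {
          other.sendUTF(msg.utf8Data);
        }
      });
    });

    thisChannel.on('close', function () {
      var index = channels.indexOf(thisChannel);
      if (index !== -1) {
        channels.splice(index, 1);
      }
    });
  });

  return channels;
}
```

With the websocket module installed, the function could be attached to a server created along the lines of new (require('websocket').server)({ httpServer: httpServer }), with console.log passed as the log argument to reproduce the console output.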
If you want WebRTC to work over the Internet, you need to send the session descriptions in Listing 5 to the connected browser via the WebSocket protocol and the signaling server, as shown in Figures 1 and 5. Listing 9 shows the HTML document for this example. In contrast to Listing 4, it includes webrtc.js instead of localwebrtc.js.
Listing 9
HTML for WebRTC via the Internet
Listing 10 shows the JavaScript code for webrtc.js: Line 8 opens a connection to the WebSocket server and stores it in the channel variable. After opening the connection, the code executes the callback function from lines 9 through 15. Within this function, line 10 creates a new peer object and stores it in the global variable pc. Line 11 uses the addStream() method to add the media stream, stream, from the request in line 4 to the peer object.
Listing 10
webrtc.js: WebRTC via the Internet
The next two lines create the callback functions for the onicecandidate and onaddstream events in the connection setup. The event handling for onicecandidate is in line 12, which calls the onice() function in Listing 11. Like the onadd() function in Listing 6, it returns a callback function. When called, this callback uses the send() function (Listing 13) to send an ICE candidate. Finally, line 14 in Listing 10 uses createOffer() to generate a session description and passes it to the callback function returned by the call to desc() (Listing 12). In doing so, line 3 of Listing 12 stores the session description in the peer object, pc, and the next line forwards it via the signaling server.
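The interplay of the three helpers might look roughly like the sketch below. Listings 11 to 13 are not reproduced here, so the signatures are guesses: in particular, passing the WebSocket connection and the peer object as parameters instead of using global variables, and the 'ice' command string, are assumptions.

```javascript
// Serialize a message and push it through the signaling channel.
function send(channel, command, data) {
  channel.send(JSON.stringify({ command: command, data: data }));
}

// Returns a handler for onicecandidate that forwards each candidate.
function onice(channel) {
  return function (event) {
    if (event.candidate) {
      send(channel, 'ice', event.candidate);
    }
  };
}

// Returns a callback for createOffer() or createAnswer(): store the
// session description in the peer object, then forward it under the
// given command ('offer' or 'answer').
function desc(pc, channel, command, onError) {
  return function (description) {
    pc.setLocalDescription(description, function () {
      send(channel, command, description);
    }, onError);
  };
}
```

A caller could then wire things up with pc.onicecandidate = onice(channel) followed by pc.createOffer(desc(pc, channel, 'offer', onError), onError).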
Listing 11
Event Handler
Listing 12
Session Description
Listing 13
Serializing Objects
Receiving and Sending
However, if the browser receives an external message via the WebSocket connection, the callback function in lines 16 to 34 of Listing 10 sees some action. It transforms the incoming JSON message into a JavaScript object, which the subsequent switch statement then processes, depending on the message type.
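A dispatcher of this kind could be sketched as follows. The function name handleMessage(), its parameters, and the 'ice' command string are assumptions rather than the code from Listing 10. The sketch also passes the parsed objects straight to the peer object, whereas older prefixed implementations expected them to be wrapped in session description and ICE candidate objects first.

```javascript
// React to one message received from the signaling server.
function handleMessage(pc, channel, text, onError) {
  var message = JSON.parse(text);
  var noop = function () {};

  switch (message.command) {
    case 'offer':
      // The other browser is calling: store its description, then answer.
      pc.setRemoteDescription(message.data, function () {
        pc.createAnswer(function (answer) {
          pc.setLocalDescription(answer, function () {
            channel.send(JSON.stringify({ command: 'answer', data: answer }));
          }, onError);
        }, onError);
      }, onError);
      break;
    case 'answer':
      // The other browser accepted the offer sent earlier.
      pc.setRemoteDescription(message.data, noop, onError);
      break;
    case 'ice':
      // A new candidate address for reaching the other browser.
      pc.addIceCandidate(message.data, noop, onError);
      break;
    default:
      onError(new Error('Unknown command: ' + message.command));
  }
}
```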
Listing 13 shows the send() function, which in line 2 uses the WebSocket object's method of the same name to send a message. Before that happens, stringify() converts the message to a string in JSON format.
Data Channel
WebRTC can do more than just transfer media streams. The data() function in Listing 14 creates a data channel for transmitting text messages that works like a WebSocket. Line 3 uses the createDataChannel() method to open a channel for the current peer-to-peer connection in the peer object, pc. The ondatachannel event in line 5 occurs when a data channel is opened. The config() function in line 11 defines the callback function that is triggered when a text message arrives, and the send() method in line 14 sends messages over the existing connection.
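The two sides of a data channel can be sketched as shown below. The function names and the split into a calling and an accepting side are assumptions for illustration and do not reproduce Listing 14.

```javascript
// Deliver the text of every incoming message to onText.
function configure(channel, onText) {
  channel.onmessage = function (event) {
    onText(event.data);
  };
}

// Calling side: open a data channel on an existing peer object.
function openTextChannel(pc, label, onText) {
  var channel = pc.createDataChannel(label);
  configure(channel, onText);
  return channel;
}

// Accepting side: wait for the channel opened by the other browser.
function acceptTextChannel(pc, onText, onChannel) {
  pc.ondatachannel = function (event) {
    configure(event.channel, onText);
    if (onChannel) {
      onChannel(event.channel);
    }
  };
}
```

Messages then go out with channel.send('some text'), but only once the channel has reported its open event; sending earlier fails.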
Listing 14
Data Channel
Chrome Can Do
The sample application for Firefox can be ported to the Google Chrome browser with little effort. To port the application to Chrome, just change the vendor prefix moz to webkit in the examples. The expressions you need to change could just as easily be moved into wrapper functions that call the appropriate function with the vendor prefix for each browser. Note that the examples only work in Chrome when called via the HTTP protocol, such as http://localhost:6655/mediastream.html.
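One way to hide the prefixes is a small lookup helper, sketched here under the assumption that the prefixed names follow the pattern prefix plus capitalized name (for example, mozGetUserMedia or webkitRTCPeerConnection). The helper name resolvePrefixed() is hypothetical.

```javascript
// Prefixes to try, in order of preference; '' is the unprefixed name.
var PREFIXES = ['', 'moz', 'webkit'];

// Find the first variant of a property name that exists on scope and
// return both the name that matched and its current value.
function resolvePrefixed(scope, name) {
  for (var i = 0; i < PREFIXES.length; i++) {
    var prefix = PREFIXES[i];
    var candidate = prefix === ''
      ? name
      : prefix + name.charAt(0).toUpperCase() + name.slice(1);
    if (candidate in scope) {
      return { name: candidate, value: scope[candidate] };
    }
  }
  return null; // the browser offers none of the variants
}
```

A call such as resolvePrefixed(window, 'RTCPeerConnection') would then yield whichever constructor the current browser provides, and resolvePrefixed(navigator, 'getUserMedia') would do the same for the media request method.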
WebRTC can also connect more than two browsers. The possible topologies for peer-to-peer connections are virtually unlimited, but a computer quickly reaches its limits when asked to play multiple media streams. WebRTC brings native real-time communication to the browser: Two browsers are all it takes to build a peer-to-peer connection across the Internet, and WebRTC works around NAT routers and firewalls along the way. In addition to transferring media streams, WebRTC uses its data channel to support a bidirectional exchange of application data.
The direct connection between browsers saves time and data traffic. This opens up opportunities for many new HTML5 applications to web developers – for example, video chats or distributed applications, such as a networked memory game.
Infos
- WebRTC: http://www.webrtc.org
- CU-RTC-Web: http://html5labs.interoperabilitybridges.com/cu-rtc-web/cu-rtc-web.htm
- Rtcweb status pages: http://tools.ietf.org/wg/rtcweb/draft-ietf-rtcweb-overview/
- WebRTC 1.0 W3C editor's draft: http://dev.w3.org/2011/webrtc/editor/webrtc.html
- WebRTC MWC phone demo: http://www.youtube.com/watch?v=rWPZZeXK6g4
- ICE: http://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment
- STUN: http://en.wikipedia.org/wiki/STUN
- TURN: http://en.wikipedia.org/wiki/Traversal_Using_Relays_around_NAT
- getUserMedia(): http://dev.w3.org/2011/webrtc/editor/getusermedia.html
- Listings for this article: ftp://ftp.linux-magazin.com/pub/listings/magazine/154
- SDP format: http://tools.ietf.org/html/rfc2327