On Sept. 8, 1966, “Enterprise” communication as we know it crackled to life when Capt. James T. Kirk of the Federation Starship Enterprise flipped open his communicator and spoke these words to his crew: “Transporter room: Three to beam up.”

Yet even earlier that decade—1964 to be precise—the Bell Telephone Pavilion at the World’s Fair in Flushing Meadow Park showed a family on Earth having a video chat with the family patriarch deployed to a space station orbiting high above the planet. Imaginations soared.

Today, of course, video chat is real, and so is audio and video streaming to remote corners of the world. If there’s WiFi, there’s communication.

So what’s the buzz about WebRTC? Simply, it takes real-time audio/video communication to the browser, opening up possibilities for enterprises—businesses, healthcare organizations, governments and the media, among many others—to better serve their constituencies. And that, according to Dan Burnett (one of the editors working on the WebRTC specification at the World Wide Web Consortium), is what will make it truly transformative.

“We’ve had chat: text, voice and video. The ability has existed for a while. The difference is that no plug-ins are required. People hate plug-ins. They create security problems,” Burnett said. “To include audio and video almost trivially in a Web page is transformative. You’ll see for the first time the really ubiquitous use of video.”

If you look at building a client that does voice over IP, you need a foundation: a microphone, a camera, a processor and an operating system. In a PC, most of that is there now.

Next you need a visual interface. On a smartphone, that would be the buttons. On a PC, it’s the screen.

Finally, you need a media engine, which takes the input, compresses the audio and video (and performs echo cancellation), packetizes it, and sends it to the right place.

In WebRTC, there are actually two specifications being worked on side by side, Burnett said. One is WebRTC, which performs the real-time transmission. The other is the Media Capture and Streams specification, informally known as the “getUserMedia” call, which is a method for gaining access to a local camera and microphone for use in an application or to send across a peer connection. “You cannot do WebRTC demos without Media Capture and Streams,” Burnett said. “It defines media streams and media stream tracks of audio and video.”
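
To see what the getUserMedia call looks like in practice, here is a minimal sketch using the promise-based form of the Media Capture and Streams API; the “localVideo” element ID is a placeholder for whatever <video> element a page provides.

```typescript
// Minimal sketch of the Media Capture and Streams API, using the
// promise-based navigator.mediaDevices.getUserMedia() form.
// The "localVideo" element ID is hypothetical; any <video> element works.
async function showLocalCamera(): Promise<MediaStream> {
  // Ask the browser for a camera and microphone; the user must grant
  // permission before a stream is returned.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });

  // A MediaStream carries MediaStreamTracks of kind "audio" or "video".
  for (const track of stream.getTracks()) {
    console.log(`Got ${track.kind} track: ${track.label}`);
  }

  // Render a local preview in a <video> element on the page.
  const video = document.getElementById("localVideo") as HTMLVideoElement;
  video.srcObject = stream;
  await video.play();

  return stream;
}
```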

From the beginning
The idea of being able to do real-time communication from the browser was first acted on at Google. When Google bought Global IP Solutions in 2010, it acquired the most widely adopted media engine in VoIP. Google then open-sourced the code and put it into Chrome, and this became the genesis of WebRTC: real-time communication in the browser.

An important milestone in WebRTC history was reached a few weeks ago when Firefox released basic support for WebRTC in its browser without a user having to set an option, Burnett said. With Chrome and Firefox on board, “a significant fraction of the browser market” supports WebRTC today, he added.

“Now, all the application has to do is manage the interface. The media engine’s already in the browser,” explained PKE Consulting’s Phil Edholm, producer of the WebRTC Conference. “All the app is doing is making API calls.”
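
As a rough illustration of what those API calls look like on the caller’s side, the sketch below assumes a MediaStream already obtained from getUserMedia and a placeholder sendToSignalingServer() function supplied by the application; WebRTC deliberately leaves the signaling channel up to the app, and the STUN server and “remoteVideo” element here are just examples.

```typescript
// Caller-side sketch: the browser's built-in media engine handles codecs,
// echo cancellation, packetization and transport; the app only wires
// things together with a handful of API calls.
declare function sendToSignalingServer(message: object): void; // placeholder

async function startCall(localStream: MediaStream): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }], // example STUN server
  });

  // Hand the local audio/video tracks to the peer connection.
  for (const track of localStream.getTracks()) {
    pc.addTrack(track, localStream);
  }

  // Forward ICE candidates to the other browser via the signaling channel.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      sendToSignalingServer({ candidate: event.candidate });
    }
  };

  // Render whatever media the remote peer sends back ("remoteVideo" is hypothetical).
  pc.ontrack = (event) => {
    const remoteVideo = document.getElementById("remoteVideo") as HTMLVideoElement;
    remoteVideo.srcObject = event.streams[0];
  };

  // Create an SDP offer and send it to the remote browser for its answer.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToSignalingServer({ sdp: pc.localDescription });

  return pc;
}
```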

So the communication interface becomes the browser itself—no download required. “With HTML5, the concept of apps morphs,” Edholm said. “Browsers look like apps. Download a pointer and have an app experience. Now, that can include real-time voice, video and data. Point your browser at Skype and have the full experience without downloading the client.”

To take it to the next step, Edholm said, “If we’re connected to a website, the site can initiate communication between our browsers.” This, he emphasized, represents a profound change in how communication occurs.
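
One way a site can do that is to act as a simple signaling relay between two visiting browsers, passing their offers, answers and ICE candidates back and forth. The sketch below is illustrative only and assumes the Node.js “ws” package; WebRTC itself does not mandate any particular signaling transport.

```typescript
// Illustrative signaling relay a website might run so two visiting
// browsers can exchange session descriptions and ICE candidates.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket: WebSocket) => {
  socket.on("message", (data) => {
    // Relay each JSON signaling message to every other connected browser.
    // A real site would scope this to the two peers of a single "room".
    for (const client of wss.clients) {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(data.toString());
      }
    }
  });
});
```

Once the offer, answer and candidates have made the round trip through the site, the media itself flows browser to browser; the server drops out of the path.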

Prior to this, services had to talk to each other to negotiate communication on behalf of people. With Skype, Edholm said, if you’re not a member, you’re back to server-to-server communication, not peer-to-peer.

It was at CERN in the late 1980s that the notion of creating a browser to talk to a server farm led to the concept of the World Wide Web. And, in almost all information systems since 1993, you point a browser at a server and create an event with that server. WebRTC enables communication to follow the same paradigm. Edholm said that in the next three to five years, as many as 4 billion devices will be WebRTC-enabled.

Communication as a secondary event
Edholm estimated there are 20 million to 40 million active websites, and that half of those could be capable of hosting communications. “Eighty percent to 90% of transactions are preceded by a visit to a website,” he said. “There’s something the user can’t resolve, so he calls an 800 number listed on the site. That site could easily move to contextual communication.”

Aside from the kind of communication in which a website senses a user’s input error and a box pops up asking if he or she would like to start a chat with a customer service rep, Edholm said he sees a time when “if two of us are looking at the same product on Amazon, we can have a real-time discussion with other potential purchasers” about price, reliability and other factors that go into a purchasing decision. And the users did not come to the Amazon site seeking that communication; they came to buy a book. The communication is secondary, but it offers tremendous value to the user.

“The biggest impact of the Web was not created by tech people,” Edholm said. “At eBay, some guys wanted to sell antique Pez dispensers.”

Data could be biggest piece of all
WebRTC efforts have focused on the exchange of audio and video between browsers. But W3C spec editor Dan Burnett explained that data exchange could be the biggest beneficiary of this work.

Using an as-yet unimplemented Data Channel API, he said, “You can send arbitrary and unstructured data through the channel.” He pointed to Cube Slam, a game that uses the data channel to publish ball and paddle locations, as one example of how this information can be passed. “There have been peer-to-peer data capabilities, but not browser to browser. Some people think it’ll be bigger than audio and video.”
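
As a rough sketch of what sending arbitrary data through the channel looks like, the code below uses the RTCDataChannel interface as browsers have since implemented it; the “game-state” label and the ball-and-paddle payload are invented examples in the spirit of Cube Slam.

```typescript
// One peer creates a labeled data channel on its RTCPeerConnection; the
// other peer receives it through the "datachannel" event.
function setUpGameChannel(pc: RTCPeerConnection): RTCDataChannel {
  const channel = pc.createDataChannel("game-state"); // label is arbitrary

  channel.onopen = () => {
    // Arbitrary application data; here, a JSON-encoded position update.
    channel.send(JSON.stringify({ ball: { x: 0.5, y: 0.5 }, paddle: 0.3 }));
  };

  return channel;
}

function listenForGameChannel(pc: RTCPeerConnection): void {
  // The answering peer is notified of the channel the offerer created.
  pc.ondatachannel = (event) => {
    event.channel.onmessage = (msg) => {
      const state = JSON.parse(msg.data);
      console.log("Remote ball position:", state.ball);
    };
  };
}
```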

Wait a minute… Don’t we already have Skype?
Isn’t the promise of WebRTC already here? Doesn’t Skype represent the kind of real-time audio/video communication the specification spells out?

Doug Michaelides, managing director of user experience design at software development company Macadamian, explained that “Skype is an application, while WebRTC is a set of technologies used to create applications to solve all sorts of business problems and customer-experience opportunities.”

Among the ideas put forth at Macadamian’s WebRTC: Transforming Communications event in Ottawa in May was, according to Michaelides, “One member of the audience talking about how a client of his was using WebRTC on their website to basically create a virtual buying experience in a catalog for jewelry and enabling people to have a conversation with a consultant to help them choose the quality of diamond or ruby or whatever. You kind of start to get the sense for the ubiquity of being able to access real-time communication, and interaction from the website’s going to really enrich Web apps and mobile apps going forward.”

Other scenarios include telemedicine: potential organ donors speaking with a healthcare professional while deciding whether to become donors, prospective blood donors asking a blood services group questions before giving blood, or people with a certain ailment or using a particular drug speaking with others in the same condition.

Peer-to-peer: Feature, or bug?
The kind of communication most associated with Skype-type calling is peer-to-peer; that is, all connected nodes act as both client and server, with no centralized system underlying them. That peer-to-peer nature is both the greatest appeal and the biggest drawback of WebRTC-based applications.

But conferencing plays a big role in some of the business scenarios presented here, and that makes the pure peer-to-peer model a significant limitation. “Large telecoms are trying to solve that,” said Jean-Francois Morin, software developer at Macadamian. “Conferencing will require companies to do a lot more.”

Morin did say that one way around that is to use Flash as a third-party application that can unify all browsers, regardless of which WebRTC codecs they implement, allowing it to host up to 25 people in a meeting room.

This is an area being explored right now by a company called Watchitoo, which is working on a peer-to-server-to-peer architecture that could, for example, stream a real-time video feed from a camera directly into a browser window and then scale to open that video and audio stream to a far larger group.

“It’s all APIs,” said Nathan King, senior solutions director at Watchitoo. “That’s still at least a year away though, because we’re all still sitting behind firewalls,” which block packets from unknown peers. “But the browser manufacturers are adopting it and running with it.”

King added, “The point we are trying to make is that there needs to be a medium through which multiple people can communicate that sits in the middle, allowing the end users to make the minimum amount of connections (because of limited bandwidth) to expand the maximum amount of users that can participate.”
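
King’s bandwidth argument comes down to simple arithmetic: a full mesh of peers needs a connection between every pair of participants, while a server in the middle needs only one connection per participant. The figures below are purely illustrative, not Watchitoo’s.

```typescript
// Back-of-the-envelope comparison of connection counts: a full mesh of n
// peers needs n*(n-1)/2 connections (n-1 per participant), while routing
// everything through a central server needs only n (one per participant).
function meshConnections(participants: number): number {
  return (participants * (participants - 1)) / 2;
}

function serverRoutedConnections(participants: number): number {
  return participants;
}

for (const n of [3, 10, 25]) {
  console.log(
    `${n} participants: mesh=${meshConnections(n)}, via server=${serverRoutedConnections(n)}`
  );
}
// Output:
// 3 participants: mesh=3, via server=3
// 10 participants: mesh=45, via server=10
// 25 participants: mesh=300, via server=25
```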