Showing posts with label Vivox. Show all posts

29 March 2010

Vivox in SL: client-side components

This article follows "Vivox in SL: client, server and protocols". Here, I focus on the client components and how they interact to enable voice in SL.

Unused components

SLVoiceAgent.exe could not be found in the 1.23.5 SL client, although it is mentioned several times. The roles of other files such as tntk.dll or srtp.dll are not clearly described in the SL wiki, but other SL viewers such as Hippo list them as vivox_files. Mike Monkowski reported on the SLdev mailing list in September 2007 that srtp.dll and tntk.dll are part of the DiamondWare distribution, but don't appear to be used by SLVoiceAgent. My guess is that they were redundant with other components.

Component | Description
srtp.dll | ortp.dll provides RTP, making srtp.dll useless. Not present in the 1.23.5 Viewer.
tntk.dll | I could not find what services tntk.dll provides, so I do not know which other library could have made it redundant. Not present in the 1.23.5 Viewer.
ssleay32.dll and libeay32.dll | Other libraries such as ssleay32.dll or libeay32.dll mentioned in the Third-party libraries wiki page may have been replaced by ssl3.dll in the 1.23.5 Viewer. Not present in the 1.23.5 Viewer.
DiamondWare | Consists of a client SDK and a high-performance server. It was designed to be extensible via plugins on the client-side and 'bots on the server-side [...] Plugins and 'bots can be created by third-parties thereby leveraging the game development. The client-side is composed of VoIP building blocks. DiamondWare seems perfect for a game integration. If it is present in the 1.23.5 Viewer, I could not find which client-side component contains it.

Used components

By "used components", I mean the v1.23.5 SL client components listed on the Third-party libraries page of the SL wiki, excluding the unused ones mentioned above.

Component | Description
alut.dll and wrap_oal.dll | OpenAL's primary use is assumed to be for spatialized sample-based audio. Many platforms, engines and games use OpenAL as an LGPL 3D audio API. OpenAL relies on three objects: Buffer, Source and Listener (see the OpenAL 1.1 Specification and Reference and the conceptual class diagram nearby). The OpenAL HelloWorld explains a lot. I think some functions in alut.dll (the ALUT library) might be called by wrap_oal.dll, but I am not sure.
ortp.dll | ortp is an LGPL library that implements RTP. It is apparently part of the amsip toolkit (but it can also be found outside of amsip), which provides the protocol layer primitives: SIP, RTP and ICE. These protocols were described in a previous article.
vivoxsdk.dll | Uses the proprietary Siren14 codec by Polycom and relies on ortp.dll, alut.dll and wrap_oal.dll.
SLVoice.exe | External daemon software started and stopped by the Second Life Viewer (SL wiki). More precisely, SLVoice.exe is a daemon that receives calls from the SL Viewer through a local TCP server (default is 127.0.0.1:44124). Also called the client gateway, SLVoice.exe is the proprietary black box that communicates with the Vivox servers.
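
To make the three OpenAL objects mentioned above more concrete, here is a minimal Python sketch. It is not the OpenAL API: the class names only mirror the Buffer, Source and Listener concepts, and the gain function follows the inverse distance clamped model described in the OpenAL 1.1 specification. The default values are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Buffer:
    """Holds decoded audio samples (here simply a list of floats)."""
    samples: list
    sample_rate: int = 32000  # assumed value, for illustration only


@dataclass
class Listener:
    """The single point from which the scene is heard."""
    position: tuple = (0.0, 0.0, 0.0)


@dataclass
class Source:
    """A point in space that plays a Buffer."""
    buffer: Buffer
    position: tuple = (0.0, 0.0, 0.0)
    reference_distance: float = 1.0
    rolloff_factor: float = 1.0
    max_distance: float = float("inf")


def inverse_distance_clamped_gain(source, listener):
    """Gain of a source as heard by the listener (inverse distance clamped)."""
    distance = math.dist(source.position, listener.position)
    # Clamp the distance to [reference_distance, max_distance].
    distance = max(distance, source.reference_distance)
    distance = min(distance, source.max_distance)
    ref = source.reference_distance
    return ref / (ref + source.rolloff_factor * (distance - ref))
```

A source three units away from the listener, with the defaults above, is attenuated to one third of its original gain; a source closer than the reference distance is not amplified.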

FMOD is a 3D audio engine. The SL client contains fmod.dll which is, I believe, used to play in-game sounds, possibly in 3D. Some FMOD features such as 3D positioning or channel groups might be provided by the Vivox SDK as well. This is a common problem with third-party libraries: their features sometimes overlap.

There are some notable differences between the v1.23.5 and the beta v2 SL clients concerning the voice components. For instance, the v2 beta client contains ortp.dll, vivoxoal.dll, vivoxplatform.dll, vivoxsdk.dll and SLVoice.exe. vivoxoal.dll may bundle alut.dll and wrap_oal.dll, but I have no idea what vivoxplatform.dll adds. I only focused on the 1.23.5 files mentioned in the documentation.

Interaction between components

Mostly boxes and arrows, but a component diagram and a communication diagram are worth a thousand words... (the OAL documentation mentions that the hardware is not handled by OAL).

The messages sent from the Viewer to the gateway are based on the SLVoice Application 2.0 documentation.

Diagrams made with Dia. There are many things I tried to guess about this architecture, so feel free to tell me if I am wrong anywhere.

27 March 2010

Vivox in SL: client, server and protocols

Server-side

As I wrote before, and based on Vivox's white paper, the main point is that the server side mixes all voices in real time and delivers the audio in a single stream. I could not find much more information about the SL-specific Vivox server system (Linden Lab will not reveal their server-side architecture that easily), but I guess the Vivox server side does not differ much from one MMOG/VW to another. Unused but required parameters can be found in the SLVoice documentation. This suggests that either Vivox cared about backward compatibility with the SLVoice Application 1.0, or Vivox did not tailor their client API to SL's needs. I lean toward the latter, even though I could not find any documentation for the SLVoice Application 1.0 to confirm or refute it.

Joe Miller explained how the Vivox server sends the audio stream to users and how the system can scale:

According to Miller, the VoIP product is unique because of the ability to project the sound in three dimensional space, as a function of distance and direction from one avatar to another. It takes a 32khz signal at 32kbps from clients, sends it to an Intel based audio server where the input signals are mixed and properly positioned, acoustically, in three dimensions, and a stereo stream is sent back to the client at 64kbps. Even with 100 people speaking at once, the bandwidth requirements are the same for each individual because the servers (dual quad-core Xeons) mix the voices together into a single data stream.
The codec used is Siren 14/G.722.1 Annex C, developed by Polycom but now an international standard. It was chosen because it uses relatively low bandwidth but can carry a wide and dynamic range of audio – not just human voices – making it an ideal codec to broadcast, say, a musical event.
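
The bandwidth argument in the quote can be checked with a little arithmetic. The sketch below uses the 32 kbps and 64 kbps figures from the quote; the "unmixed" alternative, where each client would receive every other speaker's stream separately, is a hypothetical baseline added only for comparison.

```python
UPLINK_KBPS = 32          # one voice stream sent by each client (from the quote)
MIXED_DOWNLINK_KBPS = 64  # one stereo mix sent back to each client (from the quote)


def server_mixed_downlink_kbps(speakers):
    """Downlink per client when the server mixes all voices into one stream."""
    return MIXED_DOWNLINK_KBPS


def unmixed_downlink_kbps(speakers):
    """Hypothetical baseline: the client receives every other speaker's stream."""
    return UPLINK_KBPS * max(speakers - 1, 0)
```

With 100 simultaneous speakers, the hypothetical unmixed baseline would need 3,168 kbps per client, whereas the server-side mix stays at 64 kbps.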

The range at which residents can hear each other is explained in the SL wiki article "How far does my voice carry". As with text chat, the server computes the distances between people to determine who hears whom, and sends the appropriate messages after this computation. Hence (hopefully) it is impossible to use a modified Second Life Viewer to remove the hearing range limits.
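
As a rough illustration of that server-side computation, here is a minimal sketch. It is a guess at the principle, not Linden Lab's or Vivox's implementation: the function name, the data layout and the 60 m range are all assumptions made for the example.

```python
import math

HEARING_RANGE_M = 60.0  # assumed placeholder, not an official SL value


def audible_speakers(listener_id, positions, hearing_range=HEARING_RANGE_M):
    """Return the ids of everyone within hearing range of the listener.

    positions maps an avatar id to its (x, y, z) coordinates in metres.
    """
    origin = positions[listener_id]
    return sorted(
        other
        for other, position in positions.items()
        if other != listener_id and math.dist(origin, position) <= hearing_range
    )
```

Because the filtering happens on the server, a modified client never receives the audio of avatars outside the range, so it has nothing to "unlock".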

The OpenSim server architecture might not differ much from the SL one regarding voice support. However, I could not confirm this from the OpenSim wiki.

Client-side

On the client-side, Linden Lab have chosen to keep the voice features outside of the Viewer: The Second Life Viewer handles configuration, control, and display functions, but the voice streams (from the microphone and from the Vivox voice server) do not enter the Viewer. In other words, These [voice] technologies are contained in external daemon software that is started and stopped by the Second Life client.

The requestId can/should be a GUID so that each response matches a unique request. Each gateway response also contains the request it received. This enables the XML-based protocol to be stateless. TCP provides reliable, ordered delivery by retransmitting lost packets, which is important to update the UI reliably and in a timely manner.
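
The request/response matching described above could be sketched as follows. The element and attribute names (Request, Response, requestId, action) and the action string are assumptions made for illustration, not a transcription of the SLVoice documentation, and nothing is actually sent to the gateway.

```python
import uuid
import xml.etree.ElementTree as ET


def build_request(action):
    """Build an XML request tagged with a fresh GUID (hypothetical schema)."""
    request_id = str(uuid.uuid4())
    root = ET.Element("Request", {"requestId": request_id, "action": action})
    return request_id, ET.tostring(root, encoding="unicode")


def match_response(pending, response_xml):
    """Pop and return the pending action whose requestId the response echoes.

    pending maps requestId -> action; returns None for an unknown id.
    """
    root = ET.fromstring(response_xml)
    return pending.pop(root.get("requestId"), None)
```

The Viewer only needs a dictionary of pending request ids; since every response carries its id, responses can arrive in any order.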

voipforvw is a GPL alternative to SLVoice for OpenSim. One of its developers wrote that it is a snap-in replacement for this executable [SLVoice.exe] that communicates with the viewer and as you’d guess, does the heavy lifting and coding/decoding. However, the project started in February 2008 and has not received any commit since May 2009.

More about the client components in an upcoming article ...

Voice protocols (in a nutshell)

The following protocols or techniques are used by some components in the SL client.

SIP (Session Initiation Protocol) is an application-layer protocol that incorporates many elements of HTTP such as headers, encoding rules and status codes. As its name indicates, SIP is only used to set up (and later tear down) communications between clients; it does not carry the audio itself. Clients start communicating peer-to-peer after they have been paired by a SIP server. The SL Viewer uses ports 5060 (non-encrypted) and 5062 (TLS-encrypted) for SIP with UDP. Once clients are paired, they can start exchanging data.
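
To show how close SIP is to HTTP, here is a small sketch that builds a SIP request and parses a status line. The message is deliberately incomplete (a real INVITE also needs headers such as Via and Max-Forwards, and tags on From), and the addresses are made-up examples; it only illustrates the text format.

```python
def build_invite(caller, callee, call_id):
    """Build a simplified, illustrative SIP INVITE (not a complete one)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line, like HTTP's "GET / HTTP/1.1"
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # Headers end with an empty line, exactly as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(response):
    """Split 'SIP/2.0 200 OK' into (version, numeric code, reason phrase)."""
    first_line = response.split("\r\n", 1)[0]
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason
```

The status codes follow the same families as HTTP: for instance 200 means success and 404 means the callee was not found.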

ICE is not a protocol but rather an initialization technique that facilitates peer-to-peer communications by reducing the NAT-traversal delay. It uses a STUN client-server strategy to pair agents. Once paired, agents no longer rely on the server.

RTP is an application-layer protocol that defines a packet format for delivering audio and video. The RTP use scenarios described in the RFC include multicast, mixers and translators. UDP is the obvious choice for the transport layer in this real-time "send-at-most-once" media-streaming context. The SL Viewer uses the 12000 to 15000 (or 13000?) port range for RTP.
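
The packet format that RTP defines starts with a 12-byte fixed header (RFC 3550). The sketch below packs and unpacks that header with Python's struct module; it ignores padding, header extensions and CSRC entries, and the payload type used in any example is an arbitrary value, not the one Vivox uses.

```python
import struct

# Network byte order: two single bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC (synchronization source identifier).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Pack the 12-byte fixed RTP header (version 2, no padding/extension/CSRC)."""
    first = 2 << 6  # version = 2 in the two highest bits
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(
        first, second, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )


def unpack_rtp_header(packet):
    """Read the fixed header fields from the start of an RTP packet."""
    first, second, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The sequence number lets the receiver detect lost or reordered packets, which is what makes the "send-at-most-once" use of UDP workable: a missing packet is simply skipped rather than retransmitted.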

26 March 2010

Vivox in SL: timeline, business and reception

Timeline

This article follows the one about Vivox integration.

As an introduction, a timeline and some figures:

  • 2006: the Vivox-SL collaboration started (Joe Miller said in March 2007 that The program has been in development for over a year).
  • 2007: voice chat is integrated into the SL Viewer in August.
  • 2008: SLim is launched in September. It is a lightweight client that enables SL residents to interact with their Second Life friends without having to go inworld with the Second Life Viewer and the ability to leave voice mails for offline friends.
  • 2009: AvaLine (beta) is launched in May. To encourage all residents to use AvaLine when the beta ended in August, LL offered free communications to AvaLine subscribers for the first month and free voice-mail for the rest of 2009. And a Hula Bear.

According to Linden Lab's blog and press release, Over 15 Billion Minutes of Voice Have Been Delivered in Second Life. Throughout 2009, the number of voice minutes used by SL residents remained around 3 billion per quarter and more than 60% of Second Life Residents are using voice at any given time. Vivox reported in July 2008 a daily average of 600,000 minutes of peer-to-peer calls, Over 1 billion minutes of voice communications per month, Group events as large as 400 Residents. By comparison, Skype had 6 billion minutes per month at that time (according to Gigaom). The numbers extracted from the May 2009 press release (700,000 unique users consuming more than a billion minutes a month) mean the average resident using voice chat spends about 24h per month speaking or listening (1,000,000,000 / 700,000 ≈ 1,430 minutes), and not 357.
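
For readers who want to redo the division, here is a back-of-the-envelope check. It takes the press-release figures at face value; since the release says "more than a billion" minutes, the monthly result is a lower bound. Dividing the cumulative 15 billion minutes by the same user count is what produces the 357 figure.

```python
def hours_per_user(total_minutes, users):
    """Average number of hours per user for a given total of voice minutes."""
    return total_minutes / users / 60


# Monthly figure: one billion minutes shared among 700,000 unique users.
monthly_hours = hours_per_user(1_000_000_000, 700_000)

# Cumulative figure: all 15 billion minutes divided by the same user count.
cumulative_hours = hours_per_user(15_000_000_000, 700_000)
```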

Out-of-game messaging and calls

AvaLine is the name of the current virtual telephony system powered by Vivox. A timeline in the May 2009 press release indicates that sending SMS from inside SL to out-of-game mobile phones should become possible in 2010. AvaLine's extension will let residents call or send SMS to real-world phones. As early as March 2007 (before the launch of voice chat in SL), SL residents were told they would have to pay to use out-of-game telephony features: Eventually, Linden plans to charge Linden Dollars for the service to be activated on privately owned land. People who own land can pay to have VoIP activated for all users on their property. For a single user, AvaLine costs L$14,400/year, i.e. US$70/year. Residents pay only for monthly flat-rate AvaLine service, regardless of how many calls are received or minutes are used.

Before AvaLine was launched, other organizations had already considered, and actually implemented, out-of-game calls and SMS. For instance, in September 2007, NEC opened an island in SL. They offered the possibility for residents to make calls to another person in the real world and send text oriented messages such as SMS, email and IP Messaging [...] to the real world. Other organizations such as Swisscom (through Starfruit) sponsored 100,000 SMS that residents could send from virtual phone booths to out-of-game phones. Another system called SLFONE enables residents to send 120 SMS to 240 countries and 700 networks for L$8,500/year (i.e. US$40).

Reception and adoption by the residents

Three major announcements were made by LL to their residents about voice integration in SL. The earliest was a FAQ justifying the introduction of voice to SL. It was published by Joe Linden and received half a thousand comments. The second was published in May 2009. It introduced the AvaLine beta and drew comments from a lot of dissatisfied users as well. Third, the end of the AvaLine beta in August 2009 and its introduction to all residents as one of the new bells and whistles raised a lot of concern as well. The residents' comments showed they cared more about stability than new features. Many of them wondered why they would pay to use a semi-working functionality while they were doing fine with other voice-chat systems.

  • When voice was first introduced the same arguments were made - LL said we all loved it, most people said they never/couldn't use it and some people said it was vital for them.
  • This new shiny toy is just something to try to distract people from the real issues.
  • We want stability before the introduction of new features.
  • Utterly pointless and I bet you were told so in all those ridiculous surveys. And your call quality is awful, skype ftw.
  • each shiny new toy gets lots of marketing attention, but fixing things that are broken in a product you've already sold isn't as sexy.

Unfortunately, one of the last words from LL was: join me for Office Hours [...] we will discuss your ideas about what kinds of communications tools would best help you enhance your Second Life experience. A very interesting official Linden reaction to residents' complaints was Partnerships, like the one with Vivox that helped bring us AvaLine, also greatly reduce the number of internal staff needed to deliver projects.
LL may have to keep bringing new content to its customers to keep some of them interested. That is a perfectly normal marketing strategy in any MMOG. However, SL is not an MMOG, and the average SL resident may differ considerably from the average MMO gamer. The obvious difference is that some residents work or earn money thanks to SL. So I think it might be more effective to base a communication strategy on debugging or maintenance rather than on new features.


Finally, I am puzzled by the strange (or lack of?) community management style followed by LL. Joe Linden published "Over 15 Billion Voice Minutes Served", but he was not in charge of answering the comments although he wrote I look forward to hearing what you think. Instead, it was Jeska Linden (an actual community manager with 20 times more news posted than Joe) who stepped up to the plate. Is "what is rare is important" the point in Joe announcing new features to the community?

25 March 2010

Vivox integration

This post loosely follows the one about communication channels of February 2010.

Dana Massey explains that voice-chat features can help consolidate existing player bonds and introduce newcomers: integrated voice chat has enhanced their experience in a range of ways. It makes playing the game easier and more fun, it strengthens the bonds of community that really keep people in a game over the long term, and helps them ease new players in their worlds. These are the cornerstones of every online game. It’s time for developers to break down the barriers to entry and make it easy for people to make real connections in their online worlds. The Vivox system seems to be a technically interesting voice-chat feature that claims to address these issues.
Disclaimer: I have never worked at Vivox and I have neither been bribed nor received pressure to talk about their system.

Overview

In a nutshell, Vivox relies on VoIP and increasingly higher-bandwidth telecommunication systems. The full list of their features can be found on page 2 of their Tear Sheet. They provide a Vivox SDK that lets developers integrate one-on-one and group voice chat, the ability to see when your friends are online and in game, and links to other IM networks as well as 3D positional audio, out-of-game connections and voice fonts into the game. (Although I do not know if what Vivox provides is a true SDK or simply a complete API, I will keep writing SDK.) This set of features is meant to attract developers, particularly if it allows cross-platform communications: Vivox is, after all, a telephony company. The real-life social impact is enormous: you can stay part of the raid planning even when you can't run the game itself. In other words, players are more and more hooked up to the game.

Integration process

Vivox provides a phased integration of their system into existing MMOGs: three major steps follow one another.

  1. Anonymous website-based voice channels (do not require user authentication): no more hassle for the user with Ventrilo or Teamspeak. The solution is fully managed and supported by Vivox - you don't have to run your own communication server anymore.
  2. Authenticated website-based voice channels enabling game logics and roles (groups, guilds, etc.): Mapping game roles to channel rights by integrating with the game, we can match the communication hierarchy to the command hierarchy, so that raid leaders or fleet commanders are automatically mapped to the communications infrastructure.
  3. In-Game voice features. Players can begin a conversation on the web and transition to the game, and vice versa. In EVE Online for instance, Vivox is integrated inside the game features: Richardson’s team leveraged Vivox’s flexible tools to tightly integrate voice chat into EVE Online’s existing controls, social structure, and game situations, considerably enhancing ease of use and immersion relative to the third party applications many players were using.

Their white paper from January 2009 explains how Vivox partners with the MMOG developers.

  1. Kick-off meeting leads to specifications and introduces the SDK to the MMOG developers
  2. UI design. In their top-down approach, the earlier it is fixed, the shorter the development cycle.
  3. Wiring calls to UI stubs consists of connecting the Vivox calls to UI elements
  4. Testing is based on test-cases

Extra calls

According to their white paper, Vivox can create peering relationships between game titles, or allow publishers with multiple game titles to open up voice communication among the players of multiple games. Vivox can also provide links between your virtual domain and the “real” world through connections to mobile and fixed phones. I do not know much about telephony, so I only show two high-level sequence diagrams of what I think happens when a group is created. The first diagram is in a traditional Vivox-less game.

The second diagram below shows the case when a Vivox in-game voice channel is created. Feel free to tell me if I'm wrong anywhere.

On top of these extra calls, a lot of work remains in wiring calls to UI stubs. I could not find exactly how much overhead this means for the client and for the game server.

A marketing and technological success

From a marketing perspective, Vivox is a double success.
First, Vivox is a success for itself: although many players may keep using their previous voice-chat software like Ventrilo or Teamspeak, many players are truly going to enjoy talking inside the game without alt-tabbing. I think MMOG newcomers are very likely to adopt in-game voice chat rather than external software. With increasingly capable outsourced in-game voice-chat systems like Vivox, I think external software will slowly disappear in the near future. This would leave Vivox with no competitor.
Second, MMOG companies can earn money in RMT with voice-fonts, voice mail and out-of-game messaging, but also in audio advertising.

Vivox is also interesting from a technical perspective. The system can be plugged into an existing traditional client-server game architecture without affecting (too much) the network or the game quality. It is a very adaptable solution. Based on their white paper, I think the Vivox engineers have been very smart in deciding to:

  • mix all group communications on their servers and send them as a single voice stream to all voice channel participants
  • launch peer-to-peer connections in the case of in-game Person-to-Person Communications

As a result, it is apparently possible to handle as many as 6,000 people in a single voice chat session. Although this impressive number relies on a traditional cluster of servers behind a load-balancer, I do not think there are many smarter choices.
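
The first design decision, server-side mixing, can be illustrated with a toy example. This is only a sketch of the general idea, not Vivox's mixer: samples are plain floats between -1 and 1, and leaving each participant's own voice out of the mix they receive is my assumption (a common "mix-minus" arrangement), not something the white paper states.

```python
def mix_for_each_participant(frames):
    """Return, for every participant, one mixed frame of everyone else's audio.

    frames maps a participant name to a list of samples in [-1.0, 1.0];
    all lists are assumed to have the same length.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum all voices once, then subtract each participant's own contribution.
    total = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    mixes = {}
    for name, own in frames.items():
        mixes[name] = [
            max(-1.0, min(1.0, total[i] - own[i]))  # clamp to avoid clipping overflow
            for i in range(length)
        ]
    return mixes
```

Whatever the number of speakers, each participant receives exactly one frame per time slice, which is why the client-side bandwidth stays constant.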



I used sdedit for the sequence diagrams.