Showing posts with label scalability. Show all posts
Showing posts with label scalability. Show all posts

15 October 2014

Network Algorithmics by Varghese

I skimmed Network Algorithmics by George Varghese (2005). This book presents common tricks used at the OS, hardware, and architecture levels to prevent network bottlenecks. Here are 15 principles listed on the first page.

Number Principle Example
1 Avoid obvious waste Zero-copy interfaces
2 Shift computation in time
2a Precompute Application device channels
2b Evaluate lazily Copy-on-write
2c Share expenses, batch Integrated layer processing
3 Relax system requirements
3a Trade certainty for time Stochastic fair queuing
3b Trade accuracy for time Switch load balancing
3c Shift computation in space IPv6 fragmentation
4 Leverage off system components
4a Exploit locality Locality-driven receiver
4b Trade memory for speed Processing, Lulea IP lookups
4c Exploit existing hardware Fast TCP checksum
5 Add hardware
5a Use memory interleaving and pipelining Pipelined IP lookups
5b Use wide word parallelism Shared memory switches
5c Combine DRAM and SRAM effectively Maintaining counters
6 Create efficient specialized routines UDP checksums
7 Avoid unnecessary generality Fbufs
8 Do not be tied to reference implementation Upcalls
9 Pass hints in layer interfaces Packet filters
10 Pass hints in protocol headers Tag switching
11 Optimize the expected case Header prediction
11a Use caches Fbufs
12 Add state for speed Active virtual circuit list
12a Compute incrementally Recomputing CRCs
13 Optimize degrees of freedom IP trie lookups
14 Use bucket sorting, bitmaps Timing wheels
15 Create efficient data structures Level-4 switching

25 October 2013

Blizzard's Hadoop platform

Talk given by Brian Griffith and Amanda Gerdes at the OC Hadoop user group meeting in October 2013.

Blizzard uses the same Hadoop platform for Diablo 3, Starcraft 2, WoW, and Hearthstone. This platform went live in March 2013. Before this platform, game developers would log game events in log files, and use custom scripts to ETL these log files into relational databases for analysis. Problem: cumbersome, hard to maintain, low performance.

Solution: game developers, on their own, decide what to track in their game, and send that data to the platform. Instead of a log file, the game developers send protobuf objects. The 20 nodes in the Hadoop cluster receive and deserialize around a billion objects per day. The message's headers determine where to store each protobuf object within Hadoop. Blizzard also uses Hadoop as an operational data store. The cluster runs map-reduce jobs to filter and aggregate the stored protobuf objects. Currently, the 20 nodes store 60TB, but now that Blizzard realizes what they can do with Hadoop, they plan a 100-node cluster storing 1PB. Current bottleneck: CPU for deserialization. They rarely hit the disk for data so no IO waiting bottleneck.

For the messaging, they use a federation of machines running RabbitMQ. 50 producers worldwide (China, Europe, US, etc.) and 8 consumers (most likely in the Blizzard headquarters in California). When an Internet cable got cut with China, some messages were queued for 40 hours.

The analysts were using Greenplum for processing game data in parallel and ETL. Now that Hadoop is in place, they could start using Pig for ETL. But sales and customer data still come from relational databases, and the Greenplum is great at ETL. So there is no reason to force analysts into Hadoop. Solution: ETL jobs pull data from Hadoop and store it into Greenplum for warehousing. Greenplum is still in charge of its own ETL jobs.

Some data (such as a character's level or class in WoW) stay interesting across patches, but others (such as transmogrification usage) are only interesting when the feature launches. So the platform allows to build KPIs daily (e.g. activity aggregated per character level) and dive deep on demand.

Use case: WoW economy. Every gold transaction is sent to Hadoop. Reduce step aggregates by NPC id, item id, or player id. Makes it possible to:

  • Check if the gold sinks follow designers' expectations.
  • Detect networks of gold farmers.
  • Keep an eye on auction house prices.
  • More ...

28 August 2012

Bots for load-testing

A list of MMOs, MOBAs, and FPS games that have been load-testing using bots.

TERA

Koo, 2010, How to support an action-heavy MMORPG, slides

Their system to load-test a shard is called Sisyphus, and runs 1500 clients per machine. The behavior of the bots is based on the behaviors from real players (probably from alpha). A WAN simulator is used between bots and realm servers; the average latency is set to 200ms. A high-performance machine and a "dedicated line" were needed.

EVE

Press, 2011, Orchestrator: A post-mortem on an automated MMO testing framework, slides

EVE started to make a thin client from its game client in 2010. Orchestrator is the tool they use to load-test and integration-test their architecture and code. Orchestrator does not send one script to each client, proxy, and server. Instead, it runs a single master script, and tells them, as they progress in the test, what the next operation is. The test can be stopped right when a client reports a bug, not until everything scheduled has been sent.

OpenSim

Lake, 2010, Distributed scene graph to enable thousands of interacting users in a virtual environment, paper

The limitations we have encountered with avatar scaling during these experiments have been in getting enough hardware to generate the load of over 1000 clients and the limited physics simulation capabilities of a single thread on the scene server.

League of Legends (allocating games to servers)

Delap, 2010, League of Legends: Scaling to millions of ninjas, yordles, and wizards, video + slides

  • Load-testing in a realistic setup: more than 50 machines with the same spec as those in production
  • EC2 is a good tool, but the network is not reliable, so careful not trying to fix problems that only happen in the test setting.
  • With thousands of clients, logs may not be the best way to gather test results.

League of Legends (chat)

McArthur, 2011, Building the chat service for League of Legends, slides

Dozen of EC2 machines, each running 5-9k bots. Each bot is an XMPP chat client (they used the Smack API). Load-testing is useless without proper modeling.

Crysis 2

Hall, 2011, A Programmer's Post-mortem Crysis 2 Multiplayer, slides

They wanted to check the frame rate with lots of moving entities, and detect bugs or gameplay issues. They used automatic testing. "Lots" of bots are run for 10 minutes per level to stress-test the builds. The bots do random actions like walking, jumping, or shooting.

Gears of War 3

Weilbacher, 2012, Dedicated Servers in Gears of War 3 Scaling to Millions of Players, slides

Their bots are clients without renderer and user input. They run automated bot matches to check the performance of their server platform. For Gears 2, they used to run 2.5 games per core in 2009. For Gears 3, they run 7 games per core in 2011.

Guild Wars 2

Patrick Wyatt, a lead programmer on Guild Wars and Guild Wars 2, discourages using bots: bot's behavior differs too much from actual users. Instead, he recommends recording live play data, and replay it on the server to fix bugs or check the load.

04 June 2012

TERA's free-targetting combat: server-side

In the May 2012 issue of GD mag, Seungmo Koo, the server architect for TERA, wrote about how their studio implemented the server back-end for the game's free-targeting combat system. Some parts of the technical solution described in the article were unclear, or the problems were not obvious, so I filled the gaps by guessing their approach.

Basics

During fights, the player uses various kinds of attacks and skills. For the player, free-targeting means paying attention to the avatar's orientation and position. There is no monster selection like in WoW. Looking at actual gameplay footage, the combat feels more lively, but there is a lot of hit and run.

From a system perspective, free-targeting means the server does not know which enemy the avatar is targeting. Hence, the server has to detect if a skill's volume of effect collides with any of the monsters around the avatar. To ensure quick computation of the collision, the server runs at 60 FPS. These features are similar to traditional multiplayer FPS requirements, except that for security and simplicity, there is no lag compensation running on the client-side. All the computations happen on the server-side. However, for an MMO to reach a gameplay as responsive as an FPS, the client should be able to execute the player's actions immediately, account for lag, and eventually run some of the NPCs. TERA accounts for lag like an MMO: in PvE, NPCs are slow to give players time to react. In PvP, however, players with a lower ping to the server have an advantage.

Game design

Every creature in the game world is made of cylinders. A skill reaches its target when the skill's volume of effect collides with one of the target's cylinders. A skill's volume of effect can be a cylinder (e.g. usual AoE around a target or self), a portion of a cylinder (e.g. rotating attack using an axe), a portion of a cone (e.g. a frontal attack using a spear), and so on.

Since skills may last for some time, they are discretized in two to more than a dozen targeting times, and each targeting time is associated with a particular volume of effect. Each targeting time consists of two phases: a search (read) phase, and an update (write) phase. For instance, using a rotating attack, the avatar spins for 360 degrees in 300ms, inflicting damage during its turn. The first targeting time fires at 50ms: in the search phase, the server searches for a target standing in a particular volume in front of the avatar, and, in the update phase, inflicts that target 200 damage. The second targeting time fires at 100ms: the server searches for a target standing in another particular volume on the left of the avatar, and inflicts that target 100 damage. And so on for the remaining targeting times.

Multi-threading and performance

Task Example
Timer Damage over time or NPC AI movement scripts
Region update Add/remove/update the position of a creature in a region
Creature update Update HP, status, position of a creature

Each TERA server runs the whole world for approximately 6k concurrent players. The server has a few things to do in the main loop. First, it processes client packets, and generates timer, region, and/or creature tasks from them. That is when the search-for-target phase happens. Then, it executes scheduled timer tasks, which will likely themselves generate region and creature tasks. And finally, it executes region tasks. The update phase happens in the creature tasks, and also in the region task in case of creature movement, spawn, or death events.

To come up with their final design, the TERA engineers may have followed this train of thoughts:

  • Problem: an 8-core server running Windows reaches 100% CPU with 3k players. How can this be improved?
  • Solution: use asynchronous IO such as Window's IOCP. It automatically creates a task queue for client packets and a worker thread pool to process that packet queue in parallel.
  • New problem: The threads spend too much time waiting for locks when trying to read or update region data (e.g. list of creatures in the region, or creature position).
  • Solution: Avoid contention by replicating the world in each thread. No lock: the thread searches in its own version of the world.
  • New search-phase problem: When executing a region task, the region data of a particular thread becomes inconsistent with other threads.
  • Solution: The thread pushes the region tasks it generates into a queue shared between all threads, so that each thread processes all the region tasks. As an example, when an 8-core server receives a client movement packet, one thread processes it and generates (for instance) one region task from that packet. The thread pushes this region task to the end of the queue with an execution counter of 8. Each of the 8 threads processes the task queue at its own pace. When a thread processes a task, it decrements the task's counter. When the counter reaches 0, the thread deletes the task from the queue. Hence, the task mentioned above will be removed from the queue when all the threads have processed it.
  • New update-phase problem: frequently updating a particular creature causes lock contention.
  • Solution: The world replica of each thread stores creature pointers. When a thread processes a packet or a timer task that results in a creature update, it pushes the creature's function to run (e.g. Creature.receiveDamage), with particular arguments (e.g. 200) to that creature's task queue. If the creature task queue is currently being iterated over by another thread, then the current thread returns to what it was doing before (e.g. main loop). Otherwise, the current thread iterates over the creature's task queue. The queue starts with the task the thread just pushed, but other threads may push tasks to the queue while the current thread is processing its task. The current thread will process those extra tasks too, until the queue is empty. Then, the thread will go back to its main loop. Hence, creature tasks are executed asynchronously; getting a return value for the task execution requires the active object pattern.

The resulting high-level architecture is something like the following:

Questions

Seugmo replied to my original questions (see the comment below); here are some more!

  • How much CPU is caused by packet processing? How much does the CPU increase when 1000 players gather together versus when players are homogeneously scattered?
  • When lots of creatures gather in the same region, the search phase (happening on only one thread) may consume a lot of that thread's CPU. Could the search phase happen faster if the regions were smaller? Could it happen faster if the regions were using Binary Space Partitioning?

01 March 2012

Netgames 2005

Summary of selected papers from the NetGames 2005 conference.

Traffic Characteristics of a Massively Multi-player Online Role Playing Game, by Kim et al.

  • Port mirroring through 1Gbps hub and tcpdump of 92 hours of a Lineage 2 server in December 2004. TCP packets.
  • Both upstream and downstream increase linearly with number of concurrent users.
  • Play session duration: 3 hours average, 26 min median, 40+ hours 99%
Stream Number of packets Ratio of data packets Payload (avg, median, and 99%) Bandwidth Bandwidth per user
Upstream (clients to server) 6.28 billions 23% (the rest are ACKs, SYNs, or FINs) 19, 20, 50 bytes max 9 Mbps 1.6 kbps
Downstream (server to clients) 6.43 billions 98% 318, 161, 1459 (= MTU) bytes max 140 Mbps 20 kbps

Dynamic Microcell Assignment for Massively Multiplayer Online Gaming, by De Vleeschauwer et al.

  • Divide the world in atomic square microcells. Compute each microcell's load induced by processing player actions (weight=1), forwarding player actions to neighbouring cells (w=0.05 if cells on same machine, w=0.1 if cells on different machines), receiving forwarded actions from neighbouring cells (w=0.2, w=0.4), and moving players to and from neighbouring cells (w=3, w=15). Then, assign cells to servers so that no server has a higher load than another server.
  • Algorithms to assign cells to servers:
    • Greedy: processes cells in descending order of their load, and assigns them to the server currently with the lowest load. Pro: fast. Con: does not take locality into account.
    • Clustering: start with each cell is a cluster. Merge two clusters that have the lowest load until there are as many clusters as servers. Con: in the last few steps, some heavy-load clusters are merged and may be assigned to servers that can't handle them.
    • Simulated annealing: start by randomly assigning cells to servers, then randomly swap or move cells around to find a better solution, and keep iterating to refine the solution. Able to find very good solutions if the initial solution comes from another algorithm.
    • Integer linear programming for the optimal deployment. Pro: optimal. Con: takes days to compute, but a timeout can be specified to get a suboptimal solution. (But then, other algorithms give better results faster).
  • Evaluation: If player hotspots are spread randomly, the maximum server load can be reduced by 30% compared to the baseline with one large cell of constant size per server. But if hotspots are regularly spread, and there's as many hotspots as available servers, then the microcell algorithms are at least 10% worse than the baseline. In general, simulated annealing starting from a greedy solution was the most efficient of the algorithms.

15 February 2012

Netgames 2004

Implementation of a service platform for online games, by Shaikh et al.

  • How to provision a cloud infrastructure hosting games
  • Each bot sleeps periodically to reduce CPU load and instantiate more bots. Bots connect to the game server according to a Poisson process with mean inter-arrival time of 1/λ=1s
  • TIO polls game servers at regular intervals for their CPU load using SNMP. Using the raw CPU leads to occasional over-provisioning and higher costs, but using a moving average to smooth the CPU estimate misses CPU peaks and may result in inefficient QoS for some players.
  • Since player load follows daily and weekly patterns, it's possible to anticipate the peaks; in that case, provision slightly ahead of the peak and keep a smoothed metric.
  • Relevant metrics other than CPU: "slack time" = unused time during an iteration of the server loop

Zoned federation of game servers: a peer-to-peer approach, by Iimura et al.

  • DHT implementation is Pastry. It uses SHA1 to map a game zone to its owner and runner.
  • Experimentation with 296 P3 1Ghz 512MB, all connected to a single 100Mbps switch. 295 machines run from 100 to 1,000 members, and a single machine runs the zone owner. Time taken to update the state of everybody is exponential with the number of zone members (average of 50ms for 500 members, 100ms for 700, 200ms for 1,000).
  • Spreading users uniformly on multiple zones, each with a single zone owner, reduces the load.

Scalable Peer-to-Peer Networked Virtual Environment, by Hu and Liao

  • Scalability implies that nodes can be added or removed on the fly while keeping the whole system functional. Ultimate P2P goal: adding a node should increase overall system resources without consuming centralized resources.
  • Cut the virtual space in Voronoi cells, where each user is at the center of his cell (deterministic algorithm). Each user connects with at least all his Vornoi neighbors (min 3, average 6, max n-1), and with more users if they are in his area of interest. Each time a user joins, moves, or leaves the game, his neighbors have to recompute their Voronoi diagram.

03 February 2012

Scaling League of Legends

Notes from a 2011 Qcon talk about scaling the non-gaming server side of League of Legends. They are not worried about persistence during a game, but rather in the match-making, lobby, or store. They have soft real-time requirements: not the order of 10ms, more the order of a few seconds. [Twitter and Facebook are soft RT too]

Scaling

Scalability: it's easier to constrain what the logic developers are allowed to do, than to define what they are not allowed to do. Examples: Map/Reduce is a whole paradigm (you have to make your logic fit into a map() and a reduce()), NoSQL's unstructured data is a double-edge sword (you lose the ability to join, but queries come back faster), and when you're partitioned, you pick to be either atomic/consistent or available

46:50: Scaling should be dynamic/elastic; you need cluster recomposition and stateless growth patterns. Hence the system should be dynamically configurable. On the fly, you should be able to adjust the thread pool size, add/remove roles to machines, and switch logic or system algorithms.

Food for thought: list all the benefits/tricks a load balancer can provide to your system.

Caching

Caching provides flexibility: failovers, distribution of workload (hot code updates or restarts). Coherence is a distributed cache used between Hibernate and the DAO layer as "cache-through". If the DAO asks Coherence and there's a cache miss, then Coherence asks Hibernate (itself using a Coherence cache, or calling MySQL over the network if cache miss).

24:00: How to be sure that cache and DB are consistent? Do not cache! Query the DB directly, latency of few sec is OK for soft RT.

11:30: Serialize the work, not the data. It's faster to serialize the work to the data, than to unserialize the data, work on it, then serialize the result. Avoid moving the data all over the network towards where the process is. It's also easier to distribute the work to all the DB nodes than to send to/receive from all the nodes.
The objects sent on network between server and DB should be small: if you have to edit only one field, you should not have to send a big object of 1MB.

Logging and Testing

42:30: They log each function call with how long it took; overhead of 1% performance, but huge value if able to graph/plot the logs. Compare the average call duration in the last few minutes to the usual average to detect problems.

17:15: Keep in mind that the code will be used/executed in a data center, not on your laptop. 51:25: They use EC2 for load testing. 1000 threads per node, each thread simulating 1 user. Realistic because not all clients (threads) are same speed in real-life, and EC2 network is not always the best/reliable. It's not the most performant, but it looks like a quick-and-dirty way to load test. One of their scale testing environment has more than 50 machines.

01 December 2011

Netgames 2003

Modeling player session times of online games, by Chang and Feng

  • network traces of a popular CS server for a week in April 2002
  • 16k user sessions recorded
  • 99% of players play less than 2 hours
  • play session follows a Weibull distribution with k = 0.5 and λ = 20 (shape similar to 1/x exp(-x))
  • For play sessions from 10 to 100 minutes, the chance of disconnecting (ie failure rate) remains constant at 2.5%.
  • For play sessions shorter than 10 minutes, 10% chance of disconnecting. Possible reasons: connection problems, kicked out or leave because of server rules (such as friendly fire allowed, but kicked out if you kill your team-mates too often)

A Fair Message Exchange Framework for Distributed Multi-Player Games, by Guo et al.

  • Assumptions: independent clocks with no synchronization mechanism, players react to server updates, updates only consist of creation and/or removal of object(s) (and NOT object position updates)
  • Users have reaction time to act in response to server update messages. Ignore latency induced by network and only compare user reaction times to determine which update to actually run on the world state.
  • the Fair-Ordering Service [...] dynamically enforces a sufficient waiting period on each action message to guarantee the fair processing of all action messages. But practically, the waiting period is bounded to ensure a relative level of interactivity.
  • Proxies are game-agnostic and located near players (ie low latency between a player and her proxy). Proxy receives action message from user, then forwards that action message with a message identification number (to deliver messages in order) and the reaction time to the game server.

Causality and media synchronization control for networked multimedia games: centralized versus distributed, by Ishibashi et al.

  • Causality control preserves the order of events of game data (keyboard inputs). No need for causality in voice or video
  • Media synchronization control = intra-stream (temporal relation between MU such as voice or video packets) + inter-stream (timing among multiple streams) + group (timing among multiple end-points to ensure fairness) synchronization controls
  • Compare C-S to P2P architectures in terms of success of the 4 previously mentioned control schemes. Voice and video don't need to go through the server (they're sent in P2P mode in both scenarios).
  • Adaptive Δ-causality control used on game data in both scenarios: the recipient considers a packet still valid Δ = 50 ms after its generation timestamp. [That means the latency automatically increases by Δ ms for all packets]. Adaptive means that the value of Δ changes based on the network load. Smaller Δ = game more interactive, large Δ = less packets are discarded for being late/misordered. Unfairness appears when terminals have different Δ, hence need group sync control.
  • Piggy-back an MU on the succeeding k=4 MUs to recover from lost UDP packets
  • Experiment: two terminals in both C-S and P2P scenarios [only two?!]. Terminal 1 is connected to an overloaded hub with delay jitter, Terminal 2 is connected to its own hub. Connections are 10 Mbps ethernet. Server connected to T2's hub. Additional delay of 100 ms introduced between the two terminals by a data link simulator between T1's hub and T2's hub. Game MUs = 20 Bytes, sent 10 times per second, while voice MUs = 400 Bytes, sent 20 times per sec, and video MUs = 5kB, sent 20 times per sec [hence most of the load on the network comes from voice and audio, not game data]. Experiment ran for 2 minutes.
  • For heavy loads (8Mbps), C-S is better for causality, but worse for consistency, fairness, and interactivity.

Bandwidth requirement and state consistency in three multiplayer game architectures, by Pellegrino and Dovrolis

  • Compare C-S, P2P and PP-CA (= P2P with central authority/arbiter receiving moves from all players and notifying them when it detects inconsistencies)
  • Tu = Duration of client loop, Lu = size of update messages
  • CS: client upstream = Lu/Tu, client downstream = N.Lu/Tu, server downstream = N.Lu/Tu, server upstream = N(N.Lu)/Tu
  • P2P: client upstream = client downstream = (N-1)Lu/Tu
  • PP-CA: client upstream = N.Lu/Tu, client downstream = (N-1).Lu/Tu + f.N.Lu/Tu with f = ratio of inconsistencies to be corrected, arbiter downstream = N.Lu/Tu, arbiter upstream = f.N(N.Lu)/Tu

Access network delay in networked games, by Jehaes et al.

  • Look at delays introduced from access networks (aka last mile links), not from back-bone. Goal: how to dimension the network to reach minimum delay possible.
  • Network delay can be caused by propagation (mostly only in the case of back-bones though: 5µs/km), serialization (putting all the bits of a packet on the link), packet processing (route and DNS lookups, error correction), and queuing (other packets have to be treated before; differs from packet to packet, hence jitter, defined as 95% percentile RTT - 5% percentile RTT). AND = minimal RTT (packet processing delay) + S (packet size) / Reff (effective link rate) + Tque (total queuing delay in up- and downstream, results in jitter)
  • Experiment: for 5 different values of S, throw 100 pings. Get RTT and jitter (= Tque) from 100 pings. Obtain Reff from taking the inverse of the best-fitting trend-line through the 5 points (S, average RTT). Obtain min RTT from the intercept of the trend-line through the 5 points (S, top-1% RTT).
  • QoS improves RTT by separating game traffic from other traffics

A Zone-based Gaming Architecture for Ad-Hoc Networks, by Riera et al.

  • Assumption: ad-hoc networks are going to multiply, but C-S and P2P architectures are not well-suited for them. Most interesting part of the job is to determine which device can/should be Zone server.
  • Even the nodes that do not play the game assist the other nodes in delivering data
  • Nodes are mobile: they do not always stay within reach of the same other nodes. Discovery of Zone servers is done through the SLP v2. When latency to a zone server gets too high, a client can pick another zone server, which in turn notifies all the other zone servers of its new connection. When a client does not reply for a while, a zone server can drop it.

A service platform for on-line games

  • Middleware as transparent as possible to the game developer. This middleware sits on top of an existing grid infrastructure from IBM called Globus. Globus decides when to spawn a new game server instance based on current resources and demands.
  • Player services are in charge of authentication, account handling, chat rooms, locating games/selecting a server taking into account player preferences (e.g. team or region), and actually playing the game.
  • Publisher services deal with software deployment and updates, billing, monitoring server performance, service level agreement (e.g. no more than 5% of players suffer from more than 100ms delay)
  • System services include resource management and directory services. These services are accessed by the grid provider.
  • Clients submit jobs using a format containing the executable, its arguments, and resource requirements. Jobs can require spawning instances at different grid locations (e.g. different regions).
  • Various services such as resource informations and information providers (CPU, OS, RAM, connectivity, ...) are indexed in an LDAP. Game-specific services (tracking player stats, server load, ...) could also be added on top of existing services.

18 October 2011

Panel on MMOs from Netgames 2011

Panel at NetGames 2011 about the future of MMO research.

Cheating (no control of the client-side) and performance (weakest node performance conditions the other nodes' performance) are issues in p2p architectures. I could p2p when I play with friends, but not competitively. Even with friends, the developer has no control over the network, NAT traversal problem. Potential solutions: match-making on server, then one peer hosts the game. If host disconnects, the game migrates to another peer. Problem: how to partition and replicate the game world between peers?

How to integrate scaling into game mechanics?

Researchers can get data from companies if they show that it is worth the company's time (taken to fetch and anonymize data).

Problems shared across p2p, client-server and cloud architectures: scalability, cheating, latency, concurrency, synchronization, and replication.

Using the cloud is cheap for small user bases, but very expensive for very large user bases.

Casual Connect and GDC Online are industry conferences that also deal with network and system support for MMOs. But they're expensive.

03 July 2011

[Literature] NetGames 2002

Summary of selected papers about MMOG architecture from the NetGames 2002 conference.

Aarhus et al., Generalized Two-Tier Relevance Filtering of Computer Game Update Events.

  • Two-tiered means network communication is limited to a dedicated concentrator layer
  • TCP
  • Consistency within the server layer requires to pass the world state between servers.
  • Clients connect to concentrators based on network topology, not their position in the game world.
  • The concentrators are constructed to be application independent
  • Dead-reckoning schemes
  • Clients connect to a concentrator, not to the server logic. Hence, client connection is never lost if a game server crashes.


Bharambe et al., Mercury: A Scalable Publish-Subscribe System for Internet Games.

  • distributed content-based publish-subscribe system
  • subscription language expressive enough to allow game-specific subscriptions (eg x in [10,20] && y <= [40,50], or team == "MyTeam"). In the subscription {x in [10,20] && y >0}, a hub for x is more efficient at relaying an event than a hub for y.
  • routing mechanism based on a circle of modular software hubs, each hub storing subscriptions.
  • Evaluation: network topology simulator in which nodes are randomly assigned as hubs [unrealistic]
  • Metrics:
    • Number of publications routed by a node. All nodes end up having the same routing load (ie scalability achieved).
    • Number of subscriptions stored by a node. Virtual world is a square, hence the central zone receives more players and subscriptions than peripheral zones.
    • Delay for a publication to reach the interested subscribers. Increases linearly with the number of nodes [not scalable!]

Cronin et al., An Efficient Synchronization Mechanism for Mirrored Game Architectures.

  • Trailing State Synchronization (TSS)
  • Optimistic algorithm: execute commands as they are received and rollback when late messages are received.
  • TSS runs a delayed "trailing" copy of the live game state. Trailing copy is able to re-order messages and execute them in chronological order.
  • It's cheaper to execute a command multiple times than to make snapshots of the game state
  • Preserving random events was hard


Farber, Network game traffic modelling.

  • Traffic modeling from logs of a 36 hour LAN. Matches of 8 to 30 players.
  • Server sends 16kbps to each client
  • Each client sends 1 kbps to the server
  • Both client and server receive 20-25 packets/sec.
  • Packet size varies a lot.
  • Probability density function of client-server and server-client packet size and latency modeled by Extreme Value distribution: F(x) = exp(-exp(-(x-a)/b)). (long-tail behavior)

Fiedler et al., A Communication Architecture for Massive Multiplayer Games.

  • World cut in rectangle or hexagon tiles.
  • Players subscribe to current and tiles adjacent to closest current tile corner.
  • Each tile contains an environment and an interaction channel.
  • Environment channel for static data. Static objects generate bandwidth only when someone interacts with them, not by themselves. A static object does not interact with other static objects. Consumes low bandwidth. Uses TCP.
  • Interaction channel for active objects. Active objects interact with static and active objects.
  • Tiles are managed by one or more servers (n to n relationship). Server only provides constant data such as terrain and objects within the map. Server only cares about the environment channel. Collision detection on clients. Scales with the number of tiles, not the number of clients.
  • Authoritative objects are on the machine that instantiates them. Other clients have duplicates/"proxies". Collision detection and other object-object interactions are calculated on clients. Affected object instances publish their updates on the interaction channel of the object's current tile.



Griwodz et al., State replication for multiplayer games

  • Player-to-player interactions require low latency (= high urgency).
  • In case of congestion, less urgent events may be dropped. Results in a game of lower quality, but still running.
  • Rare player-to-environment actions require high reliability
  • Game designers define when they build the game which actions are urgent and which are relevant.
  • Clients connect to a nearby proxy. Proxies are interconnected.
  • Participants belong to target groups. Each target group can be contacted through a channel.
  • Overlay network of proxies distribute events to target group members.

Henderson, Observations on game server discovery mechanisms.

  • Analyzing network traffic to know how FPS servers work.
  • Game servers register to server directories. Client gets server list from directory using UDP. Same structure as Napster.
  • Method: 3 Half-Life US server directories were queried four times a day for a month.
  • Problem of directories: single points of failure, stale information (70% of servers not active anymore), redundant info (90% of servers register to all directories), and no congestion control

Mauve et al, A Generic Proxy System for Networked Computer Games.

  • Move intelligence and server functionality to the border of the network
  • Some server functionality can be delegated to the proxies.
  • Proxies located close to the players (need ISP support). Therefore, they could execute client code as well (anti-cheat).
  • Proxies can reduce the server load in doing packet processing and filtering.
  • Overlay network enables traffic rerouting around congested areas, network fault detection and node-failure recovery.

06 February 2011

Wisdom of the crowd

The term wisdom of the crowd has been popularized by Surowiecki in his 2004 book of the same name. Crowds, he argues, are better than a single entity at processing information, coordinating (eg optimizing pavement flow) and cooperating (local networks of trust). Four criteria for a crowd to be wise are:

  • diversity of opinion
  • independence
  • decentralization/localization
  • aggregation, ie combining private thoughts into collective decisions

Surowiecki's book deals with our society, economics and sociology. The wisdom of the crowd paradigm can also be applied to various domains of computer science.

Domain Example What it's not about
Machine Learning State-of-the-art classifiers are boosted random forests. Build many decision trees taking random features and a random part of the whole data set (bootstrapping). In the end, aggregate all of them (bagging). It's not about having a complex and smart model, it's about having millions of "slightly better than random" trees combined together, reducing the variance of the model.
Information Retrieval Google uses MapReduce for data processing. Separate your algorithm in small elementary treatments so that it can scale. The more hardware, the faster. It's not about having a smart and complex algorithm that needs to have 64G of RAM, it's about having many second-hand machines doing an elementary job in a pipelined process, reducing the reliability on a particular machine.

In the case of MMOG, the dominant paradigm is a tightly-coupled client-server architecture. And although server hardware scales, it's still rudimentary: the cutting in shards is often done manually and there's rarely a dynamic allocation of ressources to a particular world region (although that would be helpful in WoW in case of gnome warrior demonstrations). The coupling between client and server has a lot of limits. There are ways to prevent cheating in peer-to-peer MMOG architectures, so why is peer-to-peer still considered a joke?

14 November 2010

Scalability definitions and milestones

Definitions

Scalability is a major technical focus for MMOG, but there are actually many different meanings and assumptions hidden behind the word. In general, Scalability is the capability of a software system to be adapted to meet new requirements of size and scope (Taylor et al. p467).

For system and network engineers, scalability can be achieved in two ways: horizontally (load balancing and clusters of multiple computing resources) or vertically (adding more power to a current computing resource). In both cases, if any of the layers of the stack (hardware, database, application and so on) does not scale, then the whole system won't.

For software engineers, a system scales well if its rate of growth is not greater than the corresponding rate of complexity increase (Taylor et al. p468-474). This definition looks a lot like positive scalability in system and network. However, software architecture scalability also deals with adding new components and takes into account the dependencies between components and connector logic.

Milestones

To my mind, there are so many definitions of scalability that it makes more sense to talk about domain-specific metrics meaningful to the end-users and stakeholders. The number of users is a perfect example. So here is a list of situations illustrating different meanings of scalability with different numbers of users.

Configuration System/Example Scalability meaning
Scalability milestones reached so far
2 or more players sharing the same game world state Game development platforms such as Unity have quite hard times providing a transparent interface to sharing game world state. Each object in the world has to explicitly contain a Network View for it to be sent to other players. Being able to send all relevant game objects to other players without the hassle of having to add a networking layer on top of them.
From 40 to 100+ users in the same world Throughout 2009, Intel has worked on scaling OpenSim. They noticed that the number of users (as well as the number of object prims and scripts) were independent of the number of hardware threads available. This meant OpenSim could host no more than 100 clients, even if more resources were added. Scalability was achieved through an impressive cleaning and optimization work from both software and netsys engineering perspectives. Adding a server to the back-end means being able to handle more users (thanks to an inter-shard daemon mechanism of some sort). There is still a problem, though: if everyone goes on the same shard, the shard crashes. Sharding originally comes from the database community. It has been applied to the software architecture of military simulations in 1995: 1000 users can be supported simultaneously. Current MMOG architectures still cut their world in maps (shards) to be able to use this principle.
Inter-server events Cross-realm pick-up groups in WoW.
Opensim's hypergrid foreign user information stored locally and non-persistently on remote servers.
In this case, scalability sounds more like portability and data compatibility (as the number of users involved stays relatively small). Also, an extra layer of logic can be added: in WoW, experienced players can be grouped with novices so that cross-realm groups are more balanced. But ... on which server shard(s) are these instances stored and computed?
Cloud gaming (Gaikai or OnLive) Streaming video to computers that do not have enough graphics power to run the game themselves A pure example of horizontal scalability: more computers in the cloud means more users.
Scalability milestones not reached yet
1000 players seeing and interacting with each others Giant boss battles, live events, voting in Parliament 1) Clients must still be able to compute the graphics and 2) the server has to be able to cope with the sudden and unusual overload (dynamic resource re-allocation? Hybrid architectures mixing P2P and client-server?)
Cloud gaming for MMOG Cloud gaming NOT as a service on top of existing gaming infrastructures, but as a graphics help for weak clients of an MMOG Some MMOGs have high-end graphics. And there is always a server doing various computations. In the end, the overall architecture would approach something like: weak client <-> "middlebox" shared with a few other weak clients for the graphics <-> game logic server. But this naive method consumes a lot of resources and can certainly be optimized. It sounds a little bit like booster boxes.
10.000+ players seeing each other and interacting with some Stadiums, French demonstrations For the server, scalability means 1) being able to distinguish who can interact with who, 2) allocate resources dynamically (as these events are temporary and can be spontaneous) and 3) approximate distant actions and determine which clients share the same content locally so that some computations can be factorized. For the client, graphics might get demanding.

21 May 2010

[Literature] Tolkien: An Event Based Storytelling System

In this short paper from 2009, Satish et al. introduce Tolkien, an event-based storytelling system. This paper has an obvious link to interactive storytelling and fiction, but there could also be interesting applications to non-static or random content-generation in MMOG as well.

The authors define a story as a time-ordered coherent sequence of events. They consider a database filled with events and event-related data (video, audio, images, texts). Storytelling is simply retrieving appropriate events from the database in a particular order, and filter/adapt them to a particular audience. The filtering allows to show personal events such as birthdays to friends and relatives, music-related events to music-lovers or professional events such as conference talks to coworkers. Interactive storytelling approaches are inefficient in filtering large collections of events and most do not adapt to their audience.

[In the rest of the article, it is quite hard to understand in details what happens because the authors change their notations and names regularly. For instance, they use "objects" without having defined what they are. The "preference triple" definition has only two components. And so forth...

Each node of a directed acyclic graph represents an event. Events contain a spatio-temporal description and a semantics Itype. Edges are relations that connect events. Since relations are heterogeneous, they need different labels (found in the vocabulary L).

The storytelling process consists of two phases. First, the author specifies which events can be included in the story. Then the selected events are processed into a story tailored to the preferences of each member of the audience. (see figure below)

Story scripting

The scripting language the author can use to write the story script is actually a pseudo-SQL language. Example:

FIND $events FROM aParticularFile
FOR e IN $events
{
   FIND $museumsAtNY WHERE (activity = 'museum' 
                            AND datetime = '24 December 2008')
}
TELL $museumsAtNY

Compiling based on audience's preferences

Each member of the audience has its own preferences. For each member of the audience, these preferences are computed into a preference list, which is a a set of (pref_name, score). The score is a float between -1 (dislike) and 1 (like). For event ei and person pj, event interestingness is computed from the attributes of ei and the preference list of pj. It reflects how much an event should be incorporated into a specific viewer's story.

Story interestingness defines how interesting a story will be for a particular viewer. It is updated each time an event is added to the story. The addition of an event to the story aims at keeping story interestingness as high as possible. It can happen that the author's script requires to add an event reducing the story interestingness.

Architecture

Concretely, the viewer's preferences could be retrieved from a Google Calendar, Twitter or Facebook page (hence the WWW on the architecture diagram). The description of the rest of the architecture is given by the authors quite consicely:

The script contains a specification of the story as well as instructions on how to modify it depending of the audience. The script is first analyzed by a Script Processor to check for lexical errors. This provides the Compiler an error-free script with which it creates an operator tree. This operator tree would be stored in a cache. Once the preference list of the audience is known to the Run Time Processor, this tree is converted to a series of Index lookups and queries to the Eventbase. This database contains detailed event descriptions with the relevant media items. The results of these queries are then collected and sent to the Web-UI.
-- Arjun et al.

27 March 2010

Vivox in SL: client, server and protocols

Server-side

As I wrote before and based on Vivox' white paper, the main point is the Server-side mixing all voices in real time and delivering the audio in a single stream. I could not find much more information about the SL-specific Vivox server system (Linden Lab will not reveal their server-side architecture that easily), but I guess the Vivox server-side does not differ a lot between MMOG/VW. unused but required parameters can be found in the SLVoice documentation. This suggests either Vivox cared for a retrocompatibility with the SLVoice Application 1.0 or Vivox did not tailor their client API to SL needs. I am for the later, even though I could not find any documentation for the SLVoice Application 1.0 infirming or confirming that.

Joe Miller explained how the Vivox server sends the audio stream to users and how the system can scale:

According to Miller, the VoIP product is unique because of the ability to project the sound in three dimensional space, as a function of distance and direction from one avatar to another. It takes a 32khz signal at 32kbps from clients, sends it to an Intel based audio server where the input signals are mixed and properly positioned, acoustically, in three dimensions, and a stereo stream is sent back to the client at 64kbps. Even with 100 people speaking at once, the bandwidth requirements are the same for each individual because the servers (dual quad-core Xeons) mix the voices together into a single data stream.
The codec used is Siren 14/G.722.1 Annex C, developed by Polycom but now an international standard. It was chosen because it uses relatively low bandwidth but can carry a wide and dynamic range of audio – not just human voices – making it an ideal codec to broadcast, say, a musical event.

The range at which other resident can hear each other are explained in the SL wiki article "How far does my voice carry". Similarly to text-chat, the server computes the distances between people to determine who hears who, and sends appropriate messages after this computation. Hence (and hopefully) it's impossible to use a modified Second Life Viewer to remove the hearing range limits.

The OpenSim server architecture might not differ a lot from the SL one regarding to voice support. However, I could not find it reading the OpenSim wiki.

Client-side

On the client-side, Linden Lab have chosen to keep the voice features outside of the Viewer: The Second Life Viewer handles configuration, control, and display functions, but the voice streams (from the microphone and from the Vivox voice server) do not enter the Viewer. In other words, These [voice] technologies are contained in external daemon software that is started and stopped by the Second Life client.

The requestId can/should be a GUID so that each response matches a unique request. Each gateway response also contains the request it received. This enables the XML-based protocol to be stateless. TCP provides a reliable transmission that prevents packet loss (important to update the UI reliably and in a timely manner).

voipforvw is a GPL alternative for SLVoice on OpenSim. One of its developer wrote it is a snap-in replacement for this executable [SLVoice.exe] that communicates with the viewer and as you’d guess, does the heavy lifting and coding/decoding. But the project started in February 2008 and has not received any commit since May 2009.

More about the client components in an incoming article ...

Voice protocols (in a nutshell)

The following protocols or techniques are used by some components in the SL client.

SIP is an application-layer protocol and incorporates many elements of HTTP such as headers, encoding rules and status codes. As indicated by its name, SIP is only used to initiate communications between clients. Clients start communicating in peer-to-peer after they have been paired by a SIP server. The SL Viewer uses ports 5060 (non-encrypted) and 5062 (TLS-encrypted) for SIP with UDP. Once clients are paired, they can start exchanging data.

ICE is not a protocol but rather an initialization technique that facilitates peer-to-peer communications in reducing the NAT-traversal delay. It uses a STUN client-server strategy to pair agents. When paired, agents do not rely on the server anymore.

RTP is an application-layer protocol that defines a packet format for delivering audio and video. RTP Use Scenarios in the RFC contain multicast, Mixers and Translators. The use of UDP for the transport layer is obvious in this real-time "send-at-most-once" media-streaming context. The SL Viewer uses the 12000 to 15000 (or 13000?) port range for RTP.

25 March 2010

Vivox integration

This post somehow follows the one about communication channels of February 2010.

Dana Massey explains that voice-chat features can help consolidate current player bounds and introduce newcomers. integrated voice chat has enhanced their experience in a range of ways. It makes playing the game easier and more fun, it strengthens the bonds of community that really keep people in a game over the long term, and helps them ease new players in their worlds. These are the cornerstones of every online game. It’s time for developers to break down the barriers to entry and make it easy for people to make real connections in their online worlds. The Vivox system seems to be a technically interesting voice-chat feature that boasts answering these issues.
Disclaimer: I have never worked at Vivox and I have neither been bribed nor received pressure to talk about their system.

Overview

In a nutshell, Vivox relies on VoIP and increasingly higher-bandwidth telecommunication systems. The whole list of their features can be found in page 2 of their Tear Sheet. They provide a Vivox SDK that lets developers integrate one-on-one and group voice chat, the ability to see when your friends are online and in game, and links to other IM networks as well as 3D positional audio, out-of-game connections and voice fonts into the game. (Although I do not know if what Vivox provides is a true SDK or simply a complete API, I will keep writing SDK.) This bunch of features is supposed to bring interest among developers, particularly if they allow cross-platform communications: Vivox is, after all, a telephony company. The social real-life impacts are enormous: you can stay part of the raid planning even when you can't run the game itself. In other words, players are more and more hooked up to the game.

Integration process

Vivox provides a phased integration of their system into existing MMOG: three major steps are followed one after another.

  1. Anonymous website-based voice channels (do not require user authentication): no more hassle for the user with Ventrilo or Teamspeak The solution is fully managed and supported by Vivox - you don't have to run your own communication server anymore.
  2. Authenticated website-based voice channels enabling game logics and roles (groups, guilds, etc.): Mapping game roles to channel rights by integrating with the game, we can match the communication hierarchy to the command hierarchy, so that raid leaders or fleet commanders are automatically mapped to the communications infrastructure.
  3. In-Game voice features. Players can begin a conversation on the web and transition to the game, and vice versa. In EVE Online for instance, Vivox is integrated inside the game features: Richardson’s team leveraged Vivox’s flexible tools to tightly integrate voice chat into EVE Online’s existing controls, social structure, and game situations, considerably enhancing ease of use and immersion relative to the third party applications many players were using.

Their white paper from January 2009, explains how Vivox partners with the MMOG developers.

  1. Kick-off meeting leads to specifications and introduces the SDK to the MMOG developers
  2. UI design. In their top-down approach, the earlier it is fixed, the shorter the development cycle.
  3. Wiring calls to UI stubs consists of connecting the Vivox calls to UI elements
  4. Testing is based on test-cases

Extra calls

According to their white paper, Vivox can create peering relationships between game titles, or allow publishers with multiple game titles to open up voice communication among the players of multiple games. Vivox can also provide links between your virtual domain and the “real” world through connections to mobile and fixed phones. I do not know much about telephony, so I only show two high-level sequence diagrams of what I think happens when a group is created. The first diagram is in a traditional Vivox-less game.

The second diagram below shows the case when a Vivox in-game voice channel is created. Feel free to tell me if I'm wrong anywhere.

On top of these extra calls, a lot of work remains in wiring calls to UI stubs. I could not find exactly how much overload for the client and for the game server this meant.

A marketing and technological success

From a marketing perspective, Vivox is a double success.
First, Vivox is a success for itself: although many players may keep using their previous voice-chat softwares like Ventrilo or Teamspeak, many players are truly going to enjoy talking inside the game without alt-tabbing. I think MMOG newcomers are very likely to adopt in-game voice-chat rather than external softwares. With more and more performant outsourced in-game voice-chat systems like Vivox, I think external softwares will slowly disappear in a close future. This leaves Vivox with no competitor.
Second, MMOG companies can earn money in RMT with voice-fonts, voice mail and out-of-game messaging, but also in audio advertising.

Vivox is also interesting from a technical perspective. The system can be plugged on an existing traditional client-server game architecture without affecting (too much) the network or the game quality. A very adaptive solution. Based on their white paper, I think the Vivox engineers have been very smart in deciding to:

  • mix all group communications on their servers and send them as a single voice stream to all voice channel participants
  • launch peer-to-peer connections in the case of in-game Person-to-Person Communications

As a result, it is apparently possible to handle as many as 6,000 people in a single voice chat session. Although this impressive number relies on a traditional cluster of servers behind a load-balancer, I do not think there are many smarter choices.



I used sdedit for the sequence diagrams

02 January 2010

[Literature] Network Infrastructure for Massively Distributed Games

by Daniel Bauer et al. 2002

Motivation

Client-Server (CS) Architecture: According to the authors, an Apache server on a usual computer using a Linux OS connected to a Gigabit Ethernet can handle roughly 2000 256-byte HTTP transactions per second. Server farms are said to handle a number of connections on the same order of magnitude. The authors also mention that whereas network bandwidth is abundant, the bottlenecks of such systems are CPU cycles, memory bandwidth, and server I/O. They conclude that server farms handling loads greater than 105 HTTP requests per second currently [in 2002] are infeasible with existing technology and, by extension, so are million-person games requiring as little as a single event per minute. They also take Everquest as an example: Although Everquest claims that tens of thousands of people can participate simultaneously, [...] Everquest is better thought of as 40 independent instances of the same game, each of which handles about 2000 playersEven if this paper was published in 2002, I think what they wrote is still accurate.

Peer-to-Peer (P2P) Architecture: Scalability is limited by the computational power of the weakest peer, because each peer needs to run the complete simulation. Therefore, the authors suggest a hybrid approach to overcome the scalability problems.

Proposed network-level solution: booster boxes

The authors start with the assumption that not all the information the clients send to the server is relevant. The server often sends the same packets to different clients (for instance, in a PVP zone, the death of a player will be communicated to all the players who currently see the dead player). To solve this problem, booster boxes can be set up between ISP Access Routers and Edge Routers. They have 4 roles:

  • Caching non-real-time information
  • Aggregation of redundant events
  • Filtering of no-longer relevant events
  • Routing packets to the appropriate server(s)

Booster box architecture

Data layer: for the bulk of traffic, booster boxes behave like ordinary level-2 forwarding devices, e.g. like ethernet switches. The data layer must be able to handle packet forwarding at speeds equivalent to the port of a residential access router, i.e. in the range of 155Mbits/s to 1Gbits/s and still has to process, copy or divert packets to the booster layer. The authors explain that a pure hardware solution is not flexible enough and a pure software solution is not able to handle such line speeds, that is why they introduce Network Processors (NP). A NP is a general purpose processor with access to many network-specific co-processors performing tasks such as checksum generation, table look up and header comparison. They suggest writing the network-forwarding code in C and load this code in the processor.

Booster layer: application-level layer. A Booster Library contains the API through which the boosters can call the data-layer operations (ie copy, divert, pass through, forward, ...). Boosters' code can be executed either on a general-purpose CPU or on the NP (or on a combination of the 2). Boosters can perform tasks independently or be coordinated by the Booster Control Point to work together. Another role of the Booster Control Point is to establish a Quality of Service overlay network between booster boxes. This Booser Overlay Network (BON) can circumvent several network problems that I have not fully understood yet. BON uses its own addressing schemes and packets eventually exchanged between booster boxes and servers contain a BON header (a BON packet is encapsulated in an IP packet).

Examples

Large Interactive Game Show: a TV program asks a question, spectators send their answer to the TV server (on the Internet). The authors pretend their system reduces the server's load exponentially.

Large Virtual World: The dotted arrows in the figure nearby shows how information comes from a player to the server. A white player sends an event to the closest booster box (in the network cloud). The correct server receives the event notification from the booster box. The solid arrows show how the server answers to this information: all the booster boxes to which white players are connected receive the update and forward it to their own white players.
Remark: in the case of a server-wide event such as a GM broadcast, the white game server will have somehow to tell the black game server to forward the information.
Interestingly, the authors suggest creating a game transport protocol in which the virtual location is encoded independently of the particular game, bringing the advantage that a game booster could to a large extent be independent of the game. To my mind, this idea is nice for ISP because they will rent their booster boxes to MMOG companies, but MMOG companies will have to follow booster box norms. I do not think there is currently any consortium able to establish protocol norms on MMOG...



See also: Understanding Network Processors by Shah.